rust-lang / compiler-team

A home for compiler team planning documents, meeting minutes, and other such things.
https://rust-lang.github.io/compiler-team/
Apache License 2.0

Remove `Nonterminal` and `TokenKind::Interpolated` #763

Open nnethercote opened 2 weeks ago

nnethercote commented 2 weeks ago

Proposal

Declarative macro expansion involves the parsing of token sequences into AST nodes, which are then pasted back into the token stream as TokenKind::Interpolated tokens. Each such token contains a Nonterminal, an enum that can contain an AST expr, or stmt, or item, or block, etc.
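The user-visible effect of parsing a fragment into a single AST node can be seen with a small pair of macros. This is only an illustrative sketch: the macro and function names (`times_two_expr`, `times_two_tts`, `expr_result`, `tts_result`) are made up for this example and do not appear in the compiler. An `expr` fragment is pasted back as one unit, whereas raw `tt` tokens are pasted back individually and re-associate with the surrounding operators.

```rust
// Hypothetical example macros; the names are not from rustc.

// `$e` is parsed as one expression and pasted back as a single unit,
// so `1 + 2` stays grouped.
macro_rules! times_two_expr {
    ($e:expr) => { 2 * $e };
}

// `$t` captures raw token trees, which are pasted back one by one,
// so the surrounding `2 *` binds to the first token only.
macro_rules! times_two_tts {
    ($($t:tt)*) => { 2 * $($t)* };
}

fn expr_result() -> i32 {
    times_two_expr!(1 + 2) // behaves like 2 * (1 + 2)
}

fn tts_result() -> i32 {
    times_two_tts!(1 + 2) // behaves like 2 * 1 + 2
}
```

Calling `expr_result()` gives 6 and `tts_result()` gives 4.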

This MCP proposes to instead convert the AST node back into tokens and insert those tokens into the token stream, with invisible (a.k.a. "none") delimiters around the token sequence to protect precedence.
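The idea can be sketched with a toy model. Everything below is an assumption made for illustration: `Tok`, `Expr`, `to_tokens`, `expand_times_two`, `eval` and `eval_ignoring_invisible` are hypothetical and far simpler than rustc's real types and parser. The sketch converts an AST node back into tokens wrapped in invisible delimiters, then evaluates the result with a parser that respects those delimiters and with one that discards them.

```rust
// Toy model only: these types are illustrative, not rustc's definitions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok { Num(i64), Plus, Star, OpenInvisible, CloseInvisible }

#[derive(Debug)]
enum Expr { Num(i64), Add(Box<Expr>, Box<Expr>), Mul(Box<Expr>, Box<Expr>) }

// Convert an AST node back into tokens, wrapping each binary
// expression in invisible delimiters so its grouping survives.
fn to_tokens(e: &Expr, out: &mut Vec<Tok>) {
    match e {
        Expr::Num(n) => out.push(Tok::Num(*n)),
        Expr::Add(l, r) | Expr::Mul(l, r) => {
            out.push(Tok::OpenInvisible);
            to_tokens(l, out);
            out.push(if matches!(e, Expr::Add(..)) { Tok::Plus } else { Tok::Star });
            to_tokens(r, out);
            out.push(Tok::CloseInvisible);
        }
    }
}

// Models expanding a macro whose body is `$e * 2`.
fn expand_times_two(arg: &Expr) -> Vec<Tok> {
    let mut out = Vec::new();
    to_tokens(arg, &mut out);
    out.push(Tok::Star);
    out.push(Tok::Num(2));
    out
}

// Recursive-descent evaluator: sum := product ('+' product)*
fn sum(toks: &[Tok], pos: &mut usize) -> i64 {
    let mut v = product(toks, pos);
    while toks.get(*pos) == Some(&Tok::Plus) {
        *pos += 1;
        v += product(toks, pos);
    }
    v
}

// product := atom ('*' atom)*
fn product(toks: &[Tok], pos: &mut usize) -> i64 {
    let mut v = atom(toks, pos);
    while toks.get(*pos) == Some(&Tok::Star) {
        *pos += 1;
        v *= atom(toks, pos);
    }
    v
}

// atom := number | invisible-delimited group
fn atom(toks: &[Tok], pos: &mut usize) -> i64 {
    let t = toks[*pos];
    *pos += 1;
    match t {
        Tok::Num(n) => n,
        Tok::OpenInvisible => {
            let v = sum(toks, pos);
            assert_eq!(toks[*pos], Tok::CloseInvisible);
            *pos += 1;
            v
        }
        other => panic!("unexpected token {:?}", other),
    }
}

// A parser that treats invisible delimiters as real groups.
fn eval(toks: &[Tok]) -> i64 {
    let mut pos = 0;
    let v = sum(toks, &mut pos);
    assert_eq!(pos, toks.len());
    v
}

// A parser that throws invisible delimiters away before parsing.
fn eval_ignoring_invisible(toks: &[Tok]) -> i64 {
    let flat: Vec<Tok> = toks
        .iter()
        .copied()
        .filter(|t| !matches!(t, Tok::OpenInvisible | Tok::CloseInvisible))
        .collect();
    eval(&flat)
}
```

For the argument `1 + 2`, `expand_times_two` produces the tokens for `⟦1 + 2⟧ * 2`, writing `⟦ ⟧` for the invisible delimiters. In this toy model `eval` returns 6, while `eval_ignoring_invisible` returns 5 because the grouping is lost.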

One reason for the change is that it's really weird to have AST pieces interpolated through a token stream, because tokens and AST nodes are two different levels. A bit like having a sequence of words and punctuation in natural language, but some of the "words" are themselves phrases or sentences or paragraphs. When I first encountered Interpolated tokens it took me some time to understand them.

Another reason is that it makes the implementation of declarative macros and proc macros more similar. Currently if you pass an Interpolated token to a proc macro, the proc macro bridge converts it into a Group that is delimited by invisible delimiters. Then if that gets returned from the proc macro the invisible delimiters remain. (Proc macros can also create invisible-delimited sequences from scratch.) In other words, proc macros work entirely with token streams, so it will be nice for declarative macros (and the parser) to do the same.

Also, currently the parser just ignores all invisible delimiters! This leads to occasional precedence issues like https://github.com/rust-lang/rust/issues/67062. Fixing this bug was one of my original motivations with this work. It turned out to be more complicated than I originally expected, and this MCP won't be enough to fix the bug, though it's definitely a step in the right direction because it will result in us having a single mechanism for grouping tokens instead of two, and the parser will no longer eliminate all invisible delimiters.

This change will also completely eliminate the "forwarding a match fragment" limitation of declarative macros. (Update: this is closer to being eliminated, but is still necessary.)
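For readers unfamiliar with the "forwarding a match fragment" limitation, here is a small sketch of how it shows up while the limitation is in force. The macro and function names (`classify`, `forward_expr`, `forward_tt`, `via_expr`, `via_tt`) are invented for this example. A fragment captured as `expr` becomes opaque when forwarded to another macro, so the inner macro can no longer match it against a literal token, whereas a `tt` capture is forwarded as the original token.

```rust
// Hypothetical example macros; the names are not from rustc.

// The inner macro tries a literal-token arm first, then a catch-all.
macro_rules! classify {
    (3) => { "literal 3" };
    ($other:expr) => { "some expression" };
}

// Forwarding an `expr` fragment: the inner macro sees an opaque
// expression, so the `(3)` arm does not match.
macro_rules! forward_expr {
    ($e:expr) => { classify!($e) };
}

// Forwarding a `tt` fragment: the inner macro sees the token `3`.
macro_rules! forward_tt {
    ($t:tt) => { classify!($t) };
}

fn via_expr() -> &'static str {
    forward_expr!(3)
}

fn via_tt() -> &'static str {
    forward_tt!(3)
}
```

With the limitation in place, `via_expr()` yields "some expression" and `via_tt()` yields "literal 3".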

There will be some minor perf effects. Having to tokenize and reparse AST fragments has non-zero cost. Most rustc-perf benchmarks aren't affected. deep-vector is the big exception. It's an artificial stress test containing a single vec! call with 100,000+ zeroes in it, which is a pathological case for this change. Currently the biggest regression is 90% for an incr-unchanged check build, but there's an easy change that will reduce that to 25-30%. hyper and libc also see some moderate regressions, up to 7% in the worst case. I think these regressions can be reduced some more, but probably not fully eliminated. Some other benchmarks see slight improvements of up to 1.5%, probably because TokenKind can now be made Copy, and tokens get copied around a lot.
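The point about Copy can be illustrated with a simplified sketch. The types below (`ParsedFragment`, `OldTokenKind`, `NewTokenKind`) are hypothetical stand-ins, not rustc's real definitions, and the use of `Rc` for the payload is an assumption. A token kind that carries a reference-counted AST payload can only be cloned, which touches the reference count. Once that variant is gone, the enum can derive Copy and be duplicated with a plain bitwise copy.

```rust
use std::rc::Rc;

// Hypothetical stand-in for a parsed AST fragment.
#[derive(Debug)]
struct ParsedFragment(String);

// With a reference-counted payload the enum can be Clone but not Copy.
#[derive(Debug, Clone)]
enum OldTokenKind {
    Plus,
    Interpolated(Rc<ParsedFragment>),
}

// Without the payload every variant is plain data, so Copy is allowed.
#[derive(Debug, Clone, Copy, PartialEq)]
enum NewTokenKind {
    Plus,
    OpenInvisible,
    CloseInvisible,
}

// Cloning the old kind bumps the reference count on the payload.
fn clone_demo() -> usize {
    let frag = Rc::new(ParsedFragment(String::from("1 + 2")));
    let t = OldTokenKind::Interpolated(Rc::clone(&frag));
    let u = t.clone();
    let count = Rc::strong_count(&frag); // held by frag, t and u
    drop((t, u));
    count
}

// The new kind can be used after being passed by value, because
// each use is an implicit copy.
fn copy_demo() -> (NewTokenKind, NewTokenKind) {
    let t = NewTokenKind::Plus;
    let a = t;
    let b = t;
    (a, b)
}
```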

Mentors or Reviewers

@petrochenkov will review, and has helped a lot along the way.

https://github.com/rust-lang/rust/pull/124141 has a draft implementation, which is very close to completely working. This is my third attempt at this change in three years (https://github.com/rust-lang/rust/pull/96724 and https://github.com/rust-lang/rust/pull/114647 were my previous attempts) and I'm confident it will succeed this time. Other than the "forwarding a match fragment" limitation being lifted, there shouldn't be any user-visible changes.

Process

The main points of the Major Change Process are as follows:

You can read more about Major Change Proposals on forge.

Comments

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

rustbot commented 2 weeks ago

Concerns or objections to the proposal should be discussed on Zulip and formally registered here by adding a comment with the following syntax:

 @rustbot concern reason-for-concern 
 <description of the concern> 

Concerns can be lifted with:

 @rustbot resolve reason-for-concern 

See documentation at https://forge.rust-lang.org

cc @rust-lang/compiler @rust-lang/compiler-contributors

petrochenkov commented 1 week ago

@rustbot second