Strengthen the follow-set rule for macros

traviscross commented 2 weeks ago

Over in:

https://github.com/rust-lang/rust/pull/130635

@compiler-errors describes this general problem:

The breakage specifically represents an inherent limitation to the "macro follow-set" formulation which is supposed to make us more resilient against breakages due to extensions to the grammar like this.

Given two macro matcher arms:

($ty:ty) => ...

(($tt:tt)*) => ...

And given tokens like:

& pin mut [...more tokens may follow...]

On nightly today, &pin gets parsed as a type. However, we run out of matchers but still have tokens left (the mut token is next), so we fall through to the next arm. Since it's written like ($tt:tt)*, everything is allowed, and we match the second arm successfully...

I think that's weird, because if this second arm were written like $ty:ty mut, that would be illegal, since mut is not in the follow-set of the :ty matcher. Thus, we can use :tt matchers to observe whether the compiler actually parses things not in our grammar that should otherwise be protected against, which seems pretty gross.

And @Noratrieb proposes a general solution:

I believe a solution to this would be the following new logic:

after the end of a macro matcher arm has been reached

and there are still input tokens remaining

and if the last part of the matcher is a metavar

ensure that the first remaining token is in the follow set of this metavar

if it is, move on to the next arm

if it is not, emit an error

What this semantically does is strengthen the "commit to fully matching metavars or error" behavior such that it extends past the end. I don't know how many macros rely on this, but it seems like emitting an FCW (instead of error) on such macro invocations would find all these cases and ensure that the follow-set logic is actually robust past the end. But imo this shouldn't block this PR (which should probably just ship as-is) and can be done separately.

This issue is to track the proposal for this FCW.

cc @Noratrieb @compiler-errors @eholk @rust-lang/lang

compiler-errors commented 2 weeks ago

I think that this FCW could be implemented, but also I think it's a bit of a shame we can't detect this at the macro definition site like we can for a follow-set error.

That would require having to like... construct like NDFAs for each matcher arm and compare them to detect when $tt:tt conflicts in the way I mentioned above, so I don't think it's possible or at least remotely implementable.

traviscross commented 2 weeks ago

In the meeting today, we didn't quite have time to discuss this, but @nikomatsakis noted:

I don't think this proposal is sufficient but I am interested in pursuing a real fix to this for a future edition.

Example:
macro_rules! test {
    (if $x:ty { }) => {};
    (if $x:expr { }) => {};
}
This basically says to pick one arm if something is a type, another if it's an expression. Extending the type grammar to cover new cases could change which arm you go down to.

I think the most general fix is to say: when you would start parsing a fragment, first skip ahead to find the extent of it (i.e., until you see an entry from the follow-set). Then parse it as the fragment. If the parsing fails or there are unconsumed tokens, report a hard error.

I suspect it would break a lot in practice and we would need an opt-in.

rust-lang / rust

Strengthen the follow-set rule for macros #131025