lukaslueg / macro_railroad

A library to generate syntax diagrams for Rust macros.
MIT License

Multicharacter tokens not parsed correctly #4

Open lukaslueg opened 6 years ago

lukaslueg commented 6 years ago
macro_rules! x {
    (= >) => {
        println!("Space");
    };
    (=>) => {
        println!("No space");
    };
}

fn main() {
    x!(= >);
    x!(=>);
}

The two branches are currently seen as identical, even before optimizing. This is not correct.

lukaslueg commented 6 years ago

The problem here is that we parse (or lower) Punct incorrectly:

The two arms parse as (roughly)

Ok(MacroRules { name: Ident(x), rules: [Rule { matcher: [Punct(Punct { op: '=', spacing: Alone }), Punct(Punct { op: '>', spacing: Alone })], expansion: TokenStream [Ident { sym: println }, Punct { op: '!', spacing: Alone }, Group { delimiter: Parenthesis, stream: TokenStream [Literal { lit: "Space" }] }, Punct { op: ';', spacing: Alone }] }] })

Ok(MacroRules { name: Ident(x), rules: [Rule { matcher: [Punct(Punct { op: '=', spacing: Joint }), Punct(Punct { op: '>', spacing: Alone })], expansion: TokenStream [Ident { sym: println }, Punct { op: '!', spacing: Alone }, Group { delimiter: Parenthesis, stream: TokenStream [Literal { lit: "No space" }] }, Punct { op: ';', spacing: Alone }] }] })

In the case of = > both Puncts are Alone. In the case of => the = is Joint, so it is combined with the >.
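To make the difference concrete, here is a minimal sketch using hypothetical stand-in types that only mirror the shape of the debug output above. `Spacing`, `Punct`, `arm_with_space`, `arm_without_space` and `same_ops` are names made up for illustration; they are not the types or functions of macro_railroad or of the proc macro API. It shows that the two matchers differ only in the spacing of the =, so a comparison that looks at `op` alone would treat them as the same.

```rust
// Hypothetical stand-ins that mirror the debug output; not real library types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Spacing {
    Alone,
    Joint,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Punct {
    op: char,
    spacing: Spacing,
}

/// Matcher of the `(= >)` arm: both tokens are `Alone`.
fn arm_with_space() -> Vec<Punct> {
    vec![
        Punct { op: '=', spacing: Spacing::Alone },
        Punct { op: '>', spacing: Spacing::Alone },
    ]
}

/// Matcher of the `(=>)` arm: the `=` is `Joint` with the following `>`.
fn arm_without_space() -> Vec<Punct> {
    vec![
        Punct { op: '=', spacing: Spacing::Joint },
        Punct { op: '>', spacing: Spacing::Alone },
    ]
}

/// Compares two matchers by character only, ignoring spacing.
fn same_ops(a: &[Punct], b: &[Punct]) -> bool {
    a.iter().map(|p| p.op).eq(b.iter().map(|p| p.op))
}
```

With these definitions, `same_ops(&arm_with_space(), &arm_without_space())` is true, while comparing the two vectors with `==` is false because the spacing of the first token differs.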

dtolnay commented 6 years ago

Macro_rules' concept of tokens is confusing and it's not just a matter of looking at whether the proc macro token is Joint or Alone:

macro_rules! x {
    // These rules are always equivalent.
    (=> >) => { println!("Space"); };
    (=>>) => { println!("No space"); };
}

fn main() {
    x!(=> >); // "Space"
    x!(=>>); // "Space"
}

macro_rules! x {
    // These rules are *not* equivalent.
    (= >>) => { println!("Space"); };
    (=>>) => { println!("No space"); };
}

fn main() {
    x!(=> >); // "No space"
    x!(=>>); // "No space"
}

macro_rules forms groups of consecutive punctuation greedily, left to right, according to which multi-character punctuation tokens are recognized by Rust's grammar; whitespace between groups is then ignored. (This is a limitation that is fixed in the token API of procedural macros.) So for example =>> and => > are equivalent because they both group as => >, while = >> and =>> are not equivalent because =>> groups as => >, which is different from = >>.
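The grouping described above can be sketched roughly as follows. This is an illustration only, not how rustc or macro_railroad implements it: the `MULTI` table is an assumed, possibly incomplete list of Rust's multi-character punctuation, and `group_punct` is a hypothetical helper that only handles punctuation and whitespace.

```rust
/// Assumed list of multi-character punctuation, longest entries first so
/// that the first hit is also the longest match. May be incomplete.
const MULTI: &[&str] = &[
    "<<=", ">>=", "...", "..=",
    "=>", "->", "<-", "==", "!=", "<=", ">=", "&&", "||", "<<", ">>",
    "+=", "-=", "*=", "/=", "%=", "^=", "&=", "|=", "..", "::",
];

/// Splits punctuation into groups, greedily and left to right.
/// Whitespace ends a group and is otherwise dropped from the result.
fn group_punct(input: &str) -> Vec<String> {
    let mut groups = Vec::new();
    let mut rest = input.trim_start();
    while !rest.is_empty() {
        // Take the longest known multi-character token at this position,
        // falling back to a single character.
        let len = MULTI
            .iter()
            .find(|tok| rest.starts_with(**tok))
            .map(|tok| tok.len())
            .unwrap_or_else(|| rest.chars().next().map_or(1, char::len_utf8));
        groups.push(rest[..len].to_string());
        rest = rest[len..].trim_start();
    }
    groups
}
```

Under these assumptions, `group_punct("=>>")` and `group_punct("=> >")` both give the groups => and >, whereas `group_punct("= >>")` gives = and >>, which matches the equivalences described above.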