lukaslueg / macro_railroad

A library to generate syntax diagrams for Rust macros.
MIT License

tt-munchers are not recognized #7

Open sunjay opened 5 years ago

sunjay commented 5 years ago

Hello! Thanks for creating such a cool project! :smile: :tada:

tt-munchers are one of many macro programming patterns described in The Little Book of Rust Macros. Many of Rust's more complicated and powerful macros use this pattern to accomplish their functionality. These are probably the macros that would benefit the most from this project, so it would be really cool if there were a way to integrate them properly.

Basically, what we would need to do is "see through" that pattern and modify the diagram so that it reflects the way the macro is actually meant to be used instead of just the raw input syntax that it takes. This is definitely a non-trivial thing to detect in all cases, so I really just recommend starting from a few common patterns and working your way up.

Let's look at this example:

    // Given `ids!(0, a, b, c)`, this will produce:
    // let a = 0;
    // let b = 1;
    // let c = 2;
    macro_rules! ids {
        // Increment a given counter
        ($count:expr, $name:ident, $($rest:tt)*) => {
            ids!($count, $name);
            ids!($count + 1, $($rest)*);
        };
        // Support only one name and also make the ending comma optional
        ($count:expr, $name:ident) => {
            let $name = $count;
        };
        // Base case
        ($count:expr) => ();
    }
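As a quick sanity check of the expansion described in the comments, here is a minimal sketch that invokes the macro inside a function and returns the bindings it creates. The wrapper function `ids_demo` is a hypothetical name introduced only for this illustration; the macro body is copied from the example above.

```rust
// The `ids!` macro from the example above, copied as written.
macro_rules! ids {
    // Increment a given counter
    ($count:expr, $name:ident, $($rest:tt)*) => {
        ids!($count, $name);
        ids!($count + 1, $($rest)*);
    };
    // A single name with no trailing comma
    ($count:expr, $name:ident) => {
        let $name = $count;
    };
    // Base case
    ($count:expr) => ();
}

// Hypothetical helper: expands `ids!` in statement position and returns
// the variables it declared.
fn ids_demo() -> (i32, i32, i32) {
    ids!(0, a, b, c);
    // After expansion: `a = 0`, `b = 0 + 1`, `c = 0 + 1 + 1`.
    (a, b, c)
}
```

Note that, as written, a trailing comma such as `ids!(0, a,)` appears to be rejected: the recursive call becomes `ids!(0 + 1, )`, which none of the three rules match.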

Running this in macro_railroad (correctly) produces the following diagram:

image

This exactly matches the input syntax of the macro as it is written. It would be nice, however, if the diagram represented how the macro is actually meant to be used:

image

One potentially simple way to implement this would be to:

  1. Detect "tt repetition" at the end of the declared input to a macro branch
    • This is anything in the form (..., $($xxx:tt)*) => {...}; in macro_rules
    • Note that $($xxx:tt)* is different from $($xxx:tt),* -- the second does not fall under this pattern
  2. Detect whether that tt repetition is passed back into the macro itself recursively
    • You might need to check if the tt repetition is passed back exactly the same with no additional tokens added within the repetition (so $($xxx == 1)* would be different)
    • If this happens, you can look at the macro call and figure out after which token the tt repetition is placed. Once you know that, you draw a line back to where the tokens from the tt repetition will be passed.
    • In the example above, the macro calls itself in the line ids!($count + 1, $($rest)*); -- you would draw a line back to just after the comma after count
      • The second image above shows the exact desired result
    • If the tt repetition is used anywhere other than in a call to the same macro, do not treat it specially. Note that the repetition may still be passed to several recursive calls, so you may need to draw arrows leaving one token and arriving at multiple other tokens
    • Ideally you would support more complicated forms than just $rest:tt so any repetitions that get passed back into the macro would work, but maybe that can be a future enhancement
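The two detection steps above can be sketched roughly as follows. This is only an illustration under strong assumptions: the `Tok` and `Rule` types, and the functions `trailing_tt_rep`, `mentions`, `loop_targets`, and `ids_first_rule`, are all hypothetical and are not macro_railroad's actual data structures. The sketch only looks at top-level tokens of a rule's body.

```rust
// Hypothetical, heavily simplified token model; not macro_railroad's real types.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    /// A literal token such as `,` or `+`.
    Lit(&'static str),
    /// `$name:frag` in a matcher, or `$name` (empty `frag`) in a transcriber.
    Var { name: &'static str, frag: &'static str },
    /// `$( inner ) sep *`; `sep` is `None` for a bare `$(...)*`.
    Rep { inner: Vec<Tok>, sep: Option<&'static str> },
    /// A macro invocation `name!(args)` inside a transcriber.
    Call { name: &'static str, args: Vec<Tok> },
}

struct Rule {
    matcher: Vec<Tok>,
    body: Vec<Tok>,
}

fn var(name: &'static str, frag: &'static str) -> Tok {
    Tok::Var { name, frag }
}

fn rep(inner: Vec<Tok>) -> Tok {
    Tok::Rep { inner, sep: None }
}

/// Step 1: does the matcher end in `$($x:tt)*` (no separator)? Returns `x`.
fn trailing_tt_rep(matcher: &[Tok]) -> Option<&'static str> {
    match matcher.last()? {
        Tok::Rep { inner, sep: None } => match inner.as_slice() {
            [Tok::Var { name, frag: "tt" }] => Some(*name),
            _ => None,
        },
        _ => None,
    }
}

/// True if `$var` is mentioned anywhere in `toks`, including nested groups.
fn mentions(toks: &[Tok], var: &str) -> bool {
    toks.iter().any(|t| match t {
        Tok::Lit(_) => false,
        Tok::Var { name, .. } => *name == var,
        Tok::Rep { inner, .. } => mentions(inner, var),
        Tok::Call { args, .. } => mentions(args, var),
    })
}

/// Step 2: for every recursive call that forwards `$($x)*` unchanged as its
/// last argument, record how many argument tokens precede it. Returns `None`
/// if the rule is not a tt-muncher under this deliberately strict definition.
fn loop_targets(macro_name: &str, rule: &Rule) -> Option<Vec<usize>> {
    let x = trailing_tt_rep(&rule.matcher)?;
    let verbatim = rep(vec![var(x, "")]);
    let mut targets = Vec::new();
    for tok in &rule.body {
        match tok {
            Tok::Call { name, args }
                if *name == macro_name && args.last() == Some(&verbatim) =>
            {
                let before = &args[..args.len() - 1];
                if mentions(before, x) {
                    return None; // `$x` is also used elsewhere in the call
                }
                targets.push(before.len());
            }
            // Any other use of `$x` (another macro, an altered repetition)
            // means the rule is not treated specially.
            other if mentions(std::slice::from_ref(other), x) => return None,
            _ => {}
        }
    }
    if targets.is_empty() { None } else { Some(targets) }
}

/// First rule of `ids!` from the example above, encoded by hand.
fn ids_first_rule() -> Rule {
    let c = Tok::Lit(",");
    Rule {
        matcher: vec![
            var("count", "expr"), c.clone(), var("name", "ident"), c.clone(),
            rep(vec![var("rest", "tt")]),
        ],
        body: vec![
            Tok::Call { name: "ids", args: vec![var("count", ""), c.clone(), var("name", "")] },
            Tok::Lit(";"),
            Tok::Call {
                name: "ids",
                args: vec![var("count", ""), Tok::Lit("+"), Tok::Lit("1"), c.clone(),
                           rep(vec![var("rest", "")])],
            },
            Tok::Lit(";"),
        ],
    }
}
```

For the first rule of `ids!`, `loop_targets("ids", &ids_first_rule())` yields `Some(vec![4])`: four tokens (`$count + 1 ,`) precede the forwarded repetition in the recursive call. Mapping that position back to a point in the matcher (here, just after the comma following `$count:expr`) is the harder part and is not attempted in this sketch.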

Some cases to think about:

Hopefully I haven't scared you away with all of this information! I've tried to outline a lot of the cases for you to consider, but I'm sure I missed some as well. Given all of this, I really do recommend starting by supporting the simplest cases of this pattern and then working up to the more general case.

If this is implemented, even just for simpler cases, it would be extremely valuable as a documentation tool for so many macros.

lukaslueg commented 5 years ago

AFAICS the point here is to detect that while $rest is a :tt, it always ends up as an :ident. We would need at least a limited form of built-in macro expansion to detect this in a syntax tree. As you pointed out, there are a lot of cases where the tt-muncher idiom can't be detected, e.g. if there is control flow within the macro expansion.
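To illustrate the control-flow caveat, here is a hypothetical tt-muncher (the macro `sum_or_skip` and the helper `sum_demo` are invented for this sketch and do not come from any real crate). Which rule consumes the next tokens of `$rest` depends on whether they start with the keyword `skip`, so the accepted input cannot be read off a single rule without simulating at least part of the expansion.

```rust
// Hypothetical example: sums integer tokens, ignoring any token that
// directly follows the keyword `skip`.
macro_rules! sum_or_skip {
    // Done: nothing left to munch, yield the accumulated expression.
    (@acc $acc:expr;) => { $acc };
    // Control flow: `skip` drops the next token instead of adding it.
    (@acc $acc:expr; skip $x:tt $($rest:tt)*) => {
        sum_or_skip!(@acc $acc; $($rest)*)
    };
    // Default: add the next token to the accumulator and keep munching.
    (@acc $acc:expr; $x:tt $($rest:tt)*) => {
        sum_or_skip!(@acc $acc + $x; $($rest)*)
    };
    // Public entry point: start with an accumulator of 0.
    ($($input:tt)*) => { sum_or_skip!(@acc 0; $($input)*) };
}

// Hypothetical helper: expands to `0 + 1 + 2 + 4`, because `3` is skipped.
fn sum_demo() -> i32 {
    sum_or_skip!(1 2 skip 3 4)
}
```

Here `$rest` is forwarded unchanged in both recursive rules, yet the syntax a user is meant to write (a sequence of numbers, each optionally preceded by `skip`) only becomes visible once the rules are considered together.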