umut-sahin / dotlr

An LR(1) parser generator and visualizer created for educational purposes.
Apache License 2.0

Alternative syntax for rules that have more than one production #9

Open Specy opened 2 months ago

Specy commented 2 months ago

Currently, a rule with more than one production has to be written by repeating the left-hand side once per production:

P -> E

E -> E '+' T
E -> T

T -> %id '(' E ')'
T -> %id

%id -> /[A-Za-z][A-Za-z0-9]+/

Usually I've seen it written like this:

//inline
E -> E '+' T | T

//or multiline, although this might be harder if the rules are separated by a \n character
T -> %id '(' E ')'
   | %id

It makes rules with many productions quicker to write and clearer to read. All three syntaxes can be supported to give more choice: if the grammar for a single rule accepts one or more productions, the current syntax stays valid.
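To illustrate the inline form, here is a minimal sketch in Rust of splitting a right-hand side at every top-level `|`. It is not dotlr's actual parser: `split_alternatives` is a hypothetical helper, and it assumes that constant tokens are delimited by `'...'`, regex tokens by `/.../`, and that a backslash escapes the next character inside either of them.

```rust
/// Splits the right-hand side of a rule at every top-level `|`.
/// A `|` inside a quoted token ('...') or a regex literal (/.../) is kept as-is.
/// Hypothetical helper, not part of dotlr.
fn split_alternatives(rhs: &str) -> Vec<String> {
    let mut alternatives = Vec::new();
    let mut current = String::new();
    // The delimiter that closes the literal we are currently inside, if any.
    let mut closing: Option<char> = None;
    let mut chars = rhs.chars();

    while let Some(c) = chars.next() {
        match (closing, c) {
            // Assumption: a backslash inside a literal escapes the next character.
            (Some(_), '\\') => {
                current.push(c);
                if let Some(next) = chars.next() {
                    current.push(next);
                }
            }
            // Inside a literal: copy everything until the closing delimiter.
            (Some(end), _) => {
                current.push(c);
                if c == end {
                    closing = None;
                }
            }
            // Start of a quoted token or a regex literal.
            (None, '\'') | (None, '/') => {
                closing = Some(c);
                current.push(c);
            }
            // Top-level `|`: finish the current alternative.
            (None, '|') => {
                alternatives.push(current.trim().to_string());
                current.clear();
            }
            (None, _) => current.push(c),
        }
    }
    alternatives.push(current.trim().to_string());
    alternatives
}
```

Each returned alternative can then be paired with the same left-hand side, so `E -> E '+' T | T` would end up as the same two rules as the current syntax.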

And whenever https://github.com/umut-sahin/dotlr/issues/2 gets implemented (this is a really important one!), the epsilon symbol could be a special symbol such as eps, which would allow writing things like this:

P -> E | eps

I'd advise against

P ->

unless the newline character is used to separate rules (which would be a problem if the multiline rule definition is implemented)
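One way newline-separated rules and the multiline form might coexist is to treat any line whose first non-whitespace character is `|` as a continuation of the previous rule. The Rust sketch below shows that idea as a preprocessing step. `join_continuations` is a hypothetical helper, not part of dotlr, and the leading-`|` convention is an assumption rather than an agreed design.

```rust
/// Merges continuation lines (first non-whitespace character is `|`)
/// into the rule line that precedes them. Blank lines are skipped.
/// Hypothetical helper, not part of dotlr.
fn join_continuations(grammar: &str) -> Vec<String> {
    let mut rules: Vec<String> = Vec::new();
    for line in grammar.lines() {
        let trimmed = line.trim();
        if trimmed.is_empty() {
            continue;
        }
        if trimmed.starts_with('|') {
            // `| ...` extends the previous rule instead of starting a new one.
            if let Some(previous) = rules.last_mut() {
                previous.push(' ');
                previous.push_str(trimmed);
                continue;
            }
        }
        // A regular rule line. A leading `|` with no previous rule also lands
        // here unchanged, so the real parser can report it as an error.
        rules.push(trimmed.to_string());
    }
    rules
}
```

With this convention, the multiline `T` rule above would be joined into the single logical line `T -> %id '(' E ')' | %id`, while rules written one per line pass through untouched.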

umut-sahin commented 2 months ago

I think an empty match can just be:

P -> ''

As for multiple productions, it's just syntactic sugar. It would be nice to have it for sure, but it's not a critical issue. I'd prefer to focus on some other issues for the time being. But if you want to work on it, I'd be more than happy to review!

The only difference will be in the parsing of the grammar. The Grammar struct doesn't even need to change. It should be doable by just changing https://github.com/umut-sahin/dotlr/blob/f71a5e42a670bbe52264dd735873f3eaa1e6c76a/src/grammar.rs#L263

Note that the grammar parser is handwritten, so it might be a bit awkward to add this feature. Feel free to replace the entire grammar parsing module with something else if you want!

Specy commented 2 months ago

OK, yes, P -> '' is simpler and more intuitive. I agree that it's not a critical issue, but it's surely a nice-to-have, so I'll keep this issue open for the future. Priority should definitely go to eps productions.

umut-sahin commented 2 months ago

Thanks for the idea and understanding, let's get back to this at some point :+1: