kach / nearley

📜🔜🌲 Simple, fast, powerful parser toolkit for JavaScript.
https://nearley.js.org
MIT License

Fixing Nullable start rule TODO in lib/nearley.js? #613

Open lassehp opened 2 years ago

lassehp commented 2 years ago

Hi, I am new to nearley. I have used yacc (byacc and bison) with C in the past, and I have written my own LL(1) generator for table-driven parsers for use with Node.js. I have been looking through the nearley code for a place to add a common action for all rules, so that a grammar with no explicitly added actions generates a useful concrete syntax tree. As far as I can tell, it currently produces much the same tree, just without the non-terminal names in the nodes.
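To make the idea concrete, here is a minimal sketch of what I mean by a common action. It assumes the compiled grammar is a plain object with a ParserRules array of { name, symbols, postprocess } entries, which is what the compiled output looks like to me. The helper name withDefaultActions and the { type, children } node shape are my own invention, not anything nearley provides.

```javascript
// Hypothetical helper (not part of nearley): give every rule that has no
// explicit postprocessor a default action that records the rule's
// non-terminal name in the tree node.
function withDefaultActions(grammar) {
  const rules = grammar.ParserRules.map((rule) => {
    if (rule.postprocess) return rule; // keep explicit actions untouched
    return {
      ...rule,
      // Assumed node shape: { type, children }
      postprocess: (children) => ({ type: rule.name, children }),
    };
  });
  return { ...grammar, ParserRules: rules };
}

// Stand-in for a compiled grammar; the real object has more fields.
const compiled = {
  ParserStart: "sum",
  ParserRules: [
    { name: "sum", symbols: ["num", { literal: "+" }, "num"] },
    { name: "num", symbols: [/[0-9]/], postprocess: (d) => Number(d[0]) },
  ],
};

const patched = withDefaultActions(compiled);
```

Patching the compiled rules rather than the parser itself leaves rules with their own actions alone, so existing grammars would behave as before.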

While doing this, I stumbled on a TODO comment in lib/nearley.js, line 271 or so: // TODO what if start rule is nullable?

I admit I have no idea of the context in which this comment occurs, and if there is some aspect of it that I have failed to understand, I apologize. But reading it, and assuming it means what I think it means, I'd naively think that the answer is trivial: if the start rule is nullable, then the empty input string matches, and any nullable production for the start symbol could be in the parse tree. The easy way out would be to just return an empty array. If it is somehow still a problem that the start rule S is nullable, a trivial solution would be to hack the grammar by adding an EndOfInput symbol and a new start rule S' -> S EndOfInput. The EndOfInput symbol would then of course also have to be appended to the input. I think that's a very common trick with LL parsers as well?
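A rough sketch of the grammar hack, to show what I have in mind. The grammar format here is hypothetical (plain { name, symbols } rules with string symbols) and is not nearley's internal representation. The nullable check is the usual fixed-point iteration.

```javascript
// Hypothetical grammar format: each rule is { name, symbols }. A symbol is a
// non-terminal if some rule carries that name, and a terminal otherwise.

// Compute the set of nullable non-terminals by iterating to a fixed point.
function nullableSet(rules) {
  const nullable = new Set();
  let changed = true;
  while (changed) {
    changed = false;
    for (const { name, symbols } of rules) {
      // A rule is nullable if every symbol on its right-hand side is
      // nullable; an empty right-hand side qualifies trivially.
      if (!nullable.has(name) && symbols.every((s) => nullable.has(s))) {
        nullable.add(name);
        changed = true;
      }
    }
  }
  return nullable;
}

// Add a new start rule S' -> S EndOfInput. The caller must also append the
// EndOfInput token to the input stream.
function augment(rules, start, eof = "EndOfInput") {
  const newStart = start + "'";
  return {
    start: newStart,
    rules: [{ name: newStart, symbols: [start, eof] }, ...rules],
  };
}

const rules = [
  { name: "S", symbols: ["A", "B"] },
  { name: "A", symbols: [] },
  { name: "A", symbols: ["a"] },
  { name: "B", symbols: [] },
];

const before = nullableSet(rules);          // S, A and B are all nullable
const augmented = augment(rules, "S");
const after = nullableSet(augmented.rules); // S' is not nullable
```

Since EndOfInput is a terminal, the new start rule can never derive the empty string, so whatever special case the TODO is worried about should not arise for S'.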