Open yangfl opened 3 years ago
How is this going? I'd love to see this implemented. For reference, ohm-js explains how it automatically skips whitespace: https://ohmjs.org/docs/syntax-reference#syntactic-lexical . See also lark-parser's %ignore directive: https://lark-parser.readthedocs.io/en/latest/grammar.html#ignore
Agreed. I want to test parsing the text "eat food on plates" with the following toy NLP grammar:
S -> V NP | VP PP
NP -> N PP
VP -> V N
PP -> P N
V -> "eat"
N -> "food" | "plates"
P -> "on"
But when I run nearley-test GrammarTest1.js -i "eat food on plates"
nearley chokes on the whitespace. I would have to adjust my grammar to include whitespace tokens, but that defeats the whole point: I want to test the grammar above, not some new grammar with whitespace tokens. I want to see the Earley items created from the grammar above and the given input string.
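One possible workaround, sketched here rather than taken from nearley itself, is a word-level lexer that never emits whitespace in the first place, so the grammar only ever sees "eat", "food", "on" and "plates". The object below is shaped after the custom lexer interface described in the nearley docs (next, save, reset, formatError, has); the name makeWordLexer and the "word" token type are made up for illustration, and whether literals such as "eat" then match unchanged depends on how the grammar is compiled with the lexer.

```javascript
// Hypothetical word-level lexer: emits one token per run of non-whitespace
// characters and silently drops the whitespace in between.
function makeWordLexer() {
  let buffer = "";
  let pos = 0;
  return {
    // Load a new chunk of input; this toy lexer keeps no state between chunks,
    // so a word split across two chunks would come out as two tokens.
    reset(chunk, info) {
      buffer = chunk;
      pos = 0;
    },
    // Return the next word token, or undefined once the chunk is exhausted.
    next() {
      const re = /\S+/g;
      re.lastIndex = pos;
      const m = re.exec(buffer);
      if (!m) {
        pos = buffer.length;
        return undefined;
      }
      pos = re.lastIndex;
      return { type: "word", value: m[0], offset: m.index };
    },
    save() {
      return {};
    },
    formatError(token) {
      return `Unexpected token "${token.value}" at offset ${token.offset}`;
    },
    has(name) {
      return name === "word";
    },
  };
}

const lexer = makeWordLexer();
lexer.reset("eat food on plates");
const words = [];
for (let tok = lexer.next(); tok; tok = lexer.next()) words.push(tok.value);
console.log(words); // [ 'eat', 'food', 'on', 'plates' ]
```

This keeps the toy grammar free of whitespace rules, at the cost of writing and wiring in a lexer by hand.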
Maintainers: @kach @tjvr
In 99% of cases you don't care about whitespace tokens; they only serve to separate other tokens. When using a lexer, it is safe to skip them, since the lexer has already split the input correctly. For example, a grammar for arithmetic expressions could then be written purely in terms of numbers and operators, instead of threading an optional-whitespace rule between every pair of symbols, which is much messier and harder to maintain.
I'd suggest adding code that skips whitespace tokens around
https://github.com/kach/nearley/blob/98e4d21ef9c7836700c0503c10bb0d6465a3c26a/lib/nearley.js#L324-L338
I would be interested in writing this feature myself.
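Until something like this lands, the same effect can probably be approximated outside the parser. The sketch below is not the proposed patch and not nearley's code: it wraps a lexer and hides selected token types before they reach the consumer, assuming the lexer follows the custom lexer interface from the nearley docs (next, save, reset, formatError, has). The names skipTokens and makeArithmeticLexer and the token types "ws", "number" and "op" are hypothetical; the toy arithmetic lexer exists only to make the example runnable.

```javascript
// Hypothetical helper: wraps any lexer exposing next/save/reset/formatError/has
// and hides the token types listed in `skipped` from the consumer.
function skipTokens(inner, skipped) {
  return {
    reset: (chunk, info) => inner.reset(chunk, info),
    save: () => inner.save(),
    formatError: (token) => inner.formatError(token),
    has: (name) => !skipped.includes(name) && inner.has(name),
    // Keep pulling tokens from the inner lexer until one is not skipped.
    next() {
      let token = inner.next();
      while (token && skipped.includes(token.type)) token = inner.next();
      return token;
    },
  };
}

// Toy inner lexer for arithmetic, only here to make the sketch runnable.
function makeArithmeticLexer() {
  const rules = [
    ["ws", /\s+/y],
    ["number", /\d+/y],
    ["op", /[-+*\/()]/y],
  ];
  let buffer = "";
  let pos = 0;
  return {
    reset(chunk) {
      buffer = chunk;
      pos = 0;
    },
    save() {
      return {};
    },
    has: (name) => rules.some(([type]) => type === name),
    formatError: (token) => `Unexpected ${token.type} "${token.value}"`,
    next() {
      if (pos >= buffer.length) return undefined;
      // Try each sticky regex at the current position; first match wins.
      for (const [type, re] of rules) {
        re.lastIndex = pos;
        const m = re.exec(buffer);
        if (m) {
          pos = re.lastIndex;
          return { type, value: m[0] };
        }
      }
      throw new Error(`Invalid character at offset ${pos}`);
    },
  };
}

const arith = skipTokens(makeArithmeticLexer(), ["ws"]);
arith.reset("1 + 2 * (3 - 4)");
const seen = [];
for (let t = arith.next(); t; t = arith.next()) seen.push(t.value);
console.log(seen.join(" ")); // 1 + 2 * ( 3 - 4 )
```

Filtering in a wrapper leaves both the grammar and the parser untouched, but every project has to repeat it, which is the argument for building the skip into nearley itself.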