Open ColonelThirtyTwo opened 3 years ago
The recommended solution is to capture comments in the newline.
Something like:
_NL: (NEWLINE | COMMENT)+
If you have a better way, we'll be happy to hear it.
@erezsh Thanks for the quick response. Looks like that works.
Might be worth putting that in the docs somewhere - seems like a gotcha. I think that the Indenter could be altered to coalesce adjacent _NL tokens too, so that the alteration isn't needed.
I think you're right, it might be possible to do so. It would prevent languages in which a blank space is a dedent, but perhaps those aren't very common.
Anyway, in terms of performance, the current solution works best.
We can keep this issue open, while I consider the best option. If the code remains the same, I agree we should mention this somewhere in the docs.
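The coalescing idea discussed above could look roughly like the following. This is only a sketch on plain `(type, text)` tuples: the `Token` alias and `coalesce_newlines` are hypothetical names for illustration and are not part of Lark's Indenter.

```python
from typing import Iterable, Iterator, Optional, Tuple

Token = Tuple[str, str]  # (type, text); a stand-in for a real lexer token


def coalesce_newlines(tokens: Iterable[Token], nl_type: str = "_NL") -> Iterator[Token]:
    """Merge each run of adjacent newline tokens into one token."""
    pending: Optional[Token] = None
    for tok in tokens:
        if tok[0] == nl_type:
            # Extend the current run, or start a new one.
            pending = tok if pending is None else (nl_type, pending[1] + tok[1])
            continue
        if pending is not None:
            yield pending
            pending = None
        yield tok
    if pending is not None:
        yield pending


# Token stream as it might look after an ignored comment was dropped
# between two newline tokens.
stream = [("NAME", "a"), ("_NL", "\n"), ("_NL", "\n    "), ("NAME", "b"), ("_NL", "\n")]
merged = list(coalesce_newlines(stream))
print(merged)
```

After merging, only the indentation following the last line break of the combined token is visible, which is why this approach would not suit a language where a blank line by itself acts as a dedent.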
@erezsh It might in general be worth considering creating a FAQs/best practices/common misunderstanding page.
@MegaIng What would you put there? It doesn't seem like there are a lot of repetitive issues, since most of them are solved in the code.
Not necessarily repetitive (right now), but tricks that can't really be fixed and should be documented somewhere. Of course, we can just keep them in issues, but I think adding a page in the docs for them is worth it. #857 (which is actually a 'duplicate' of #517), #841, this issue, #838, #833, etc. (+ stuff from gitter, which is even less searchable than github issues). Most of them were just answered with a short text, explaining what is going on and how to fix the grammar. These could all just be formulated into a "FAQ" page. If you don't want this it's fine, but I think it is worth it.
@MegaIng I have no objection. If you want to write such a page, I'll add it.
Maybe some of those can fit in https://lark-parser.readthedocs.io/en/latest/how_to_use.html or https://lark-parser.readthedocs.io/en/latest/recipes.html
@erezsh
The recommended solution is to capture comments in the newline.
Something like:
_NL: (NEWLINE | COMMENT)+
That doesn't seem to work in my case where the comments are SH_COMMENTS because the comment eats the \n before NEWLINE sees it.
Any suggestions for a grammar that
@julie777 That's what the official Python grammar does: https://github.com/lark-parser/lark/blob/master/lark/grammars/python.lark
Describe the bug
Indenter is cited as the way to parse whitespace-sensitive languages, but it has unintuitive and obtrusive behavior when there is an ignored token (e.g. a comment) in the middle of a newline sequence.
To Reproduce
Example:
Remove the /* */ from the data variable and it parses ok.
Happens because the ignored comment splits up the _NL tokens, and the indenter does not coalesce them.