Closed by elektito 5 years ago
I assume you're using LALR.
LALR by default uses the contextual lexer, which depends on the state of the parser to tokenize. Changing the order means the lexer tries to determine the next terminal before the parser has advanced to the next state.
If you have to do it this way, for whatever reason, you can try to use lexer="standard". It will revert to the traditional YACC/PLY lexer, which doesn't care about the parser state. However, that means you lose a bit of parsing power and might experience more collisions.
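The ordering issue can be seen without Lark at all. Below is a toy sketch, not Lark's implementation: toy_lexer, lookahead_first, yield_first and run are hypothetical names, and the "parser" is just a loop that logs each token it receives. It only shows that a post-lexer which looks ahead pulls the next token out of a lazy lexer before the consumer has seen the current one.

```python
def toy_lexer(text, log):
    """Hypothetical lazy lexer: logs the moment each token is produced."""
    for word in text.split():
        log.append(f"lex {word}")
        yield word


def lookahead_first(stream, log):
    """Post-lexer that fetches the next token BEFORE yielding the current one."""
    tok = next(stream, None)
    while tok is not None:
        tok2 = next(stream, None)  # look-ahead: the lexer runs here
        log.append(f"yield {tok}")
        yield tok
        tok = tok2


def yield_first(stream, log):
    """Post-lexer that yields the current token before asking for another."""
    tok = next(stream, None)
    while tok is not None:
        log.append(f"yield {tok}")
        yield tok
        tok = next(stream, None)


def run(post_lexer, text):
    """Stand-in for a parser: consumes tokens one by one and logs them."""
    log = []
    for tok in post_lexer(toy_lexer(text, log), log):
        log.append(f"parse {tok}")
    return log


if __name__ == "__main__":
    print(run(lookahead_first, "END IF"))
    print(run(yield_first, "END IF"))
```

With look-ahead, "lex IF" is logged before "parse END". A lexer that chooses terminals based on parser state would, in that situation, be working from the state the parser was in before it consumed END.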
Thanks. Yes, switching to the standard lexer does seem to fix this issue, although I have no idea whether it might cost me parsing power later. The real reason I'm doing this relates to how the END statement conflicts with the likes of "END IF". I posted a question about the grammar here on Stack Overflow, and apparently this is not something that can be easily and cleanly fixed in an LALR parser.
The plan was to detect the standalone END command and convert it to something else in the post-lexer, which I find cleaner than converting "END IF" to "ENDIF" and so on. In order to do that, however, I have to look ahead.
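A minimal sketch of that plan, under stated assumptions: Token is a hypothetical stand-in for the lexer's token objects, END_CMD is an invented terminal name for the standalone command, and BLOCK_KEYWORDS is an assumed set of terminals that may follow END in forms like "END IF". It is not the code from this thread.

```python
from collections import namedtuple

# Hypothetical stand-in for the lexer's token objects.
Token = namedtuple("Token", ["type", "value"])

# Assumed: terminals that turn END into a block terminator ("END IF", ...).
BLOCK_KEYWORDS = {"IF_KW"}


def mark_standalone_end(stream):
    """Retag an END_KW that is not followed by a block keyword as END_CMD."""
    stream = iter(stream)
    tok = next(stream, None)
    while tok is not None:
        nxt = next(stream, None)  # one token of look-ahead
        if tok.type == "END_KW" and (nxt is None or nxt.type not in BLOCK_KEYWORDS):
            tok = Token("END_CMD", tok.value)
        yield tok
        tok = nxt


if __name__ == "__main__":
    tokens = [
        Token("END_KW", "END"), Token("IF_KW", "IF"), Token("NEWLINE", "\n"),
        Token("END_KW", "END"), Token("NEWLINE", "\n"),
    ]
    print([t.type for t in mark_standalone_end(tokens)])
```

Note that the look-ahead has to happen before the current token is yielded, since the decision depends on the following token. That is exactly the ordering that clashes with a lexer driven by parser state.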
Also, Earley could be a solution here, but Lark's implementation seems to resolve ambiguities in a non-deterministic manner, which is something I am mortally afraid of when it comes to programming. So I decided to switch to LALR, and all hell broke loose!
BTW, all that said, this is a really cool library and the best I've found so far, so thank you!
There is a reply there by sepp2k which describes the problem correctly and also offers the right solution. I propose that you try it, and if it works (as it should), accept it as an answer.
so thank you!
You're welcome :)
I'll copy the solution part of his answer:
You can fix this, somewhat hackishly, by turning end if into a single token like this:
ENDIF_KW: /end[ \t\f]+if/i
And then using ENDIF_KW instead of END_KW IF_KW.
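To see why a single token helps, here is a small stand-alone sketch using only Python's re module. It is a toy tokenizer, not Lark's lexer: the ENDIF_KW pattern is the one quoted above, while the END_KW, IF_KW and WS patterns, and the rule that earlier alternatives win, are assumptions made for the example.

```python
import re

# Order matters in this toy: ENDIF_KW is tried before END_KW.
TOKEN_SPEC = [
    ("ENDIF_KW", r"end[ \t\f]+if"),  # pattern from the quoted answer
    ("END_KW", r"end\b"),            # assumed definition
    ("IF_KW", r"if\b"),              # assumed definition
    ("WS", r"[ \t\f]+"),             # assumed: whitespace is skipped
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.IGNORECASE,
)


def tokenize(text):
    """Return (terminal, text) pairs, dropping whitespace."""
    pos, out = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"no terminal matches at position {pos}")
        if m.lastgroup != "WS":
            out.append((m.lastgroup, m.group()))
        pos = m.end()
    return out


if __name__ == "__main__":
    print(tokenize("END  IF"))
    print(tokenize("end"))
```

Because "end if" arrives as one ENDIF_KW token, the parser never has to decide, after seeing a bare END_KW, whether an IF_KW belongs to it.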
Yes. I guess that's what I'm going to do then. Thanks.
I want to have a post-lexer that essentially looks like this (with more bits of code inserted in between):
The way this code is, the parser doesn't work correctly. I get "no terminal defined for..." errors, which I do not get if I swap the tok2 = next(stream) line with the yield tok line.