Closed RevanthRameshkumar closed 1 year ago
Probably the easiest solution is to use a regex lookahead to enforce this.
Ok, so something like this?
%ignore WS_INLINE
%import common.WS_INLINE
start: exists_decl
variable: /z\d+/
EXISTS: /exists(?=\s)/
exists_decl: EXISTS variable
That seems to do the job for me since existsz2 fails but exists z2 succeeds. And to clarify, this works because the lexer will handle all regexes before applying the ignores? I didn't know that was the case.
It's simply that the lexer reads tokens one by one, whether ignored or not. If it can match "exists" and then "z", it doesn't care if there is an ignored token in between or not.
Funnily enough, the basic lexer behaves the way you'd expect, requiring a space between keywords and names. The contextual lexer (the default) is too smart for its own good: it knows that "existsz2" isn't possible as a single token, so it parses it as two tokens. (But switching to the basic lexer isn't recommended.)
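To illustrate the point about the lexer reading tokens one by one, ignored or not, here is a toy tokenizer in plain Python. It is a simplified model written for this thread, not Lark's actual lexer: the `tokenize` function, its terminal table and the `ValueError` are all hypothetical. It only shows why "existsz2" lexes as two tokens when EXISTS is a bare "exists", and why the `(?=\s)` lookahead rejects it.

```python
import re

def tokenize(text, exists_pattern):
    """Toy lexer: try each terminal at the current position, drop ignored ones."""
    # Hypothetical terminal table mirroring the grammar in this thread.
    terminals = [
        ("EXISTS", re.compile(exists_pattern)),
        ("VARIABLE", re.compile(r"z\d+")),
        ("WS_INLINE", re.compile(r"[ \t]+")),
    ]
    ignored = {"WS_INLINE"}
    pos, tokens = 0, []
    while pos < len(text):
        for name, pattern in terminals:
            m = pattern.match(text, pos)
            if m:
                break
        else:
            raise ValueError(f"no terminal matches at position {pos}")
        # Ignored tokens are still matched; they are just not emitted.
        if name not in ignored:
            tokens.append((name, m.group()))
        pos = m.end()
    return tokens

# Without the lookahead, both inputs produce the same token stream.
print(tokenize("exists z2", r"exists"))
print(tokenize("existsz2", r"exists"))

# With the lookahead, "exists z2" still lexes, but "existsz2" raises ValueError.
print(tokenize("exists z2", r"exists(?=\s)"))
```

In this model the whitespace token is matched and discarded, so nothing requires it to be there; the lookahead is what makes the keyword itself depend on the following character.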
Gotcha. In that case, is there a reason to use a lookahead vs just something like:
EXISTS: "exists"i " "+
Is it that the lookahead is more concise stylistically?
This isn't about style, it's about practice. If you put " "+ it means that the parser's lookahead is going to see whitespace instead of the token afterwards (since our LALR implementation only has a lookahead of 1). That would seriously hinder the parser analysis.
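As a small sketch of the difference between the two patterns, the snippet below uses Python's `re` module as a stand-in for the terminal definitions. It only shows what text each pattern consumes; it does not model Lark's LALR analysis, and the variable names are illustrative.

```python
import re

text = "exists z2"

# Rough analogue of: EXISTS: "exists"i " "+
consuming = re.match(r"exists +", text, re.IGNORECASE)

# Analogue of: EXISTS: /exists(?=\s)/
lookahead = re.match(r"exists(?=\s)", text)

# The first pattern swallows the space into the token text.
print(repr(consuming.group()), consuming.end())   # 'exists ' 7

# The lookahead checks for the space but leaves it in the input.
print(repr(lookahead.group()), lookahead.end())   # 'exists' 6
```

With the lookahead, the whitespace stays in the input and can still be handled by the ignore directive as usual.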
Thanks, that makes sense! I actually just got a v1 of my grammar totally working now. Thanks for your help :)
What is your question?
How do I generally ignore whitespace but enforce it around keywords only? If I use the ignore whitespace directive then both of the examples below will parse
Int x = 1 ;
vs
Intx = 1 ;
But I only want the first to parse