jhorstmann closed this issue 4 years ago
Hmm, without EOF in the grammar it works. So the problem is in the EOF handling, though I'm not sure yet exactly where.
I think omitting the EOF just makes the parser stop at the first thing it does not recognize, without reporting an error and with None for the where clause.
But I just found another strange thing: if I change the grammar and the query to use shorter keywords, everything gets parsed correctly:
SELECT : S E L;
WHERE : W H;
AS : A S;
FROM : F R O M;
LIMIT : L I M;
In fact I can simplify the example to just
query : SELECT EOF;
SELECT : S E L E C T;
SPACES : [ \t\r\n] -> skip ;
and when parsing "select" it prints:
line 1:0 token recognition error at: 'selec'
line 1:5 token recognition error at: 't'
line 1:6 missing SELECT at '<EOF>'
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src/main.rs:20:25
I pushed the simplified example to my repo.
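For reference, here is a minimal sketch of what a rule like SELECT : S E L E C T; is expected to do, assuming each letter is a fragment that accepts both cases (for example [sS]; the fragment definitions are not shown in the simplified grammar above). The function name match_keyword is hypothetical and this is not how antlr-rust implements lexing, only an illustration of the intended result: the whole word "select" should come back as a single SELECT token rather than the 'selec' + 't' split in the error output.

```rust
/// Returns the length of the keyword match at the start of `input`, if any.
/// Each keyword letter is compared ignoring ASCII case, which mimics
/// per-letter fragments such as `[sS]` (an assumption, see the note above).
fn match_keyword(input: &str, keyword: &str) -> Option<usize> {
    let mut chars = input.chars();
    for expected in keyword.chars() {
        match chars.next() {
            // Same letter in either case: keep going.
            Some(actual) if actual.eq_ignore_ascii_case(&expected) => {}
            // Different character or end of input: no match.
            _ => return None,
        }
    }
    // Keywords are ASCII, so the byte length equals the character count.
    Some(keyword.len())
}
```

With this, match_keyword("select", "SELECT") and match_keyword("SeLeCt", "SELECT") both return Some(6), while match_keyword("selec", "SELECT") returns None.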
Thanks, that indeed helped. I forgot to calculate the initial hash in one place. I will fix it in 0.1.1 soon. I have published antlr-rust on cargo recently, so you can start using it as a normal dependency, if you want.
Sorry for the late reply, version 0.1.1 fixed the issue and works very nicely. Thanks a lot!
I'm using the following pattern to parse keywords in a case-insensitive way:
The full grammar and test case can be found at https://github.com/jhorstmann/rust-antlr-case-insensitive-keywords. The lexer and parser are generated by build.rs, and the project can be run with cargo run to output the parsed AST of a sample query.
I see the following output, which shows an error message from the parser:
It's interesting that other keywords like AS or FROM seem to be parsed fine. A similar grammar in a Java project also handles all keywords like this in a case-insensitive way.
My guess would be that it's somehow related to the precedence of the lexer rules, where the keywords should take precedence because they are listed before the IDENTIFIER rule in the grammar.
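To make the precedence expectation concrete: an ANTLR lexer picks the rule that matches the longest input, and when two rules match the same length, the one defined first wins. Below is a toy sketch of that selection in Rust. The Token enum and the helper names are hypothetical, the IDENTIFIER rule is assumed to be letters, digits and underscores, and none of this reflects the antlr-rust internals.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    As,
    From,
    Identifier,
}

/// Length of a case-insensitive keyword match at the start of `input`, or 0.
fn keyword_len(input: &str, keyword: &str) -> usize {
    let bytes = input.as_bytes();
    let n = keyword.len();
    if bytes.len() >= n && bytes[..n].eq_ignore_ascii_case(keyword.as_bytes()) {
        n
    } else {
        0
    }
}

/// Length of an identifier at the start of `input` (assumed shape: [A-Za-z0-9_]+).
fn identifier_len(input: &str) -> usize {
    input
        .bytes()
        .take_while(|b| b.is_ascii_alphanumeric() || *b == b'_')
        .count()
}

/// Picks the next token: longest match wins, ties go to the earlier rule.
fn next_token(input: &str) -> Option<(Token, usize)> {
    // Candidates in grammar order: keywords first, IDENTIFIER last.
    let candidates = [
        (Token::As, keyword_len(input, "AS")),
        (Token::From, keyword_len(input, "FROM")),
        (Token::Identifier, identifier_len(input)),
    ];
    let mut best: Option<(Token, usize)> = None;
    for &(token, len) in candidates.iter() {
        // Only a strictly longer match replaces the current best,
        // so on equal length the rule listed earlier is kept.
        if len > 0 && best.map_or(true, |(_, best_len)| len > best_len) {
            best = Some((token, len));
        }
    }
    best
}
```

Under these rules "from" yields the FROM keyword because it ties with IDENTIFIER and is listed first, while "fromage" yields IDENTIFIER because that match is longer. A keyword followed by end of input should therefore never lose to IDENTIFIER or fail to be recognized.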