Closed nixel2007 closed 4 years ago
The builds on CI servers failed
Hi, @danielsun1106
I see an error in build logs:
testCharSetRange(org.antlr.v4.test.tool.TestLexerExec) Time elapsed: 0.078 sec <<< FAILURE!
org.junit.ComparisonFailure: expected:<...0]
[@1,4:5='34',<1>,[1:4]
[@2,7:8='a2',<2>,1:7]
[@3,10:12='abc',<2>,1:10]
[@4,18:17='<EOF>',<-1>,2]:3]
> but was:<...0]
[@1,4:5='34',<1>,[2:1]
[@2,7:8='a2',<2>,2:4]
[@3,10:12='abc',<2>,2:7]
[@4,18:17='<EOF>',<-1>,3]:3]
>
at org.junit.Assert.assertEquals(Assert.java:115)
at org.junit.Assert.assertEquals(Assert.java:144)
at org.antlr.v4.test.tool.TestLexerExec.testCharSetRange(TestLexerExec.java:497)
But I don't understand what exactly numbers after the second comma mean. Could you guide me about this a little?
The first number is a token type, i suggest. And the second and the third ones?
@nixel2007 I can not remember its meaning either... @sharwell should be able to tell us ;-)
[@2,7:8='a2',<2>,2:4]
Item | Meaning |
---|---|
@2 |
Token index (this in the third token) |
7:8 |
Token span (character position) in the original input |
a2 |
Token text |
<2> |
Token type |
2:4 |
Starting line:column for the token |
Note that I'm not planning to take this change. It should be possible to derive from the existing types in the library if you need to customize the token positioning mechanism. (It would have been better if tokens didn't track this information at all, but it's too late to remove now.)
a-ha, test string contains standalone CR, so the second 34
token is placed to the 2nd line. so, new numbers is correct.
It should be possible to derive from the existing types in the library if you need to customize the token positioning mechanism.
could you point me to this classes/methods? I haven't digged it a lot, but I suppose that if I'll try to change token position somewhere else than LexerATNSimulator#consume
, I will have to store line/character offsets and change them for every token rather than one-time change in consume
. Not good for RAM neither CPU.
@nixel2007 LexerATNSimulator.consume
is a virtual method. Create your own type derived from that, and override consume
to handle this case.
Create your own type derived from that, and override consume to handle this case.
Did as you said, thank you. One small question - is there any option to override default constructor from g4 file? The only way I've found for this is a constructor with another signature:
You can generate your grammar as abstract
if you wish to provide a different constructor in code. I forget the exact command line syntax but it's handled by this code:
Ok, i'll try it. thank you!
@nixel2007 did you end up successfully overriding the LexerATNSimulator
constructor by generating your grammar as abstract
to solve this issue? Do you have a working example that you can point me to?
@nmancus1 yep. I've created CRAwareLexerATNSimulator (https://github.com/1c-syntax/bsl-parser/blob/master/src/main/java/com/github/_1c_syntax/bsl/parser/CRAwareLexerATNSimulator.java) and pass it to lexer in members
block - https://github.com/1c-syntax/bsl-parser/blob/855158d4e7948326f76cf32b33ebd16c580896be/src/main/antlr/BSLLexer.g4#L29-L35. no need to make grammar abstract.
Hello!
I've made changes to treat stand-alone CR as line separator. This changes were tested on millions of LOC with-out any noticable perfomance regress. Also I use my fork to parse files everyday as a part of static analysis process.
P.S. Link to my PR to original ANTLR4 repo opened half an year ago...