mike-lischke / antlr4ng

Next Generation TypeScript runtime for ANTLR4

New line characters treated wrong in Lexer #70

Open lonely-lockley opened 1 month ago

lonely-lockley commented 1 month ago

Hi! Unfortunately, I didn't save the exact output to help you reproduce the bug, but briefly: in a grammar with predicates that check the char position in line, the currentTokenColumn property of Lexer is calculated incorrectly. For example, with a grammar like this:

fragment Nl           : ('\r'?'\n' | '\n')  ;
fragment Ws           : (' ' | '\t' | '\u000C') ;

EOL              : { this.currentTokenColumn > 0 }? Nl -> channel(HIDDEN) ;
EMPTY_LINE       : { this.currentTokenColumn == 0 }? Nl -> skip ;
BLANK            : { this.currentTokenColumn > 0 }? Ws+ -> channel(HIDDEN) ;
INDENTATION      : { this.currentTokenColumn == 0 }? Ws+ -> channel(HIDDEN) ;

NEWLINE_INDENT          : EOL BLANK* INDENTATION ;

and input like '\n\n ', the BLANK and INDENTATION rules never trigger because currentTokenColumn is calculated as if all of those tokens were on the same line. I expected each EOL to increase the line number and reset currentTokenColumn to zero. This is how it works in the Java ANTLR4 implementation.
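
To make the expectation concrete, here is a minimal sketch of the line/column bookkeeping I have in mind. It is not antlr4ng code: positionsOf and Position are hypothetical names used only for illustration, and the sketch assumes 1-based lines, 0-based columns, and that only '\n' ends a line.

```typescript
// Hypothetical helper, not part of antlr4ng. It models the expected
// bookkeeping: every '\n' increments the line and resets the column to 0.
interface Position {
    line: number;   // 1-based line number (assumption)
    column: number; // 0-based column within the line (assumption)
}

// Returns the position at which each character of the input starts.
function positionsOf(input: string): Position[] {
    const positions: Position[] = [];
    let line = 1;
    let column = 0;

    for (const ch of input) {
        positions.push({ line, column });
        if (ch === "\n") {
            // A newline ends the current line, so the next character
            // starts at column 0 of the following line.
            line++;
            column = 0;
        } else {
            column++;
        }
    }

    return positions;
}
```

Under this model, each of the three characters in '\n\n ' starts at column 0 of its own line (lines 1, 2 and 3), so a predicate comparing the column to 0 would see 0 every time rather than an ever-growing value.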

antlr4ng version 3.0.4, antlr4ng-cli version 2.0.0