Allowing empty string in lexer islands?

kaby76 / Antlr4BuildTasks

Third-party build tool for 'Official' Antlr4 tool and runtime parsers using .Net. Drop-in replacement for 'Antlr4cs' Antlr4 tool and build rules.

MIT License


Allowing empty string in lexer islands? #83

Open znakeeye opened 6 months ago

znakeeye commented 6 months ago

Using Antlr4.Runtime.Standard 4.13.1 and Antlr4BuildTasks 12.8.

Please consider the grammar below. For input [] I expect tokens START VALUE("") END.

lexer grammar TagLexer;

START   : '[' -> pushMode(VALUE_MODE);

mode VALUE_MODE;

VALUE   : ~']'*;           // May be empty, obviously.
END     : ']' -> popMode;

It will produce an ANT01 warning which is then treated as an error:

Warning ANT01 warning(146): non-fragment lexer rule VALUE can match the empty string
Error ANT02 error(10): warning treated as error

My understanding is that this warning should be ignored in this case. Under no circumstances will it cause an infinite loop in the lexer. Please see https://github.com/antlr/antlr4/issues/180.

How can I get my parser to compile?

kaby76 commented 6 months ago

You are right. The warning should not be treated as an error. I will check what is wrong.

kaby76 commented 6 months ago

Does your .csproj have an <Error>true</Error> setting within the <Antlr4> element?

znakeeye commented 6 months ago

Yes, indeed.

kaby76 commented 6 months ago

Antlr4BuildTasks is a thin layer on top of the Java Antlr4 tool. The package spawns a process that runs java -jar antlr-version-complete.jar .... When <Error>true</Error> is set, the Antlr tool is called with the -Werr option: java -jar antlr-version-complete.jar -Werr .... When the Antlr tool treats a warning as an error, it does not generate any files. To work around that, the package would need to catch the error and reissue the command without -Werr, but only if that is the only warning. That seems like a lot of work to override what the Antlr tool does. I suggest deleting <Error>true</Error> for the lexer grammar.

znakeeye commented 6 months ago

Thanks for the clarification. I'll keep an eye on that <Error> setting 👍

As for empty tokens, I realized that the lexer should never handle those. That is a task for the parser!
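
For anyone landing here later, this is roughly what handling it in the parser could look like. It is only a sketch: the parser grammar name TagParser and the rule name tag are made up for illustration and are not part of the original grammar. The idea is to make VALUE require at least one character, so warning 146 no longer applies, and to let the parser treat a missing VALUE as the empty string.

```antlr
// File: TagLexer.g4 (same as the original, except VALUE uses + instead of *)
lexer grammar TagLexer;

START   : '[' -> pushMode(VALUE_MODE);

mode VALUE_MODE;

VALUE   : ~']'+;           // One or more characters, so it never matches the empty string.
END     : ']' -> popMode;

// File: TagParser.g4 (hypothetical parser grammar, names are assumptions)
parser grammar TagParser;

options { tokenVocab = TagLexer; }

tag     : START VALUE? END;   // A missing VALUE stands for the empty string.
```

With this arrangement the input [] lexes as START END, and the tag rule still matches because VALUE is optional. Code that walks the parse tree would then need to treat an absent VALUE as "".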