Open ghost opened 7 years ago
This is a new one for me!
❓ Do you have the following property in your project file?
<PropertyGroup>
  <Antlr4UseCSharpGenerator>True</Antlr4UseCSharpGenerator>
</PropertyGroup>
❓ Do all of your grammar files have different names?
❓ If possible, can you attach a portion of the build output containing the errors with a few lines of context?
Seems to me this is a grammar warning with surprisingly trimmed ends. See CHARACTERS_COLLISION_IN_SET in grammars repo.
@psllc can you upload the full text or a fragment of your lexers? I'm wondering about the SET rule.
> Seems to me this is a grammar warning with surprisingly trimmed ends.
Nice, probably a mistake in the regular expression used to separate messages from paths in the build tooling.
I did not have Antlr4UseCSharpGenerator in the project; I added it and it did not help.
Yes, all my grammars have unique names. This all worked before moving to VS 2017.
Here is the build log:
I don't know which of the six grammars is causing the problem... here are the lexer sections from all of them:
//
// EXPRESSION.g4 Lexer Rules
//
TRUE : ('T'|'t')('R'|'r')('U'|'u')('E'|'e');
FALSE : ('F'|'f')('A'|'a')('L'|'l')('S'|'s')('E'|'e');
fragment SQSTRING : ( '\'' ( ~'\'' | '"' | '\'\'' )* '\'' );
fragment DQSTRING : ( '"' ( ~'"' | '""' )* '"' ) ;
STRING : SQSTRING | DQSTRING ;
GT : '>';
LT : '<';
QUESTION : '?';
COLON : ':';
EQUAL : '=';
LE : '<=';
GE : '>=';
NOTEQUAL : '<>';
AND : 'and' ;
OR : 'or';
XOR : 'xor' ;
ADD : '+';
SUB : '-';
MUL : '*';
DIV : '/';
STRCONTAT : '&';
EXPONENT : '^';
MOD : 'mod';
NOT : 'not';
LBRAGE : '{' ;
RBRACE : '}' ;
FIELD_REFERENCE : '[' ~('[' | ']' )* ']' ;
FUNCNAME : ('a'..'z'|'A'..'Z'|'_')+ ;
LPAREN : '(' ;
RPAREN : ')' ;
COMMA : ',' ;
fragment FLOAT : INTEGER '.' INTEGER ;
fragment INTEGER : DIGIT DIGIT* ;
fragment DIGIT : '0'..'9' ;
DATE : DIGIT DIGIT DIGIT DIGIT '-' DIGIT DIGIT '-' DIGIT DIGIT 'T' ;
NUMBER : FLOAT | INTEGER ;
WS : ('\t' | '\r' | '\n' |'ï'|'»'|'¿'|' '|'\t'|'\r'|'\n')+ -> channel(HIDDEN) ;
//
//FEIDLREFERENCE.g4 lexer rules
//
ALPHA_CHAR : 'a'..'z'|'A'..'Z' ;
SPEC_CHAR : '-' | '_' ;
IDENTIFIER : ALPHA_CHAR (INTEGER | ALPHA_CHAR | SPEC_CHAR )* ;
LBRACKET : '[' ;
RBRACKET : ']' ;
LPAREN : '(' ;
RPAREN : ')' ;
INTEGER : DIGIT DIGIT* ;
DIGIT : '0'..'9' ;
DOT : '.';
SEMI : ';';
WS : ('\t' | '\r' | '\n' |'ï'|'»'|'¿'|' '|'\t'|'\r'|'\n')+ -> channel(HIDDEN) ;
/*
SEQUENCE.g4 Lexer Rules
*/
TEXT
: ~(','|';'|'\n'|'\r'|'"'|'{'|'}')+
;
fragment SQSTRING : ( '\'' ( ~'\'' | '"' | '\'\'' )* '\'' );
fragment DQSTRING : ( '"' ( ~'"' | '""' )* '"' ) ;
STRING : SQSTRING | DQSTRING ;
LBRACE: WS* '{' WS* ;
RBRACE : WS* '}' WS* ;
COMMA: WS* ',' WS* ;
WS : ('\t' | '\r' | '\n' |'ï'|'»'|'¿'|' '|'\t'|'\r'|'\n')+ -> channel(HIDDEN) ;
//
// VALUESET.g4 lexer rules
//
TEXT : ~('|'|'\n'|'\r')+ ;
VBAR: '|';
//
// VALUESETITEM.g4 lexer rules
//
INT: WS* ( '0'..'9' )+ WS* ;
fragment COMMA: WS* ',' WS* ;
TEXT : COMMA ~('\r'|'\n')+ ;
WS: ' ' -> skip ;
> this all worked before moving to VS 2017
I imagine what you are experiencing is a side effect of updating to a new version of the code generator, not a side effect of updating to 2017. Either way, we obviously need to get it fixed. I can think of two possible problems:
\ character where it's not directly supported, it will now report an error (or maybe a warning).
Thank you for your time and help with this. I've removed the 'funky' characters from the whitespace lexer rule and saved all the grammars as UTF-8, with no resolution. I've done a careful scan for misuse of the backslash, but if it's there I'm missing it...
OK, so I looked again and I'm so embarrassed by what I just found... this rule:
WS : ('\t' | '\r' | '\n' |'ï'|'»'|'¿'|' '|'\t'|'\r'|'\n')+ -> channel(HIDDEN) ;
has the tab, return, and newline characters declared TWICE. That was causing the error...
I can't believe it took me that long to see this... Granted, the error could have been a bit more helpful, and perhaps there is a problem in what it's pulling or showing in the message text, but still... I should have seen this earlier.
Thanks for your time, Sam. Sorry to have wasted it. ;)
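For reference, a minimal sketch of a whitespace rule in which each character is listed only once, which is what the duplicate-character check is asking for. It assumes the 'ï', '»' and '¿' alternatives are no longer needed once the grammar files are saved as UTF-8; that is an assumption for illustration, not something confirmed for this project.

```antlr
// Sketch only: every character appears exactly once in the set, so the
// "used multiple times in set" check has nothing to report.
// Assumption: 'ï', '»' and '¿' are dropped; if they are still needed,
// add each of them back at most once.
WS : ( ' ' | '\t' | '\r' | '\n' )+ -> channel(HIDDEN) ;
```

The same set can also be written with ANTLR 4's bracket notation, `[ \t\r\n]+`, which makes accidental repeats easier to spot.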
The error text is quite helpful: "chars \"<arg>\" used multiple times in set <arg2>". There is a bug in how the error text is displayed.
> The grammar is more strict about escape sequences than it used to be.
Strict grammar is better than weak in my opinion :)
I agree the stricter grammar is better... but rather than the displayed errors of:
"Unknown build error: "' used multiple times in set: SET"
"Unknown build error: "' used multiple times in set: SET"
"Unknown build error: "' used multiple times in set: SET"
it should (?) have shown:
"Unknown build error: "\t" used multiple times in set: SET"
"Unknown build error: "\r" used multiple times in set: SET"
"Unknown build error: "\n" used multiple times in set: SET"
It should have shown the file and line number along with an error number too. 😄
> "Unknown build error: "' used multiple times in set: SET"
At least these warnings with escaped chars are displayed correctly in the latest ANTLR version: https://github.com/antlr/antlr4/blob/master/tool-testsuite/test/org/antlr/v4/test/tool/TestSymbolIssues.java#L399
Unfortunately, warning location cannot be displayed correctly for now: https://github.com/antlr/antlr4/blob/4.7/tool/src/org/antlr/v4/automata/ATNOptimizer.java#L106