tunnelvisionlabs / antlr4cs

The original, highly-optimized C# Target for ANTLR 4

Unknown build error: "' used multiple times in set: SET #251

Open ghost opened 7 years ago

ghost commented 7 years ago

Using Visual Studio 2017 with "Antlr4" version="4.6.5-beta001". I have a project with six grammar files in it. I'm seeing three instances of the error:

"Unknown build error: "' used multiple times in set: SET"

No source file or line number is provided, so I don't know which of the six files is responsible.

The Visual Studio build completes despite these errors, but unfortunately the TFS CI build aborts on them. If I enable 'continue' on the CI build, I will be defeating the purpose of the CI gate.

There are similar issues discussed in various places, but i cannot find anything on this specific one. The references i have found suggest this problem was resolved in this beta. Any help would be greatly appreciated.

sharwell commented 7 years ago

This is a new one for me!

❓ Do you have the following property in your project file?

<PropertyGroup>
  <Antlr4UseCSharpGenerator>True</Antlr4UseCSharpGenerator>
</PropertyGroup>

❓ Do all of your grammar files have different names?

❓ If possible, can you attach a portion of the build output containing the errors with a few lines of context?

KvanTTT commented 7 years ago

Seems to me this is a grammar warning with surprisingly trimmed ends. See CHARACTERS_COLLISION_IN_SET in grammars repo.

@psllc can you upload your lexers, in full or in part? I'm wondering about the SET rule.

sharwell commented 7 years ago

> Seems to me this is a grammar warning with surprisingly trimmed ends.

Nice, probably a mistake in the regular expression used to separate messages from paths in the build tooling.

ghost commented 7 years ago

I did not have Antlr4UseCSharpGenerator in the project; I added it and it did not help.

Yes, all my grammars have unique names. This all worked before moving to VS 2017.

here is the build log:

buildlog.txt

I don't know which of the six grammars is causing the problem... Here are the lexer sections from all of them:

//
 // EXPRESSION.g4 Lexer Rules 
 //
TRUE                    :   ('T'|'t')('R'|'r')('U'|'u')('E'|'e');
FALSE                   :   ('F'|'f')('A'|'a')('L'|'l')('S'|'s')('E'|'e');
fragment SQSTRING : ( '\'' ( ~'\'' | '"' | '\'\'' )* '\'' );
fragment DQSTRING : ( '"' ( ~'"' | '""' )* '"' ) ;
STRING      :   SQSTRING | DQSTRING  ; 
GT              : '>';
LT              : '<';
QUESTION        : '?';
COLON           : ':';
EQUAL           : '=';
LE              : '<=';
GE              : '>=';
NOTEQUAL        : '<>';
AND             : 'and' ;
OR              : 'or';
XOR             : 'xor' ;
ADD             : '+';
SUB             : '-';
MUL             : '*';
DIV             : '/';
STRCONTAT       : '&';
EXPONENT        : '^';
MOD             : 'mod';
NOT             : 'not';
LBRAGE          : '{' ;
RBRACE          : '}' ;
FIELD_REFERENCE :   '[' ~('[' | ']' )* ']'  ;
FUNCNAME                :   ('a'..'z'|'A'..'Z'|'_')+          ;
LPAREN                  :   '('    ;
RPAREN                  :   ')'    ;
COMMA                   :   ','    ;
fragment FLOAT          :   INTEGER '.'  INTEGER     ;
fragment INTEGER        :   DIGIT DIGIT*      ;
fragment DIGIT          :   '0'..'9'    ;
DATE                    :   DIGIT DIGIT DIGIT DIGIT '-' DIGIT DIGIT '-' DIGIT DIGIT  'T' ;
NUMBER                  :   FLOAT | INTEGER ; 
WS         :   ('\t' | '\r' | '\n' |'ï'|'»'|'¿'|' '|'\t'|'\r'|'\n')+ -> channel(HIDDEN) ;  

//
//FEIDLREFERENCE.g4  lexer rules
//
ALPHA_CHAR :   'a'..'z'|'A'..'Z'   ;
SPEC_CHAR  :    '-' | '_' ;
IDENTIFIER :   ALPHA_CHAR (INTEGER | ALPHA_CHAR | SPEC_CHAR )*    ;
LBRACKET   :   '['    ;
RBRACKET   :   ']'    ;
LPAREN     :   '('    ;
RPAREN     :   ')'    ;
INTEGER    :   DIGIT DIGIT*      ;
DIGIT      :  '0'..'9'    ;
DOT        :   '.';
SEMI       :   ';';
WS         :   ('\t' | '\r' | '\n' |'ï'|'»'|'¿'|' '|'\t'|'\r'|'\n')+ -> channel(HIDDEN) ;   

/*
  SEQUENCE.g4  Lexer Rules
 */  
TEXT   
    :   ~(','|';'|'\n'|'\r'|'"'|'{'|'}')+ 
    ;
fragment SQSTRING : ( '\'' ( ~'\'' | '"' | '\'\'' )* '\'' );
fragment DQSTRING : ( '"' ( ~'"' | '""' )* '"' ) ;
STRING      :   SQSTRING | DQSTRING  ; 
LBRACE: WS* '{' WS* ;
RBRACE : WS* '}' WS* ;
COMMA: WS* ',' WS* ;
WS         :   ('\t' | '\r' | '\n' |'ï'|'»'|'¿'|' '|'\t'|'\r'|'\n')+ -> channel(HIDDEN) ;  

//
// VALUESET.g4 lexer rules 
//
TEXT   : ~('|'|'\n'|'\r')+ ; 
VBAR: '|';

//
// VALUESETITEM.g4 lexer rules 
//
INT: WS* ( '0'..'9' )+ WS* ;
fragment COMMA: WS* ',' WS* ;
TEXT   : COMMA ~('\r'|'\n')+ ;
WS: ' ' -> skip ;

sharwell commented 7 years ago

> this all worked before moving to VS 2017

I imagine what you are experiencing is a side effect of updating to a new version of the code generator, not a side effect of updating to 2017. Either way, we obviously need to get it fixed. I can think of two possible problems:

  1. The default encoding changed from the "current system encoding" (which varies) to UTF-8. I see from your example that you have characters that are likely impacted by this change; try saving your grammars as UTF-8 and see if the problem is resolved.
  2. The grammar is more strict about escape sequences than it used to be. If you use a \ character where it's not directly supported, it will now report an error (or maybe a warning).
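
The 'ï', '»' and '¿' literals in the WS rules above look like the three bytes of a UTF-8 byte order mark (EF BB BF) read through a single-byte Western encoding. A minimal Python sketch of that effect follows; the choice of `cp1252` is an assumption, since the thread does not say which system encoding was in use.

```python
import codecs

# The UTF-8 byte order mark is the three bytes EF BB BF.
bom = codecs.BOM_UTF8

# Decoded as UTF-8, the BOM is the single code point U+FEFF.
as_utf8 = bom.decode("utf-8")

# Decoded with a single-byte Western encoding (assumed here: cp1252),
# each byte becomes its own visible character.
as_cp1252 = bom.decode("cp1252")

print(repr(as_utf8))  # '\ufeff'
print(as_cp1252)      # ï»¿
```

If that is where the three literals came from, a grammar file read as UTF-8 would see one U+FEFF code point at most, not three separate characters.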

ghost commented 7 years ago

Thank you for your time and help with this. I've removed the 'funky' characters from the whitespace lexer rule and saved all the grammars as UTF-8, but the problem persists. I've done a careful scan for misuse of the backslash, but if it's there I'm missing it...

ghost commented 7 years ago

ok, so i looked again and i'm so embarrassed by what i just found... this rule:

WS : ('\t' | '\r' | '\n' |'ï'|'»'|'¿'|' '|'\t'|'\r'|'\n')+ -> channel(HIDDEN) ;

has the tab, carriage return, and newline characters declared TWICE. That was causing the error...

i can't believe it took me that long to see this... granted, the error could have been a bit more helpful and perhaps there is a problem in what it's pulling or showing in the msg text, but still... i should have seen this earlier.

Thanks for your time Sam. Sorry to have wasted it. ;)
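
Because the build output names neither the file nor the line, a duplicate like the one described above can be located with a small script. The following is a rough Python sketch, not the check ANTLR itself performs: `find_duplicate_literals` is a hypothetical helper that only looks at parenthesized groups made entirely of single-quoted literals on one line, and it ignores ranges and `[...]` character sets.

```python
import re
from collections import Counter

# A single-quoted ANTLR literal such as 'a', '\t' or '\''.
LITERAL = re.compile(r"'(?:\\.|[^'\\])+'")
# A parenthesized group made only of literals separated by '|'.
LITERAL_SET = re.compile(
    r"\(\s*(?:%s)(?:\s*\|\s*(?:%s))*\s*\)" % (LITERAL.pattern, LITERAL.pattern)
)


def find_duplicate_literals(grammar_text):
    """Yield (line_number, literal) for literals repeated within one group."""
    for line_number, line in enumerate(grammar_text.splitlines(), start=1):
        for group in LITERAL_SET.finditer(line):
            counts = Counter(LITERAL.findall(group.group(0)))
            for literal, count in counts.items():
                if count > 1:
                    yield line_number, literal


# The WS rule from the thread, used here as sample input.
rule = r"WS : ('\t' | '\r' | '\n' |'ï'|'»'|'¿'|' '|'\t'|'\r'|'\n')+ -> channel(HIDDEN) ;"
for line_number, literal in find_duplicate_literals(rule):
    print(line_number, literal)
```

To scan a whole project, the contents of each `.g4` file could be read and passed to the same function, which would give a file name and line number for every repeated literal.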

KvanTTT commented 7 years ago

The underlying error text is quite helpful: "chars \"<arg>\" used multiple times in set <arg2>". The bug is in how that text is displayed.

> The grammar is more strict about escape sequences than it used to be.

A strict grammar is better than a lenient one, in my opinion :)

ghost commented 7 years ago

i agree the stricter grammar is better... but rather than the displayed error of:

"Unknown build error: "' used multiple times in set: SET" "Unknown build error: "' used multiple times in set: SET" "Unknown build error: "' used multiple times in set: SET"

it (should?) have shown:

"Unknown build error: "\t" used multiple times in set: SET" "Unknown build error: "\r" used multiple times in set: SET" "Unknown build error: "\n" used multiple times in set: SET"

sharwell commented 7 years ago

It should have shown the file and line number along with an error number too. 😄
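
The escaped form ghost suggests can be sketched in Python as follows. This assumes the offending characters reach the message formatter as raw control characters, which the thread does not confirm. The template is the one KvanTTT quotes; `escape_for_display` and `format_warning` are hypothetical helpers, not part of ANTLR or the build task.

```python
def escape_for_display(text):
    """Replace common control characters with visible escape sequences."""
    replacements = {"\t": "\\t", "\r": "\\r", "\n": "\\n"}
    return "".join(replacements.get(ch, ch) for ch in text)


def format_warning(chars, set_name):
    # Template wording taken from the thread; this function is illustrative only.
    return 'chars "%s" used multiple times in set %s' % (
        escape_for_display(chars),
        set_name,
    )


for ch in ("\t", "\r", "\n"):
    print(format_warning(ch, "SET"))
```

Escaping before formatting keeps each message on a single log line, which matters to any tooling that parses build output line by line.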

KvanTTT commented 7 years ago

"Unknown build error: "' used multiple times in set: SET"

At least these warnings with escaped chars are displayed correctly in the latest ANTLR version: https://github.com/antlr/antlr4/blob/master/tool-testsuite/test/org/antlr/v4/test/tool/TestSymbolIssues.java#L399

Unfortunately, the warning location cannot be displayed correctly for now: https://github.com/antlr/antlr4/blob/4.7/tool/src/org/antlr/v4/automata/ATNOptimizer.java#L106