tunnelvisionlabs / antlr4cs

The original, highly-optimized C# Target for ANTLR 4

Erroneous extraneous input detected in C# (but not in Java) #71

Open cyril-- opened 10 years ago

cyril-- commented 10 years ago

Hello,

I'm afraid I found a bug in the ANTLR 4 C# target, version 4.3.0. Using the same grammar and the same input stream, parsing succeeds in Java but fails in C# with an "extraneous input" error. The traces show that the C# version consumes all the left parentheses in a row and then complains about a right parenthesis. The Java version works as expected.

Here is my grammar:

grammar Expr;

root                        : assignment EOF
                            ;
assignment                  : LOCAL_VARIABLE '=' expression
                            ;
expression                  : logical_and_expression
                            ;
logical_and_expression      : relational_expression ('AND' relational_expression)*
                            ;
relational_expression       : primary_expression (('<'|'>') primary_expression)*
                            ;
primary_expression          : '(' + expression + ')'
                            | UNSIGNED_INT
                            | LOCAL_VARIABLE
                            ;

LOCAL_VARIABLE              : [_a-z][_a-zA-Z0-9]*
                            ;
UNSIGNED_INT                : ('0'|'1'..'9''0'..'9'*)
                            ;
WS                          : [ \t\r\n]+ -> skip
                            ; 

And my test program (C#), showing the input stream:

string expression = "b = (((a > 10)) AND ((a < 15)))";
AntlrInputStream input = new AntlrInputStream(expression);
ExprLexer lexer = new ExprLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
ExprParser parser = new ExprParser(tokens);
parser.Trace = true;
parser.root();

The Java version:

String expression = "b = (((a > 10)) AND ((a < 15)))";
ANTLRInputStream input = new ANTLRInputStream(expression);
ExprLexer lexer = new ExprLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
ExprParser parser = new ExprParser(tokens);
parser.setTrace(true);
parser.root();

Your help would be appreciated. Best regards,

Cyril.

sharwell commented 10 years ago

Yes, you found a bug.

You can work around it by removing the + characters, which appear to have been added unintentionally to the primary_expression rule.
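
For reference: in ANTLR 4 a + placed after an element is the one-or-more operator, so the rule as posted asks for one or more '(' tokens, then one or more expressions, then a single ')'. A sketch of the first workaround, assuming a plain parenthesized expression is what was intended, with the rest of the grammar unchanged:

primary_expression          : '(' expression ')'
                            | UNSIGNED_INT
                            | LOCAL_VARIABLE
                            ;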

If you need it to work and also need those + characters, then use PredictionMode.LlExactAmbigDetection for now.

// PredictionMode is defined in the Antlr4.Runtime.Atn namespace.
ExprParser parser = new ExprParser(tokens);
parser.Interpreter.PredictionMode = PredictionMode.LlExactAmbigDetection;
parser.root();

cyril-- commented 10 years ago

Oops, you're right: the + characters in this rule are indeed unintended. Everything works as expected after I removed them. Thank you.