mike-lischke / antlr4-c3

A grammar agnostic code completion engine for ANTLR4 based parsers
MIT License

Not getting expected rule candidates #58

Closed kyle-sawatsky closed 2 years ago

kyle-sawatsky commented 3 years ago

I've tried to cut the grammar down a bit for this, but it isn't particularly complex to begin with. Given the following grammar:

/* ===== PARSER ===== */

statement: expression EOF ;

expression
    : lhs=expression OP_DOT rhs=binaryDotCompletion                                                    # binaryDotExpression
    | LPAREN expr=expression RPAREN                                                                             # parenExpression
    | funcCall                                                                                                  # funcCallExpression
    | op=(OP_MINUS|OP_NOT) expr=expression                                                                      # unaryExpression
    | lhs=expression op=(OP_MULTIPLY|OP_DIVIDE) rhs=expression                                                  # binaryExpression
    | lhs=expression op=(OP_PLUS|OP_MINUS) rhs=expression                                                       # binaryExpression
    | lhs=expression op=REL_OP rhs=expression                                                                   # binaryExpression
    | lhs=expression op=OP_AND rhs=expression                                                                   # binaryExpression
    | lhs=expression op=OP_OR rhs=expression                                                                    # binaryExpression
    | lhs=expression op=OP_IN rhs=array                                                                         # inConditionExpression
    | <assoc=right> lhs=expression OP_TERN_QUERY trueBranch=expression OP_TERN_COLON falseBranch=expression     # ternaryExpression
    | term                                                                                                      # termExpression
    ;

binaryDotCompletion: ID ;

funcCall: name=ID LPAREN (args+=funcArg (OP_COMMA args+=funcArg)*)? RPAREN ;

funcArg
    : lambdaExpression      # lambdaArg
    | expression            # expressionArg
    ;

lambdaExpression
    : LPAREN (args=ID (OP_COMMA args=ID)*)? RPAREN OP_ARROW body=expression
    | args=ID OP_ARROW body=expression
    ;

array: LBRACKET (expression (OP_COMMA expression)*) RBRACKET ;

term
    : BOOL
    | FLOAT
    | INT
    | LONG
    | identifier
    | STRING
    ;

identifier: ID ;

/* ====== LEXER ===== */

REL_OP: OP_EQ | OP_GT | OP_GTE | OP_NEQ | OP_LT | OP_LTE;

OP_EQ: '==' ;
OP_GTE: '>=' ;
OP_GT: '>' ;
OP_NEQ: '!=' ;
OP_LTE: '<=' ;
OP_LT: '<' ;
OP_IN: '=IN=' ;

OP_DOT: '.' ;
OP_COMMA: ',' ;

OP_PLUS: '+' ;
OP_MINUS: '-' ;
OP_MULTIPLY: '*' ;
OP_DIVIDE: '/' ;

OP_ARROW: '=>' ;

OP_AND: '&&' ;
OP_OR: '||' ;
OP_NOT: '!' ;

OP_TERN_QUERY: '?' ;
OP_TERN_COLON: ':' ;

LPAREN: '(' ;
RPAREN: ')' ;

LBRACKET: '[' ;
RBRACKET: ']' ;

DOLLAR: '$' ;

APOSTROPHE: '\'' ;

STRING: APOSTROPHE (~(['] | '\\') | '\\' (APOSTROPHE | '\\'))* APOSTROPHE ;

fragment EXPONENT: 'e' (OP_PLUS | OP_MINUS)? DIGIT+ ;
fragment FLOAT_SUFFIX: 'f' | 'F' ;
fragment LONG_SUFFIX: 'l' | 'L' ;
FLOAT: (DIGIT+ '.' DIGIT+ EXPONENT? FLOAT_SUFFIX?)
     | (DIGIT+ EXPONENT FLOAT_SUFFIX?)
     | (DIGIT+ FLOAT_SUFFIX)
     | ('.' DIGIT+ EXPONENT? FLOAT_SUFFIX?) ;
LONG: DIGIT+ LONG_SUFFIX;
INT: DIGIT+ ;
BOOL: ('true')|('false') ;
ID: DOLLAR? '_'* ALPHA(ALPHA|DIGIT|'_')* ;

WHITESPACE : SPACE+ -> channel(HIDDEN) ;

Let's say my input is "foo." and I've set preferredRules to identifier and binaryDotCompletion. When I type "foo", the parse tree output is (statement (expression (term (identifier foo))) <EOF>) and the collected rules contain identifier, as I would expect. But as soon as I add the ".", no rules are collected at all. The parse tree output becomes (statement (expression (expression (term (identifier foo))) . (binaryDotCompletion <missing ID>)) <EOF>), so the parser seems to know what should come next.

I verified that the token index being used when evaluating "foo." is 1.
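To make the token index concrete, here is a minimal sketch of how a caret offset might be mapped to a token index. Everything in it is an assumption for illustration: the Tok shape, the toy tokenize function (a stand-in for the generated lexer) and the rule that a caret touching the end of a token still belongs to that token. It is not code from antlr4-c3.

```typescript
// Hypothetical token shape with inclusive start/stop character offsets,
// similar to what an ANTLR token reports.
interface Tok {
    text: string;
    start: number;
    stop: number;
}

// Toy tokenizer for illustration only: identifiers and single-character
// operators. A real setup would use the generated lexer instead.
function tokenize(input: string): Tok[] {
    const toks: Tok[] = [];
    const re = /[A-Za-z_$][A-Za-z0-9_]*|\S/g;
    let m: RegExpExecArray | null;
    while ((m = re.exec(input)) !== null) {
        toks.push({ text: m[0], start: m.index, stop: m.index + m[0].length - 1 });
    }
    // EOF sits just past the last character, like ANTLR's EOF token.
    toks.push({ text: "<EOF>", start: input.length, stop: input.length - 1 });
    return toks;
}

// Assumed convention: the caret belongs to the first token whose end it
// has not yet passed, so a caret touching a token's end still counts as
// being in that token.
function caretTokenIndex(toks: Tok[], caret: number): number {
    for (let i = 0; i < toks.length; i++) {
        if (caret <= toks[i].stop + 1) {
            return i;
        }
    }
    return toks.length - 1;
}

const caretAfterFoo = caretTokenIndex(tokenize("foo"), 3);      // 0
const caretAfterDot = caretTokenIndex(tokenize("foo."), 4);     // 1
const caretAfterLetter = caretTokenIndex(tokenize("foo.b"), 5); // 2
console.log(caretAfterFoo, caretAfterDot, caretAfterLetter);
```

Under this convention the caret after "foo." maps to index 1, the dot itself. As far as I understand collectCandidates, the index names the position for which candidates are wanted, so index 1 would ask what can follow "foo", while what can follow the "." would be index 2.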

As soon as I add another letter after the ".", the rule candidates contain binaryDotCompletion, but not a single token candidate is collected anymore (I haven't set any tokens to be ignored while debugging this, so I otherwise get a lot of operator tokens). The missing tokens aren't particularly important for my purposes, but it seems weird.

The concept here looks really similar to the example grammar in the README with simpleExpression etc., so I'm baffled by the results I'm getting.

kyle-sawatsky commented 3 years ago

Following up with a simpler test I ran:

statement: expression ;

expression: identifier OP_PLUS subIdentifier ;

subIdentifier: ID ;

OP_PLUS: '+' ;
DOLLAR: '$' ;
fragment ALPHA : [A-Za-z] ;
fragment DIGIT : [0-9] ;

ID: DOLLAR? '_'* ALPHA(ALPHA|DIGIT|'_')* ;

With the preferred rules set to identifier and subIdentifier, no rules show up for the input "foo+" with the caret at the end.
I feel like I'm missing something critical about how this is supposed to work.

kyle-sawatsky commented 3 years ago

Apologies for the multiple posts. I've just read through the test cases for the simpleExpr grammar again, and it appears that a rule isn't suggested until you actually start matching it, so it seems I did have a slight misunderstanding. I thought I could get around this for something like a dot-accessor expression by providing a 'blank' rule that always matches, but that didn't seem to work either. I think I'm just really confused about how c3 decides when a rule is a candidate.

mike-lischke commented 3 years ago

A suggestion requires a walk from the start of the grammar to a specific point (usually the caret position). The engine cannot produce any suggestion unless it finds a valid path through the grammar up to that position.
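The walk described above can be sketched with a toy model. This is not c3's implementation, which works on the parser's ATN. It is a simplified stand-in that assumes a non-left-recursive grammar written as plain alternatives, an identifier: ID rule borrowed from the first grammar, and hypothetical names throughout (collect, Candidates, toyGrammar).

```typescript
// Toy grammar: rule name -> alternatives, each a sequence of symbols.
// Names present in the map are rules; everything else is a token type.
// Must not be left recursive, or the walk below would never terminate.
const toyGrammar = new Map<string, string[][]>([
    ["statement", [["expression"]]],
    ["expression", [["identifier", "OP_PLUS", "subIdentifier"]]],
    ["identifier", [["ID"]]],
    ["subIdentifier", [["ID"]]],
]);

interface Candidates {
    rules: Set<string>;
    tokens: Set<string>;
}

function collect(
    g: Map<string, string[][]>,
    startRule: string,
    input: string[], // token types of the text before the caret
    caret: number,   // token index at which candidates are wanted
    preferred: Set<string>,
): Candidates {
    const out: Candidates = { rules: new Set<string>(), tokens: new Set<string>() };

    // pending holds the symbols still to be matched, leftmost first.
    const walk = (pending: string[], pos: number): void => {
        if (pending.length === 0) {
            return;
        }
        const [head, ...rest] = pending;
        const isRule = g.has(head);

        if (pos === caret) {
            if (!isRule) {
                out.tokens.add(head); // a token that may appear at the caret
                return;
            }
            if (preferred.has(head)) {
                out.rules.add(head); // report the rule, do not descend into it
                return;
            }
        } else if (!isRule) {
            // Before the caret every token must match, otherwise this path dies.
            if (input[pos] === head) {
                walk(rest, pos + 1);
            }
            return;
        }

        // Expand the rule in place and keep walking each alternative.
        for (const alt of g.get(head)!) {
            walk([...alt, ...rest], pos);
        }
    };

    walk([startRule], 0);
    return out;
}

const preferredRules = new Set(["identifier", "subIdentifier"]);
const fooPlus = ["ID", "OP_PLUS"]; // token types for "foo+"
const atStart = collect(toyGrammar, "statement", fooPlus, 0, preferredRules);
const atPlus = collect(toyGrammar, "statement", fooPlus, 1, preferredRules);
const afterPlus = collect(toyGrammar, "statement", fooPlus, 2, preferredRules);
const noPath = collect(toyGrammar, "statement", ["OP_PLUS"], 1, preferredRules);
```

In this model atStart reports the rule identifier, atPlus reports only the token OP_PLUS, afterPlus reports the rule subIdentifier, and noPath is empty because the tokens before the caret do not form a valid path. Stopping at a preferred rule instead of descending into it is also why no ID token candidate shows up alongside subIdentifier here.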