Non-optional rules with fully-optional children prevent additional rules from being collected

kaidjohnson commented 5 years ago

I am working on a grammar that has a handful of optional top-level rules. If I attempt to group a few of these optional rules together, for the convenience of listening/visiting, it changes the candidates collected by antlr4-c3.

Working Example:

grammar MyGrammar;

expression: GET FOO? BAR? BAZ? withQux? EOF ;

withQux: WITH QUX ; 

GET: 'get';
FOO: 'foo' ;
BAR: 'bar' ;
BAZ: 'baz' ;
WITH: 'with' ;
QUX: 'qux' ;

core.collectCandidates('get', 1); returns tokens ['foo', 'bar', 'baz', 'with qux']. This is the result I am expecting.

Non-working Example:

grammar MyGrammar;

expression: GET fooBarBaz withQux? EOF ;

fooBarBaz: FOO? BAR? BAZ? ;

withQux: WITH QUX ; 

GET: 'get';
FOO: 'foo' ;
BAR: 'bar' ;
BAZ: 'baz' ;
WITH: 'with' ;
QUX: 'qux' ;

core.collectCandidates('get', 1); returns ['foo', 'bar', 'baz'] but is unexpectedly missing with qux.

If I make fooBarBaz itself optional, fooBarBaz?, the compilation of the grammar throws a warning: rule 'expression' contains an optional block with at least one alternative that can match an empty string, which is expected given the creation of an optional rule with optional children.

As far as I can tell, the grammars are syntactically the same and I would expect them to return the same candidates.

qvt commented 4 years ago

I was wondering about the same issue. As a fix, I suggest to continue to interpret the remaining rule when returning from a referenced rule. This can be easily done by replacing ll. 266-268 with

ruleStack.push(ruleTransition.target.ruleIndex);
this.collectFollowSets(transition.target, stopState, followSets, seen, ruleStack);
ruleStack.pop();
this.collectFollowSets(ruleTransition.followState, stopState, followSets, seen, ruleStack);

Two limitations still remain:

The added call should only be done iff the previous call did not find any token, or only tokens inside a (...)*. A boolean return value of collectFollowSets could pass the information when a mandatory token was found. This requires to detect if a token is added while inside a StarLoopEntry and StarLoopBack state.
Only after finding the state of the caret in processRule, we start to collectFollowSets based on this state alone. By doing so we lose our context until that moment, particularly uninterpreted parts of rule expression which lead us to the caret, and which could add useful tokens to the follow set. Of course, there is definitely a trade-off between speed and exactness.

br0nstein commented 1 year ago

I'm working on the fix for this (and some related issues such as handling for an empty rule body at, below, or after the rule transitioned to at the caret position). But @mike-lischke can you provide some clarity on whether we intend to return the Epsilon token as a candidate when parsing is successful terminated at the caret (without any subsequent tokens being required)? The epsilon is returned in some cases (when a rule can be parsed at the caret) but not in others. For example the test case "Most simple setup" does not take this into account - it expects Epsilon is not a candidate token but var c = a is parseable as-is.

mike-lischke commented 1 year ago

The Token.EPSILON value is not really a token, but a mark for prediction (and hence also for code completion). As such it should not be returned, as it has no real value for the caller (what would you use it for?). If there's no candidate then the empty list says it all, no need to also check for EPSILON, right?

The fact that it is returned sometimes is probably just an oversight and if you can fix that, you are welcome to do so!

br0nstein commented 1 year ago

Understood. But to explain, the idea would be that if we return a list of tokens including the epsilon (or via a separate boolean property) we could tell the caller if one of the returned tokens is required - or if they are all optional. That said, they should already have that information from parsing in advance. And the situation becomes complicated taking into account rules and ignored tokens.

mike-lischke commented 1 year ago

Hmm, what if there's a mix of mandatory and optional tokens? It would be more useful if that information is available for each candidate. If you return a flag or the EPSILON token then the caller can only assume that all candidates are optional.

mike-lischke / antlr4-c3

Non-optional rules with fully-optional children prevent additional rules from being collected #23