Closed kaidjohnson closed 1 year ago
I was wondering about the same issue. As a fix, I suggest continuing to interpret the remaining rule after returning from a referenced rule. This can easily be done by replacing ll. 266-268 with
ruleStack.push(ruleTransition.target.ruleIndex);
this.collectFollowSets(transition.target, stopState, followSets, seen, ruleStack);
ruleStack.pop();
this.collectFollowSets(ruleTransition.followState, stopState, followSets, seen, ruleStack);
Two limitations still remain:
(...)*. A boolean return value of collectFollowSets could pass the information when a mandatory token was found. This requires detecting whether a token is added while inside a StarLoopEntry and StarLoopBack state.
In processRule, we start to collectFollowSets based on this state alone. By doing so we lose the context accumulated up to that moment, in particular the uninterpreted parts of the rule expression which led us to the caret and which could add useful tokens to the follow set. Of course, there is definitely a trade-off between speed and exactness.
I'm working on the fix for this (and some related issues, such as handling an empty rule body at, below, or after the rule transitioned to at the caret position). But @mike-lischke, can you provide some clarity on whether we intend to return the Epsilon token as a candidate when parsing has successfully terminated at the caret (without any subsequent tokens being required)? The epsilon is returned in some cases (when a rule can be parsed at the caret) but not in others. For example, the test case "Most simple setup" does not take this into account: it expects that Epsilon is not a candidate token, but var c = a is parseable as-is.
The Token.EPSILON value is not really a token, but a mark for prediction (and hence also for code completion). As such it should not be returned, as it has no real value for the caller (what would you use it for?). If there's no candidate then the empty list says it all, no need to also check for EPSILON, right?
The fact that it is returned sometimes is probably just an oversight and if you can fix that, you are welcome to do so!
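To illustrate the point about not returning the marker, here is a minimal sketch of filtering it out of a result on the caller side. The helper name withoutEpsilon and the token types used are made up for illustration and are not part of antlr4-c3; the constant is assumed to mirror the ANTLR runtime's Token.EPSILON value of -2.

```typescript
// Assumption: mirrors the ANTLR runtime constant Token.EPSILON (-2).
const EPSILON = -2;

// Hypothetical helper, not part of antlr4-c3: returns a copy of a
// token-candidate map (token type -> following token types) without
// the EPSILON marker entry.
function withoutEpsilon(tokens: Map<number, number[]>): Map<number, number[]> {
    const result = new Map<number, number[]>();
    tokens.forEach((following, tokenType) => {
        if (tokenType !== EPSILON) {
            result.set(tokenType, following);
        }
    });
    return result;
}

// Usage with made-up token types 5 and 7.
const raw = new Map<number, number[]>([[5, []], [EPSILON, []], [7, [8]]]);
const cleaned = withoutEpsilon(raw);
```

With such a filter, an empty map on its own signals "no candidates", which matches the argument above.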
Understood. But to explain, the idea would be that if we return a list of tokens including the epsilon (or via a separate boolean property) we could tell the caller if one of the returned tokens is required - or if they are all optional. That said, they should already have that information from parsing in advance. And the situation becomes complicated taking into account rules and ignored tokens.
Hmm, what if there's a mix of mandatory and optional tokens? It would be more useful if that information is available for each candidate. If you return a flag or the EPSILON token then the caller can only assume that all candidates are optional.
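As a rough sketch of what per-candidate information could look like: the TokenCandidate shape and both helpers below are hypothetical and do not exist in antlr4-c3. They only show how a caller could tell a mixed result apart from an all-optional one.

```typescript
// Hypothetical shape, not part of the antlr4-c3 API.
interface TokenCandidate {
    tokenType: number;
    // true when the input is also valid without this token at the caret
    optional: boolean;
}

// True when at least one candidate must appear for the input to parse.
function requiresInput(candidates: TokenCandidate[]): boolean {
    return candidates.some((candidate) => !candidate.optional);
}

// Token types of the candidates that are not optional.
function mandatoryTypes(candidates: TokenCandidate[]): number[] {
    return candidates
        .filter((candidate) => !candidate.optional)
        .map((candidate) => candidate.tokenType);
}

// Usage with made-up token types: one mandatory, two optional.
const mixed: TokenCandidate[] = [
    { tokenType: 3, optional: true },
    { tokenType: 4, optional: false },
    { tokenType: 9, optional: true },
];
const mixedRequiresInput = requiresInput(mixed);
const mixedMandatory = mandatoryTypes(mixed);
```

A single flag or an EPSILON entry could only express the all-optional case, whereas a flag per candidate also covers the mixed one.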
I am working on a grammar that has a handful of optional top-level rules. If I attempt to group a few of these optional rules together, for the convenience of listening/visiting, it changes the candidates collected by antlr4-c3.
Working Example:
core.collectCandidates('get', 1); returns tokens ['foo', 'bar', 'baz', 'with qux']. This is the result I am expecting.
Non-working Example:
core.collectCandidates('get', 1); returns ['foo', 'bar', 'baz'] but is unexpectedly missing 'with qux'.
If I make fooBarBaz itself optional, fooBarBaz?, the compilation of the grammar throws a warning: rule 'expression' contains an optional block with at least one alternative that can match an empty string, which is expected given the creation of an optional rule with optional children.
As far as I can tell, the grammars are syntactically the same and I would expect them to return the same candidates.
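The expectation can be sketched with a toy model that is unrelated to the ANTLR ATN classes. The grammar shapes here are assumptions, since only the rule name fooBarBaz and the token names are known: the inlined variant is taken to be four optional tokens in a row, the grouped variant wraps the first three in a sub-rule. The collect function returns a boolean saying whether the sequence can match the empty string, so collection continues past a sub-rule only when nothing in it was mandatory.

```typescript
// Toy grammar model for illustration only.
type Element =
    | { kind: "token"; name: string }
    | { kind: "optional"; body: Element[] }
    | { kind: "rule"; body: Element[] };

const tok = (name: string): Element => ({ kind: "token", name });
const opt = (...body: Element[]): Element => ({ kind: "optional", body });
const rule = (...body: Element[]): Element => ({ kind: "rule", body });

// Adds the tokens that may come next in `sequence` to `out`.
// Returns true when the whole sequence can match the empty string.
function collect(sequence: Element[], out: string[]): boolean {
    for (const element of sequence) {
        if (element.kind === "token") {
            if (out.indexOf(element.name) === -1) {
                out.push(element.name);
            }
            return false; // mandatory token: nothing after it is reachable
        }
        const bodyCanBeEmpty = collect(element.body, out);
        if (element.kind === "rule" && !bodyCanBeEmpty) {
            return false; // the sub-rule requires a token, stop here
        }
        // Optional blocks and empty-matching sub-rules: keep collecting.
    }
    return true;
}

// Assumed shapes of what follows 'get' in the two grammars.
const inlined = [opt(tok("foo")), opt(tok("bar")), opt(tok("baz")), opt(tok("with qux"))];
const grouped = [rule(opt(tok("foo")), opt(tok("bar")), opt(tok("baz"))), opt(tok("with qux"))];

const inlinedCandidates: string[] = [];
collect(inlined, inlinedCandidates);
const groupedCandidates: string[] = [];
collect(grouped, groupedCandidates);
```

In this model both variants yield the same four candidates, which is the behavior I would expect from the real collector as well.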