Closed jurgenvinju closed 2 years ago
@tvdstorm thanks for reporting.
()
symbol, we get the same parse error.()
symbol, we get a perfect parse treeAnd so the bug is related to the empty symbol and how it is processed.
()
with ""
has no effectsimplified test module: needs two if
's nested to trigger the bug, no less. more is same.
import IO;
layout WS = [\ \t\r\n]* ;
import ParseTree;
start syntax Stmt
= "if" "(" Expr ")" Stmt ""
| "{" "}"
;
syntax Expr = "x";
public str example = "if (x)
' if (x)
' {}";
void main() {
print(parse(#start[Stmt], example, |example:///|, allowAmbiguity=true));
}
""
and ()
nevertheless. don't understand yetOne subtlety that comes to mind: there are two consecutive empties in the example with the parse error.
There's also a difference in behavior depending on trailing white space or not:
module NestedIfs
import ParseTree;
import IO;
layout Layout = [\ \t\n]* !>> [\ \t\n];
start syntax Stmt
= "if" "(" "x" ")" Stmt "else" Stmt
| "if" "(" "x" ")" Stmt () !>> "else"
| "{" "}"
;
str bug1 = "if (x) if (x) if (x) {}";
str bug2 = "if (x) if (x) if (x) {} ";
void triggerBug() {
void doit(str src) {
try {
println("Parsing: `<src>`");
parse(#Stmt, src);
}
catch value e: {
println(e);
}
}
println("### BUG1 (no trailing whitespace)");
doit(bug1); // parse error
doit(bug1[7..]); // parse error
doit(bug1[14..]); // ok
println("\n### BUG2 (trailing whitespace)");
doit(bug2); // parse error
doit(bug2[7..]); // ok
doit(bug2[14..]); // ok
}
It would be very helpful to convert these examples to tests that we can include in our test suite.
See: 322750d86007f108c6021aff7790d5e099b0a6b9
Mmm yes but that is probably caused by not using the start nonterminal..
-- Jurgen Vinju http://jurgen.vinju.org CWI SWAT TU Eindhoven Swat.engineering BV
No, it isn't: the follow restriction on layout eats layout until the ()
.
Mmm.. It should also eat the layout beyond the ()
On Fri, 3 Dec 2021 at 17:44, Tijs van der Storm @.***> wrote:
No, it isn't: the follow restriction on layout eats layout until the ().
β You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/usethesource/rascal/issues/1542#issuecomment-985668127, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAPF5F4XGMVTBQWNQTLTAZ3UPDXYFANCNFSM5JHDHSZA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
-- Jurgen Vinju http://jurgen.vinju.org CWI SWAT TU Eindhoven Swat.engineering BV
but there isn't, because of the follow restriction on layout.
The bug has nothing to do with follow restrictions. Or in other words: there are more relevant test cases. The DanglingElse example is complex and definitely needs to be added to the test set. However, that example is too big to debug.
Here is a version that triggers the bug without follow restrictions:
module Tijs
import IO;
layout WS = [\ \t\r\n]* ;
import ParseTree;
start syntax Stmt
= "if" "(" Expr ")" Stmt ()
| "if" "(" Expr ")" Stmt "else" Stmt
| "{" "}"
;
syntax Expr = "x";
public str example = "if (x)
' if (x)
' {}";
void main() {
print(parse(#start[Stmt], example, |example:///|, allowAmbiguity=true));
}
()
resolves the bug, and no more parse errors seem present."fi"
before the ()
the bug also dissappearsIt is baffling we discover this bug now, since the Rascal grammar itself has such a rule:
ifThen: Label label "if" "(" {Expression ","}+ conditions ")" Statement!variableDeclaration!functionDeclaration thenStatement () !>> "else"
@arnoldlankamp two bugs in one day. still digging on this bug; but it's nice to remember all the good things you wrote :-) If you have a hunch, I'd appreciate it.
If we add a bogus constraint to the previous non-terminal before the ()
, then the bug dissappears!
module Tijs
import IO;
layout WS = [\ \t\r\n]* ;
import ParseTree;
start syntax Stmt
= "if" "(" Expr ")" Stmt!bla ()
| "if" "(" Expr ")" Stmt "else" Stmt
| "{" "}"
| bla: "bla"
;
syntax Expr = "x";
public str example = "if (x)
' if (x)
' {}";
void main() {
print(parse(#start[Stmt], example, |example:///|, allowAmbiguity=true));
}
@tvdstorm here's a workaround for the bug for you, for short term success.
This also explains why the Rascal grammar was not bothered by this bug, and this makes it possible that this particular bug has always been there.
It is also a possible hint for resolving the issue, since the code path that is triggered by having constraints on the non-terminal before the right-nullable one does not have the bug.
it does not matter which filter is put on the Stmt
symbol before the ()
(follow restriction, exclude, reserve, any condition is enough to "fix" the issue.
This issue looks similar to #1543 . Maybe the proposed fix for that one will also resolve this one. If it doesn't let me know and I'll have another look.
thanks @arnoldlankamp ; I'll try that tomorrow.
This issue likely has something to do with the prefix sharing optimization, in combination with right nullables. If you add the same restriction to the Stmt
in both the rule with and without the else
clause, you likely get the parse error back.
I'd expect using this piece of grammar produces the same parse error as the original example:
start syntax Stmt
= "if" "(" Expr ")" Stmt!bla ()
| "if" "(" Expr ")" Stmt!bla "else" Stmt
| "{" "}"
| bla: "bla"
;
Yes; you are correct. If the prefix is the same again, the parse error comes back.
I have my eyes on this line:
With consecutive nullables such as layout empty layout
, would this not abort too early?
@tvdstorm shall we put the failing DanglingElse test to @ignore while we try to fix this? Because now the other developments on the main branch are invisible until we fix the current bug.
With consecutive nullables such as
layout empty layout
, would this not abort too early?
This line is there to keep track of what we've already processed, to prevent duplicate results.
It's a bit hard to explain without a whiteboard unfortunately. This whole hidden-right-recursion thing is a bit tricky π .
This was my thinking:
propagatedIndexes
Definitely tricky stuff, but should we not mark the processed position by a
input position + rule position
key here?
Definitely tricky stuff, but should we not mark the processed position by a
input position + rule position
key here?
The instance of the AbstractStackNode
represents the rule position.
But I guess we may be dealing with the same issue as #1543 .
If we look at (line 666 of all places): https://github.com/usethesource/rascal/blob/bb72c03cc9b4004c42879c973e7b7f31e3b23f1f/src/org/rascalmpl/parser/gtd/stack/AbstractStackNode.java#L666 . We see the same fromIndex
optimization. If we replace that one by for(int i = edgesMapToAdd.size() - 1; i >= 0; --i){
as well, it may resolve the issue.
Thanks @arnoldlankamp; I'll look into that. Meanwhile I've been producing test cases to trigger all these issues, with some unexpected results so it may take a while to close this issue even if you've solved it. hold on...
Ok; the fix on line 666 does not break the parser runtime, but it also does not fix the current bug...
More tomorrow. I have a X-mas tree to set up now and cook some food for the children :-)
The problem with the test cases was that I accidentally reverted line 666 instead of 587 to trigger the other bug :-) since the code looked so similar I was misreading it for the wrong location. So that mystery is solved now. I can trigger these bugs in a reproducible manner.
The test cases would be very interesting too look at.
Also, I'm wondering what happens if we introduce a level of indirection by adding another start
symbol π€ . E.g.: start syntax Root = Stmt
.
It may be tripping up over the empty
at the end, in combination with the empty layout
before it; for which the AbstractStackNode
instances are both shared between the first and second match of in the "if" "(" Expr ")" Stmt ""
construction, since the empty
and empty layout
have the same start and end location in both occurrences. Their prefix has a different start location however and should be propagated all the way to the end. Maybe it's not doing so for some reason π€ ?
I think we can also reduce the test case to: S := "x" S "" "" | "y"
, with xxy
as input and still trigger the error. The smaller the test, the easier it is to debug π .
Just to clarify visually:
if L (x) L
if L (x) L
{}
L "" <-- (1) Same input location
L "" <-- (2) as these two
What should happen is the following:
if
)if
), so it should be added as prefix and be propagated to "" (2)if
), it should reduce it (since it's the last item in the production) and complete the parse.Ok @arnoldlankamp here's a simplified example which still exhibits the bug. I'll turn this into a test case today:
lexical Stmt
= "i" Stmt () () // if rule
| "i" Stmt () "e" Stmt // else rule
| "!" // bang
;
public str example = "ii!";
layout
from the example, but to trigger the bug I need to introduce an additional nullable in the if rule, and in the else rule to simulate the effect that layout used to have (the original layout rule was nullable)) "if"
and "else"
keywords: no effect"i!"
: this produces a parse tree instead of an error, so reverted that"() ()"
; this produces a parse tree instead of an error, so reverted thatStmt
symbol in the if rule (added a constraint): this produces a parse tree instead of an error, so reverted that. This confirms the same bug is still represented by the current parse error.So to trigger this particular bug:
See test/org/rascalmpl/test/parser/DoubleRightNullableWithPrefixSharing.java
. It has a main
method which triggers the bug
Interestingly, the prefixes don't seem to be shared between the if
and if else
alternatives.
The generated stack nodes have different ids.
These two (43 and 49):
https://github.com/usethesource/rascal/blob/9d1a82bd87a256d5b56c834b17cb7c1bbfa94fe3/test/org/rascalmpl/test/parser/DoubleRightNullableWithPrefixSharing.java#L78
https://github.com/usethesource/rascal/blob/9d1a82bd87a256d5b56c834b17cb7c1bbfa94fe3/test/org/rascalmpl/test/parser/DoubleRightNullableWithPrefixSharing.java#L87
When I have some time I'll make a checkout of the project and step through it with a debugger to see what's happening.
Ok. mmmm.. maybe one of the alternatives is reducing too early? before it can be propagated?
The id's were generated from the grammar positions. So every symbol position in every rule gets a unique id. Maybe prefix sharing is done on different grounds than just the id.
It's either not being reduced or expanded or recognized. We just need to figure out which and why π€ .
Shared prefixes need to be calculated in the generator in order to be picked up by the parser. The parser just looks at the id
's of the stack nodes (id
+ start pos = unique stack node instance).
This is an example of a test that applies prefix sharing: https://github.com/usethesource/rascal/blob/9d1a82bd87a256d5b56c834b17cb7c1bbfa94fe3/test/org/rascalmpl/test/parser/AmbiguousRecursivePrefixShared.java#L54-L55
Prefix sharing is an optimization, so the parser should function correctly if you don't use them (although using them in this specific case might make the bug disappear as a side effect).
That is a lot of opportunity for speed that we are not using at all! That's worth it's own issue report for the parser generator.
The current issue has many similarities with an original bug in Tomita's GLR. Farshi and Rekers both fixed this by an extra search stage to find the missing reductions for hidden right nulled rules. Scott and Johnstone solved it more elegantly by reducing all right nulled rules "early" so that an additional search is not needed with their RNGLR version. Rob and I made an Scannerless and disambiguating version too. Maybe we can use this fix here too once we confirm the behavior of missing reductions?
Yes, it does looks similar to the original GLR issue and the bug is most likely related to the right nullables, however the parser should be able to handle those without issue and in many examples I've tested in the past it actually does (I should have added those to the test suite back then π€¦ββοΈ). The question we are left with is: why doesn't it work in this specific case π€ ?
The right nullable fix in this parser's algorithm is rather trivial; just propagate the missing prefixes forward and reduce if you add a prefix for a 'new' start position to the last node of the alternative. Unfortunately in the current implementation of the parser things have become a bit complicated in places due to all the optimizations. Which is probably wherein to problem lies.
The parser's algorithm itself is also fairly straightforward if you disregard all the complexity the optimizations add. It's probably easiest to imagine it as an Earley parser with a the above described RN fix, sort of.
Hopefully I'll have some time to look at it on Sunday. I'll let you know once I've found the source of the issue.
Thanks Arnold. The algorithm is unclear to me indeed, so your help is appreciated!
That is a lot of opportunity for speed that we are not using at all! That's worth it's own issue report for the parser generator.
Ah, it seems I added a preprocessor to handle the sharing just before I left, so the generator doesn't need to calculate this.
This thing: https://github.com/usethesource/rascal/blob/master/src/org/rascalmpl/parser/gtd/preprocessing/ExpectBuilder.java .
It may even be the source of this bug π€ .
It indeed seems to be related to prefix sharing. If I disable this functionality the DoubleRightNullableWithPrefixSharing
test completes successfully.
Similar to removing the "else" rule indeed. The sharing is onder of the crucial factors to triggering the bug
Changing this line to false
resolves this issue for now: https://github.com/usethesource/rascal/blob/be4edf25e697c5ccc4f5d7ee35b5552665042897/src/org/rascalmpl/parser/gtd/preprocessing/ExpectBuilder.java#L67 .
I'll dig a little deeper to find out where it goes wrong.
Describe the bug
This gives a parse error, so when the else branch is not there it does not parse. If you add the else branch to the example, then a parse tree is produced.