microsoft / vscode-textmate

A library that helps tokenize text using Text Mate grammars.
MIT License
582 stars 115 forks source link

Grammer Endless Loop detector triggers inappropriately #145

Closed msftrncs closed 3 years ago

msftrncs commented 3 years ago

Trying to improve an IEC 61131 language grammar, I ran across a construct that incorrectly triggers the '[2] - Grammar is in an endless loop - Grammar pushed the same rule without advancing' debug.

The fact is that the grammar was NOT in an endless loop, but did push the same rule twice in a row, for completely different parts of the document.

The triggering of the debug is not dependent on the rule occurring twice on the same place in the document, only that the last rule on the stack is the same rule, and the rule did not capture any content (EDIT) due to both being an empty match and other rules having matched the document immediately in front of it.

CASE hello OF
    3:
        CASE hello OF
            4:
        END_CASE
END_CASE

In this example, the same rule is applied to both CASE keywords using a positive lookahead, thus the cursor doesn't advance when the rule triggers again for the second one. There are no other worthwhile rules to trigger within the statements, though rules do trigger for the expression and the OF, then again for the numeric evaluator and the :, but no other push rules to cover the stack. (EDIT) There is a rule in this grammar which has scoped the spaces preceeding the CASE and that is dependency in this issue.

The trigger only utilizes !hasAdvanced (which the begin rule did not meet because its a positive lookahead) and that the matching ruleID matches the previously stacked ruleId. I think the use of !hasAdvanced is incorrect here, as it only indicates if the current rule match moved, and does not indicate if the anchorPos has moved since the previous push of the same rule.

Interestingly, if one thinks about this, it would be possible to detect a multi-rule non-advancing loop by using the anchorPos values, by comparing the ruleId of all the rules on the stack that have the same non -1 value, if more than 1 rule on the stack have the same anchorPos (not -1) and ruleId then there has been a non-advancing loop. Maybe this loop detection can be written as a method similar to hasSameRuleAs method, since it would be needed for both BeginEnd and BeginWhile rules.

just a snippet of the grammar file:

{
    "begin": "\\b(?=CASE\\b)",
    "end": "\\b(?:END_CASE\\b|(?=END_(?:PROGRAM|INTERFACE|FUNCTION(?:_BLOCK)?|METHOD|PROPERTY|ACTION)\\b))",
    "endCaptures": {
        "0": {
            "name": "keyword.control.case.end.st"
        }
    },
    "name": "meta.switch.st",
    "patterns": [
        {
            "contentName": "meta.switch.expression.st",
            "begin": "\\G(?:CASE)",
            "beginCaptures": {
                "0": {
                    "name": "keyword.control.case.st"
                }
            },
            "end": "\\b(?:OF\\b|(?=END_(?:CASE|PROGRAM|INTERFACE|FUNCTION(?:_BLOCK)?|METHOD|PROPERTY|ACTION)\\b))",
            "endCaptures": {
                "0": {
                    "name": "keyword.control.case.of.st"
                }
            },
            "patterns": [
                {
                    "include": "$self"
                }
            ]
        },
        {
            "name": "punctuation.separator.st",
            "match": ":(?!=)"
        },
        {
            "include": "$self"
        }
    ]
},
{
    "begin": "^(?= )",
    "end": "(?=[^ ])",
    "name": "meta.leading-space",
    "patterns": [
        {
            "captures": {
                "1": {
                    "name": "meta.odd-tab.spaces"
                },
                "2": {
                    "name": "meta.even-tab.spaces"
                }
            },
            "match": "(  )(  )?"
        }
    ]
},
msftrncs commented 3 years ago

Would the following, as a method on StackElement seem acceptable? I'll roll this in to a PR if there isn't some reason this cannot be used. Its working in the case described above allowing the successive matches on the same rule because they are on different lines, while making an intentional loop with a single rule properly triggers the debug.

public hasEndlessLoop(anchorPosition: number): boolean {
    if (anchorPosition === -1) {
        // method is unavailable
        return true;
    }

    let se = this as StackElement;

    while (anchorPosition === se._anchorPos && se.parent !== nul) {
        se = se.parent;
    if (this.hasSameRuleAs(se)) {
        // endless loop detected
        return true;
        }
    }
    return false;
}
msftrncs commented 3 years ago

While working on testing scenarios, I determined I missed a detail as to reproduction. I'll edit the original post to include this:

The issue is dependent on another BeginEnd rule ending immediately before the matching of the rule that is triggering the endless loop debug. In this case, the language has odd/even spaces rules that scope out odd and even spaces, and the rules end (?=[^ ]) so it ends in front of the CASE, which causes there to be no cursor movement before the positive lookahead match.

msftrncs commented 3 years ago

To more generically describe the issue (and to better match test patterns I will include in a PR), assume the following document:

abc
test this line
not
    not
        not

With the grammar below (extended from the First-Mate test suite), three patterns of possible endless loops is demonstrated; 1 currently caught endless loop (base pattern), 1 multiple-rule endless loop not currently caught (test), and 1 rule incorrectly caught as an endless loop (not_a_problem).

{
    "name": "infinite-loop-grammar",
    "scopeName": "source.infinite-loop",
    "patterns": [
        {
            "name": "start",
            "begin": "\\A",
            "end": "$",
            "patterns": [
                {
                    "name": "negative-look-ahead",
                    "match": "(?!a)"
                }
            ]
        },
        {
            "include": "#test"
        },
        {
            "include": "#not_a_problem"
        }
    ],
    "repository": {
        "test": {
            "name": "test",
            "begin": "(?=test)",
            "end": "$",
            "patterns": [
                {
                    "include": "#test_this"
                }
            ]
        },
        "test_this": {
            "name": "test_this",
            "begin": "(?=test this)",
            "end": "$",
            "patterns": [
                {
                    "include": "#test_this_line"
                }
            ]
        },
        "test_this_line": {
            "name": "test_this_line",
            "begin": "(?=test this line)",
            "end": "$",
            "patterns": [
                {
                    "include": "#test"
                }
            ]
        },
        "spaces":
        {
            "name": "spaces",
            "begin": "^(?=\\s)",
            "end": "(?=\\S)"
        },
        "not_a_problem":
        {
            "name": "not_a_problem",
            "begin": "(?=not)",
            "end": "\\z",
            "patterns": [
                {
                    "name": "not",
                    "match": "\\Gnot"
                },
                {
                    "include": "#not_a_problem"
                },
                {
                    "include": "#spaces"
                }
            ]
        }
    }
}