I noticed a problem with lezer-generator taking a several minutes to report an "Unterminated string literal" error when reading a simple grammar containing an incorrectly terminated string. Profiling the code showed that the regexes used in the parse.tsnext method were subject to exponential backtracking when the grammar file contains backslash characters later in the file.
Specifically the line let end = this.match(start + 1, /^(\\.|[^'])*'/) at parse.ts#L69 can match a backslash on both sides of the alternation operator, triggering the backtracking. The solution would be to change the regex to /^(\\.|[^'\\])*/ to exclude the backslash from the right hand side. There are three cases in the parse.ts next method: the unterminated strings " and ', and the unterminated character set [ case which have this problem and solution.
Here's a simple test case:
$ time node dist/lezer-generator.cjs stringtest.grammar -o stringtest.js
Unterminated string literal (stringtest.grammar 2:9)
real 2m27.683s
user 2m27.674s
sys 0m0.005s
@tokens {
foo { ' } // unterminated string
}
@top Program {
Foo
}
Foo { foo "a long string\n a long string\n a long string\n a long
string\n a long string\n a long string\n a long string\n a long
string\n a long string\n a long string\n a long string\n a long
string\n a long string\n a long string\n a long string\n a long
string\n a long string\n a long string\n a long string\n a long
string\n a long string\n a long string\n a long string\n a long
string\n a long string\n a long string\n a long string\n a long
string\n a long string\n"} // each \ triggers regex backtracking
After changing the regex as described above
$ time node dist/lezer-generator.cjs stringtest.grammar -o stringtest.js
Unterminated string literal (stringtest.grammar 2:9)
real 0m0.064s
user 0m0.045s
sys 0m0.021s
I noticed a problem with
lezer-generator
taking a several minutes to report an "Unterminated string literal" error when reading a simple grammar containing an incorrectly terminated string. Profiling the code showed that the regexes used in theparse.ts
next method were subject to exponential backtracking when the grammar file contains backslash characters later in the file.Specifically the line
let end = this.match(start + 1, /^(\\.|[^'])*'/)
at parse.ts#L69 can match a backslash on both sides of the alternation operator, triggering the backtracking. The solution would be to change the regex to/^(\\.|[^'\\])*/
to exclude the backslash from the right hand side. There are three cases in the parse.tsnext
method: the unterminated strings"
and'
, and the unterminated character set[
case which have this problem and solution.Here's a simple test case:
After changing the regex as described above