As detailed in https://github.com/slackhq/vscode-hack/issues/78, https://github.com/slackhq/vscode-hack/pull/72 introduced another error picked up by our grammar compiler. This time it's an invalid Unicode regex match:

```
Invalid regex in grammar: `source.hack` (in `syntaxes/hack.json`) contains a malformed regex (regex "`(?xi)([a-z_\x{7f}-\x{7fffffff}]`...": character value in \x{} or \o{} is too large (at offset 30))
```

... and ...

```
Invalid regex in grammar: `source.hack` (in `syntaxes/hack.json`) contains a malformed regex (regex "`(?i)[a-z_\x{7f}-\x{7fffffff}][a-`...": character value in \x{} or \o{} is too large (at offset 27))
```

The line numbers have been truncated, but they correspond to...

https://github.com/slackhq/vscode-hack/blob/62329f6b026a75f805daf701071df45ba09330a5/syntaxes/hack.json#L910

... and ...

https://github.com/slackhq/vscode-hack/blob/62329f6b026a75f805daf701071df45ba09330a5/syntaxes/hack.json#L918

... respectively.

I suspect the intent here was to cover all Unicode characters from `0x7F` to the end. However, `0x7FFFFFFF` can no longer be encoded in UTF-8: since 2003, the maximum code point is `0x10FFFF`.

From https://en.wikipedia.org/wiki/UTF-8#History:

> In November 2003, UTF-8 was restricted by RFC 3629 to match the constraints of the UTF-16 character encoding: explicitly prohibiting code points corresponding to the high and low surrogate characters removed more than 3% of the three-byte sequences, and ending at U+10FFFF removed more than 48% of the four-byte sequences and all five- and six-byte sequences.
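As a quick sanity check of the `0x10FFFF` bound, here is a minimal sketch in Python. Python is only used for illustration (the grammar itself is JSON and is not evaluated by Python), and `in_code_space` and `utf8_length` are hypothetical helper names, not part of vscode-hack. It shows that CPython's own limit matches the RFC 3629 bound, that `U+10FFFF` needs four UTF-8 bytes, and that `0x7FFFFFFF` falls outside the code space.

```python
import sys

# Highest Unicode code point; RFC 3629 caps UTF-8 at the same value.
MAX_CODE_POINT = 0x10FFFF


def in_code_space(cp: int) -> bool:
    """Hypothetical helper: True if cp lies in U+0000..U+10FFFF."""
    return 0 <= cp <= MAX_CODE_POINT


def utf8_length(cp: int) -> int:
    """Hypothetical helper: number of bytes UTF-8 needs for cp.

    chr() raises ValueError for anything above U+10FFFF, and encode()
    raises UnicodeEncodeError for lone surrogates (U+D800..U+DFFF).
    """
    return len(chr(cp).encode("utf-8"))


if __name__ == "__main__":
    print(sys.maxunicode == MAX_CODE_POINT)                # True
    print(utf8_length(0x7F), utf8_length(MAX_CODE_POINT))  # 1 4
    print(in_code_space(0x7FFFFFFF))                       # False
```

Under the older RFC 2279 rules, `0x7FFFFFFF` was reachable only through a six-byte sequence, which is exactly the class of sequences RFC 3629 removed.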
This PR addresses this by switching out `\x{7fffffff}` with `\x{10ffff}`.

Fixes https://github.com/slackhq/vscode-hack/issues/78
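To try the corrected upper bound outside the grammar compiler, the sketch below rebuilds a character class of the same shape with Python's `re`. Treat it as an approximation under stated assumptions: `re` is not the engine that compiled `syntaxes/hack.json`, it has no `\x{...}` escape (so the bounds are written `\x7f` and `\U0010ffff`), and because the regexes in the error output are truncated, the second character class and the names `FIXED`, `OLD`, `IDENTIFIER` and `compiles` are hypothetical.

```python
import re

# Hypothetical identifier pattern shaped like the grammar's (truncated) regexes.
# Python's re has no \x{...} escape, so the bounds are spelled \x7f and \U0010ffff.
FIXED = r"[a-z_\x7f-\U0010ffff][a-z0-9_\x7f-\U0010ffff]*"
# The same shape with the old upper bound, which lies outside the code space.
OLD = r"[a-z_\x7f-\U7fffffff][a-z0-9_\x7f-\U7fffffff]*"

# re.IGNORECASE stands in for the grammar's (?i) flag.
IDENTIFIER = re.compile(FIXED, re.IGNORECASE)


def compiles(pattern: str) -> bool:
    """Return True if Python's re accepts the pattern."""
    try:
        re.compile(pattern, re.IGNORECASE)
    except re.error:
        return False
    return True


if __name__ == "__main__":
    print(compiles(OLD))                              # False
    print(bool(IDENTIFIER.fullmatch("Caf\u00e9_2")))  # True
    print(IDENTIFIER.fullmatch("2fast") is None)      # True
```

Python rejects `\U7fffffff` for essentially the same reason the grammar compiler rejected `\x{7fffffff}`: the value lies above `U+10FFFF`, while the corrected class still matches every non-ASCII code point from `U+007F` upward.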