microsoft / vscode-textmate

A library that helps tokenize text using Text Mate grammars.
MIT License

`\\G` matches unicode character 0xFFFF `￿` #232

Open · RedCMD opened this issue 5 months ago

RedCMD commented 5 months ago

Create a grammar with a rule containing `\\G`:

```json
{
  "match": "\\G",
  "name": "invalid"
}
```

Run the grammar on a file containing the Unicode character U+FFFF (`￿`).

Expected: it matches nothing, because no `\\G` anchor is available.

Actual: it matches the Unicode character U+FFFF.

[Screenshot: `abc￿def`]

The character does actually appear in multiple files inside VS Code, so this is not a non-issue.

senyai commented 5 months ago

As far as I understand, it is because of this logic: https://github.com/microsoft/vscode-textmate/blob/09effd8b7429b71010e0fa34ea2e16e622692946/src/rule.ts#L692-L696. Can it be done smarter? Not sure.
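A minimal sketch of the suspected mechanism, assuming (as the issue title suggests; the linked code is not quoted in the thread) that a disallowed `\G` is rewritten by keeping the backslash and replacing the `G` with U+FFFF. The result is an escaped literal U+FFFF, which then matches that character in the input. The helper names `disableGWithFFFF` and `disableGWithNeverMatch` are hypothetical, and the sketch uses JavaScript `RegExp` rather than the Oniguruma engine vscode-textmate actually runs on. The second helper shows one possible direction for "can it be done smarter?": substitute an empty negative lookahead `(?!)`, which can never match.

```typescript
// Hypothetical sketch, NOT vscode-textmate's actual code.
// Both helpers rewrite every `\G` in a regex source for the case where
// the G anchor is not allowed at the current position.

// Mimics the behaviour described in the issue: `\G` becomes `\` + U+FFFF,
// i.e. an escaped literal U+FFFF that can still match real input.
function disableGWithFFFF(source: string): string {
  let out = "";
  for (let i = 0; i < source.length; i++) {
    const ch = source.charAt(i);
    if (ch === "\\" && i + 1 < source.length) {
      const next = source.charAt(i + 1);
      out += ch + (next === "G" ? "\uFFFF" : next);
      i++; // skip the escaped char, so an escaped backslash followed by G is untouched
    } else {
      out += ch;
    }
  }
  return out;
}

// Possible alternative (assumption): replace `\G` with `(?!)`,
// an empty negative lookahead that fails at every position.
function disableGWithNeverMatch(source: string): string {
  let out = "";
  for (let i = 0; i < source.length; i++) {
    const ch = source.charAt(i);
    if (ch === "\\" && i + 1 < source.length) {
      const next = source.charAt(i + 1);
      out += next === "G" ? "(?!)" : ch + next;
      i++; // same escape handling as above
    } else {
      out += ch;
    }
  }
  return out;
}

// The grammar's "\\G" (JSON) is the two-character regex source `\G`.
const line = "abc\uFFFFdef";
const buggy = new RegExp(disableGWithFFFF("\\G"));
const alternative = new RegExp(disableGWithNeverMatch("\\G"));
console.log(buggy.test(line)); // true: the escaped U+FFFF matches the literal character
console.log(alternative.test(line)); // false: `(?!)` never matches
```

Neither helper special-cases `\G` inside a character class, and whether `(?!)` fits vscode-textmate's anchor caching and Oniguruma's syntax exactly as it does JavaScript's has not been checked here.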