microsoft / vscode-textmate

A library that helps tokenize text using Text Mate grammars.
MIT License

Escaping a `]` inside a posix class `[[:\]:]` does funky stuff #165

Open RedCMD opened 2 years ago

RedCMD commented 2 years ago

The regex `[[:upper:]]` works as expected, matching only uppercase letters. Normally, if you misspell anything inside the POSIX class, the TextMate engine will fail. There are a few notable exceptions: `]`, `:` and `[:` (I'm only going to focus on `]` for now).

`[[:upp]er:]]` will work, matching `:]`, `u]`, `p]`, `e]` and `r]` (regex: `[:uper]\]`).

But if you escape the `]` inside the POSIX class, then both square brackets act as the closing and opening brackets for the 2nd character class and as literal characters for the 1st class. This leaves the last `]` completely outside the classes, and it can be removed without error. `[[:upp\]er:]]` will match the same as above, but with `[` and `]` added: `:]`, `u]`, `p]`, `e]`, `r]`, `[]` and `]]` (regex: `[\[:uper\]]\]`).

`[[:\]:]` will match `[`, `:` and `]`.

~Placing a `-` before the last `:` causes the TextMate engine to fail: `[[:\]-:]`.~ (EDIT: `\]-:` is an illegal character range.)

The regex works as expected if either `:` is removed or moved one character away from its respective square bracket. (I have used a single `\` instead of the double `\\` required when embedding the regex in JSON.)
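The "equivalent" regexes given in the report use only ordinary character classes, so they can be checked in any standard regex engine. Below is a minimal sketch in Python that enumerates the two-character strings each equivalent fully matches, and prints the JSON-escaped form of the original pattern as it would appear in a grammar file. This rests on an assumption: Python's `re` is only a stand-in here. It does not implement Oniguruma's POSIX brackets, so the malformed originals themselves are never compiled. The `ALPHABET` string and the `two_char_matches` helper are hypothetical names chosen for illustration.

```python
import json
import re
from itertools import product

# Original pattern -> equivalent regex, both taken from the report above.
EQUIVALENTS = {
    r"[[:upp]er:]]": r"[:uper]\]",
    r"[[:upp\]er:]]": r"[\[:uper\]]\]",
}

# Hypothetical test alphabet: the symbols and letters used in the patterns,
# plus "x" as a character that should never match.
ALPHABET = "[]:uperx"


def two_char_matches(pattern: str) -> list[str]:
    """Return every two-character string over ALPHABET that the pattern fully matches."""
    rx = re.compile(pattern)
    return sorted(
        a + b for a, b in product(ALPHABET, repeat=2) if rx.fullmatch(a + b)
    )


for original, equivalent in EQUIVALENTS.items():
    # json.dumps shows the doubled backslash needed inside a .tmLanguage.json file.
    print(json.dumps(original), "->", two_char_matches(equivalent))
```

Run as-is, the first equivalent should match `:]`, `e]`, `p]`, `r]` and `u]`, and the second should additionally match `[]` and `]]`. This agrees with the lists in the report.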

RedCMD commented 2 years ago

POSIX classes are some of the funkiest things in TextMate. The regex below is allowed and will match all uppercase letters as well as `a`, `b` and `c`:

[image]

But if a POSIX value is entered, the engine breaks:

[image]

RedCMD commented 1 year ago

kkos has fixed this issue https://github.com/kkos/oniguruma/commit/d1cf59269006481294896fb855153857d8bec928 https://github.com/kkos/oniguruma/commit/9b9365d0524d1bf60c1fc963be13260ff6b798aa

Update the version of Oniguruma? 1.5.1 => 6.9.6

tonco-miyazawa commented 10 months ago

This fix is included in Oniguruma release 6.9.9: https://github.com/kkos/oniguruma/commit/06f5c8198a3da4a8bdcc8c79d52034cd07bb96b7

Additional change

If a symbol appears between `[:` and `:]`, it becomes a normal character class (except `[[:^:]]`), e.g. `[[:upper!:]]` == `[[\:upper!:]]`.
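The 6.9.9 rule described in this comment can be summarized as a small decision function. This is a hypothetical, simplified model for illustration only. It is not Oniguruma's actual parser in `regparse.c`, and `classify_bracket` is an invented name. The name set is the POSIX bracket list from Oniguruma's regex syntax documentation. Treating an unknown letters-only name as an error follows the misspelling behaviour described in the original report, and is only an assumption for 6.9.9. The `[[:^:]]` exception is flagged but not modelled, because the thread does not say what it becomes.

```python
# Hypothetical simplified model of how Oniguruma 6.9.9 treats `[[:body:]]`,
# based only on the rule stated in this thread. Not the real parser.

# POSIX bracket names listed in Oniguruma's regex syntax documentation.
POSIX_NAMES = {
    "alnum", "alpha", "ascii", "blank", "cntrl", "digit", "graph",
    "lower", "print", "punct", "space", "upper", "word", "xdigit",
}


def classify_bracket(body: str) -> str:
    """Classify the text between `[:` and `:]` in a pattern like `[[:body:]]`."""
    if body == "^":
        # The thread lists [[:^:]] as an exception but does not say how it
        # is parsed, so this model only flags it.
        return "exception"
    # A leading `^` negates a POSIX bracket, e.g. [[:^upper:]].
    name = body[1:] if body.startswith("^") else body
    if name in POSIX_NAMES:
        return "posix"
    if name.isalpha():
        # Assumption: a letters-only unknown name is still rejected,
        # like the misspellings described in the original report.
        return "error"
    # A non-letter character (a "symbol") makes it an ordinary character
    # class: [[:upper!:]] behaves like [[\:upper!:]].
    return "char_class"


for body in ["upper", "^upper", "uper", "upper!", "^"]:
    print(f"[[:{body}:]]", "->", classify_bracket(body))
```

The order of checks matters. The `^` exception is tested before the negation prefix is stripped; otherwise `[[:^:]]` would fall through to the ordinary-class branch.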

oniguruma/src/regparse.c#L5106-L5139


POSIX bracket now works ideally, thank you.

RedCMD commented 9 months ago

VS Code is still using an old 2020 version of Oniguruma: https://github.com/kkos/oniguruma/releases/tag/v6.9.5_rev1

https://github.com/microsoft/vscode/blob/7cbff1919e05c97cc95e06a32abbe7ab8d3f0fc3/src/vs/workbench/services/textMate/common/cgmanifest.json#L9 @hediet