marko-js / marko-tmbundle

TextMate bundle for the Marko templating language
MIT License
20 stars 5 forks source link

tmbundle: Make tmLanguage compatible with PCRE2 #19

Open lildude opened 2 years ago

lildude commented 2 years ago

👋 I'm the lead maintainer of the https://github.com/github/linguist library which is used for language detection and providing the syntax highlighting for languages on GitHub.com, and we use this grammar.

Our grammar compiler has found several problems with your grammar which I thought I'd let you know about.

These regexes have quite a few problems as you can see in the regex101 link after each:

https://github.com/marko-js/marko-tmbundle/blob/60ded48bea6d6eccea81e858168e20e0b1b78b01/syntaxes/marko.tmLanguage.json#L779

https://regex101.com/r/pSG73T/1

https://github.com/marko-js/marko-tmbundle/blob/60ded48bea6d6eccea81e858168e20e0b1b78b01/syntaxes/marko.tmLanguage.json#L108

... and repeated again at:

https://github.com/marko-js/marko-tmbundle/blob/60ded48bea6d6eccea81e858168e20e0b1b78b01/syntaxes/marko.tmLanguage.json#L130

https://regex101.com/r/NlVs41/1

These are the errors our compiler reported:

DylanPiercey commented 2 years ago

Will look shortly. Thanks for the report!

DylanPiercey commented 2 years ago

@lildude from what I can tell the regex functions correctly. The validator you are using is using PCRE2 however my understanding was that the regex's in tmgrammars are intended to be handled by oniguruma. Is this not the case for linguist?

lildude commented 2 years ago

Is this not the case for linguist?

No. GitHub uses PCRE for grammar parsing for performance reasons.