microsoft / vscode

Visual Studio Code
https://code.visualstudio.com
MIT License

There are many strange regular expressions in tmLanguage.json in some extensions #108241

Closed Frederisk closed 3 years ago

Frederisk commented 4 years ago

I noticed something like this in some tmLanguage.json files (extensions\json\syntaxes\JSON.tmLanguage.json, lines 81-84):

 "number": {
    "match": "(?x)        # turn on extended mode\n  -?        # an optional minus\n  (?:\n    0       # a zero\n    |       # ...or...\n    [1-9]   # a 1-9 character\n    \\d*     # followed by zero or more digits\n  )\n  (?:\n    (?:\n      \\.    # a period\n      \\d+   # followed by one or more digits\n    )?\n    (?:\n      [eE]  # an e character\n      [+-]? # followed by an option +/-\n      \\d+   # followed by one or more digits\n    )?      # make exponent optional\n  )?        # make decimal portion optional",
    "name": "constant.numeric.json"
 },

I guess this is caused by code comments not being removed during YAML conversion. To my surprise, though, the pattern still works.

In other words, is such code allowed? The content between # and \n will become comments without effect, and many spaces will be interpreted as YAML indentation without effect.

At first I thought that this was only the content in a small number of files, but after I checked it, I found that such writing seems to be widespread. I don't know whether such problems should be fixed.

aeschli commented 3 years ago

This is an Oniguruma regular expression. (?x) turns on extended mode, in which whitespace and comments inside the pattern are ignored.
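The effect of extended mode can be checked outside VS Code. The following is only a sketch: it uses Python's re module rather than Oniguruma, on the assumption that both engines treat (?x) the same way for this particular pattern. RULE_JSON reproduces the rule quoted in the issue, and `compact` is a hand-written one-line equivalent that does not come from the repository.

```python
import json
import re

# The "number" rule from the issue, kept as raw JSON text so that the escapes
# (\n, \\d, \\.) are decoded by json.loads the way a grammar loader would.
RULE_JSON = (
    r'{"match": "(?x)        # turn on extended mode\n'
    r'  -?        # an optional minus\n'
    r'  (?:\n'
    r'    0       # a zero\n'
    r'    |       # ...or...\n'
    r'    [1-9]   # a 1-9 character\n'
    r'    \\d*     # followed by zero or more digits\n'
    r'  )\n'
    r'  (?:\n'
    r'    (?:\n'
    r'      \\.    # a period\n'
    r'      \\d+   # followed by one or more digits\n'
    r'    )?\n'
    r'    (?:\n'
    r'      [eE]  # an e character\n'
    r'      [+-]? # followed by an option +/-\n'
    r'      \\d+   # followed by one or more digits\n'
    r'    )?      # make exponent optional\n'
    r'  )?        # make decimal portion optional", '
    r'"name": "constant.numeric.json"}'
)

# Pattern with comments and whitespace, exactly as stored in the grammar.
verbose = re.compile(json.loads(RULE_JSON)["match"])

# Hand-written equivalent with comments and whitespace stripped (assumption:
# this is what the engine effectively sees once (?x) is applied).
compact = re.compile(r"-?(?:0|[1-9]\d*)(?:(?:\.\d+)?(?:[eE][+-]?\d+)?)?")

SAMPLES = ["0", "-12", "3.14", "1e10", "-2.5E-3", "01", "1.", "abc"]

for text in SAMPLES:
    a = verbose.fullmatch(text) is not None
    b = compact.fullmatch(text) is not None
    print(f"{text!r}: verbose={a} compact={b}")
```

If the two columns agree for every sample, the comments and indentation have no influence on what the pattern matches, at least in this engine.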

Most TextMate grammars are authored in XML, where these expressions are written across multiple lines. Our converter doesn't try to be smart; it just puts the whole expression into a single JSON string.
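To illustrate why a multi-line pattern ends up looking like the one in the issue, here is a minimal sketch using Python's json module. It is not the actual VS Code converter, and MULTILINE is a shortened, hypothetical pattern rather than one taken from a real grammar.

```python
import json

# A multi-line extended-mode pattern, as it might appear in an XML grammar
# (hypothetical, shortened for illustration).
MULTILINE = r"""(?x)  # turn on extended mode
  -?    # an optional minus
  \d+   # one or more digits"""

# Serializing to JSON keeps every character: line breaks become the two
# characters \n and each backslash is doubled, so the result is one line.
serialized = json.dumps({"match": MULTILINE})
print(serialized)

# Loading the JSON again restores the original multi-line pattern.
restored = json.loads(serialized)["match"]
print(restored == MULTILINE)
```

Nothing is lost or reinterpreted in the round trip, which is why the comments and indentation survive in the JSON file.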