jeff-hykin / better-cpp-syntax

💾 The source of VS Code's C++ syntax highlighting
GNU General Public License v3.0

Syntax Highlighter Crashing #637

Open Halalaluyafail3 opened 1 year ago

Halalaluyafail3 commented 1 year ago

Checklist

If Disabling that^ makes the problem go away, then follow this to make an issue on the C++ extension: https://github.com/microsoft/vscode-cpptools/issues/new/choose

The code with a problem is:

typename::std::add_pointer<typename::std::add_pointer<typename::std::add_pointer<typename::std::add_pointer<typename::std::add_pointer<typename::std::add_pointer<typename::std::add_pointer<typename::std::add_pointer<typename::std::add_pointer<typename::std::add_pointer<typename::std::add_pointer<typename::std::add_pointer<typename::std::add_pointer<typename::std::add_pointer<typename::std::add_pointer<typename::std::add_pointer<typename::std::add_pointer<typename::std::add_pointer<typename::std::add_pointer<typename::std::add_pointer<int>::type>::type>::type>::type>::type>::type>::type>::type>::type>::type>::type>::type>::type>::type>::type>::type>::type>::type>::type>::type ptr{nullptr};

It looks like:

This code crashes Visual Studio Code even when no plugins are enabled, so I presume the built-in syntax highlighter is the cause. The Visual Studio Code repository linked here, so I am reporting this crash here.

It should look like:

It should highlight this code normally without crashing.

9291Sam commented 1 year ago

Confirmed to be reproducible on 1.79 with no extensions enabled.

jeff-hykin commented 1 year ago

I'd bet that this shares the same underlying issue as this one: https://github.com/jeff-hykin/better-cpp-syntax/issues/578#issuecomment-991083540

RedCMD commented 10 months ago

I had a little look into this.

"#typename" is responsible for the highlighting in that code snippet and is definitely the cause of the lag. The size of the regex is not the problem; the problem is the number of recursive calls "#typename" makes back to itself.

"#typename" has some very weird nesting going on. From testing, the number of calls it executes seems to grow as 3^n. That code snippet has 20 nested levels, so it causes 3^20 = 3486784401 regex executions on the int token alone, or sum_(n=0)^20 3^n = 5230176601 for the total number of calls to "#typename".

#template_call_range gets called 3 times at every level, on every iteration (hence the 3 in 3^n). #template_call_range calls #template_call_context, which calls #storage_types, which comes full circle back to #typename. I suspect this is a remnant from https://github.com/microsoft/vscode-textmate/issues/208
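To make the "full circle" concrete, here is a minimal Python sketch that walks the include chain named in this comment and reports the loop. The `INCLUDES` table and the `find_cycle` helper are hypothetical names for illustration; the table holds only the four edges mentioned here and is not read from the real grammar file, where each rule includes many more patterns.

```python
# Include edges exactly as described in the comment above.
# Assumption: this is NOT the full grammar, only the chain that was named.
INCLUDES = {
    "typename": ["template_call_range"],
    "template_call_range": ["template_call_context"],
    "template_call_context": ["storage_types"],
    "storage_types": ["typename"],
}


def find_cycle(graph, start):
    """Return the first include path leading from `start` back to itself, or None."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        for nxt in graph.get(node, []):
            if nxt == start:
                return path + [nxt]
            if nxt not in path:  # do not revisit rules already on this path
                stack.append((nxt, path + [nxt]))
    return None


cycle = find_cycle(INCLUDES, "typename")
print(" -> ".join(cycle))
```

A loop like this is harmless on its own; it only becomes expensive once the same text is handed to it several times per nesting level.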

I managed to simplify typename down to try to understand it. You can see that capture groups 2, 3 and 4 nest inside each other, and that #template_call_range is called on all of them when only a single call is needed.

"test2": {
    "match": "([\\w:]++)(<\\g<0>>)*+",
    "captures": {
        "1": { "name": "keyword $1" },
        "2": { "patterns": [ { "include": "#template_call_range" } ] },
        "3": { "patterns": [ { "include": "#template_call_range" } ] },
        "4": { "patterns": [ { "include": "#template_call_range" } ] }
    }
},
"typename": {
    "match": "(?x)(?<_1>\\w++)(?<_2>(?<_3>(?>::)?+(?>\\w++(?<_4><(?>\\g<nest>|[^<>]++)*+>)?+::)*+)?+\\w++(?<nest><(?>\\g<nest>|[^<>]++)*+>)?+)",
    "comment": "(?x)(#1\ntypename)(#6\n(#12\n(?>::)?(?>[a-zA-Z0-9_]+(#14\n<(?>\\g<5>|[^<>]++)*>)?::)*+)?[a-zA-Z0-9_]+(#17\n<(?>\\g<5>|[^<>]++)*>)?)",
    "captures": {
        "1": { "name": "keyword $1" },
        "2": { "patterns": [ { "include": "#template_call_range" } ] },
        "3": { "patterns": [ { "include": "#template_call_range" } ] },
        "4": { "patterns": [ { "include": "#template_call_range" } ] }
    }
}

And here's a simplified version of the actual problem, using this input:

keyword<keyword<keyword<keyword<keyword<keyword<keyword<keyword<keyword<keyword<keyword<keyword>>>>>>>>>>>

"typename": {
    "match": "\\w+(((<\\g<0>>)))?",
    "captures": {
        "1": { "patterns": [ { "include": "#typename" } ] },
        "2": { "patterns": [ { "include": "#typename" } ] },
        "3": { "patterns": [ { "include": "#typename" } ] }
    }
},


The problem is that #typename ends up calling itself 3 times at every nested level, and each of those 3 calls then calls #typename 3 more times at every nested level, spiraling out of control.
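The blow-up can be reproduced outside the editor with a toy model. The Python sketch below is not how vscode-textmate works internally; it only assumes what is described above: a rule that matches a word, then re-scans the bracketed remainder once per capture group that includes the rule again. The names `nested`, `count_calls`, `closed_form` and the `fan_out` parameter are made up for illustration.

```python
def nested(depth, word="keyword"):
    """Build input such as keyword<keyword<keyword>> with `depth` nested levels."""
    return (word + "<") * depth + word + ">" * depth


def count_calls(text, fan_out=3):
    """Count invocations of a toy self-including rule on `text`.

    `fan_out` is the number of capture groups that re-run the rule
    on the text between the outermost '<' and '>'.
    """
    calls = 1  # this invocation
    start = text.find("<")
    end = text.rfind(">")
    if start != -1 and end > start:
        inner = text[start + 1:end]
        # Every capture group scans the same inner text again.
        for _ in range(fan_out):
            calls += count_calls(inner, fan_out)
    return calls


def closed_form(depth, fan_out=3):
    """Geometric sum 1 + f + f^2 + ... + f^depth."""
    if fan_out == 1:
        return depth + 1
    return (fan_out ** (depth + 1) - 1) // (fan_out - 1)


sample = nested(11)  # same shape as the simplified input above
print(count_calls(sample, fan_out=3))  # three captures include the rule
print(count_calls(sample, fan_out=1))  # a single capture includes the rule
print(closed_form(20))                 # 20 levels, as in the original report
```

Under this model, three captures give (3^(n+1) - 1) / 2 calls for n nested levels, while a single capture gives n + 1, which is why dropping the redundant includes should turn the growth from exponential to linear.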

Which I guess is exactly the same as doing this: [image]. Typing aaaaaaaaaaaaaaaaaaaa will cause the complete and utter destruction of the VS Code client thread.

Also remember that simply having a "patterns" array inside a capture group incurs a medium performance hit. However, the "patterns" array for "begin"/"end" rules does not seem to have the issue.

jeff-hykin commented 10 months ago

> which I guess is exactly the same as doing this: [image] typing aaaaaaaaaaaaaaaaaaaa will cause the complete and utter destruction of the vscode client thread

Oh, that is very interesting. And it is weird that ranges don't have the same performance hit.

jeff-hykin commented 3 months ago

As of right now, this one still exists (even though the other performance issue was fixed).