microsoft / vscode

Visual Studio Code
https://code.visualstudio.com
MIT License

Regex Find can cause VS Code to Freeze #30874

Open Jason3S opened 7 years ago

Jason3S commented 7 years ago

Steps to Reproduce:

  1. code --disable-extensions
  2. File->New File
  3. Paste file from here: snippets.py (it happens with other files, but this is a simple example)
  4. Edit->Find """(.*?\n?)+?""" regex
  5. Try to remove the last ? from the regex.
  6. The editor freezes at this point
  7. Code Helper goes to 100% CPU.

Result: image

Reproduces without extensions: Yes

Jason3S commented 7 years ago

It also happens with: 1.15.0-insider

Jason3S commented 7 years ago

I closed it by accident. This is a real issue.

rebornix commented 7 years ago

Really good catch! Unfortunately, the search viewlet shares the same issue.

cc @roblourens

rebornix commented 7 years ago

The reason for the hang is that the regex """(.*?\n?)+?""" suffers from catastrophic backtracking. You can run into the same hang/100% CPU when executing this regex against that snippets.py file in any browser.

However, we should have some timeout here to avoid the hang.
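
To make the backtracking claim concrete, here is a minimal TypeScript sketch. The pattern ^(a+)+$ and the generated inputs are illustrative assumptions, not the regex or file from this issue, and timeFailure is a hypothetical helper. It times a failing match as the input grows; on a backtracking engine such as V8's, the time is expected to grow roughly exponentially with n, though the actual figures depend on the machine and engine version.

```typescript
// Illustrative nested quantifier: the inner a+ and the outer + can split
// a run of "a" characters in exponentially many ways.
const nested = /^(a+)+$/;

// Hypothetical helper: times one failing match against n "a"s plus a "b".
function timeFailure(n: number): number {
  let input = "";
  for (let i = 0; i < n; i++) {
    input += "a";
  }
  input += "b"; // the trailing "b" makes an overall match impossible

  const start = Date.now();
  const matched = nested.test(input);
  const elapsed = Date.now() - start;

  if (matched) {
    throw new Error("unexpected match");
  }
  return elapsed;
}

// Keep n small: each extra character roughly doubles the work.
for (const n of [8, 12, 16, 20]) {
  console.log(`n=${n}: ${timeFailure(n)} ms`);
}
```

A timeout, as suggested above, would cap this cost but would not change the growth itself.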

roblourens commented 7 years ago

@rebornix it probably only impacts the search viewlet when the file is open, right? I think Rust's regex engine is not susceptible to this. So it would be fixed by any fix to the editor.

rebornix commented 7 years ago

@roblourens yes exactly as you said. It should just be a fix in the editor side.

Spown commented 6 years ago

Another case:

Steps:

  1. Have a file with content
    t({phrase: 'mail_text_q', locale: 'ru-RU'}, mailCtx)+'\n'+
    t({phrase: 'mail_text_m', locale: 'ru-RU'}, mailCtx)+'\n'+
  2. call search widget
  3. set to RegEx
  4. input (.*+'\n'+
  5. try to add ) after the third character
    (.*
       ^

pascallaprade-beslogic commented 6 years ago

I have another case of a regex search that hangs VS Code: /\*(.*\n?)*\*/.

This is on VS Code 1.21.1, on Windows 10, both x64.

dawid2193487 commented 6 years ago

I found another one that hangs: (.*)\(\d*\

I attempted to turn a rather large HTML file (577KB) into a CSV containing only data from that file. I can't share that file due to privacy reasons.

  1. Press CTRL+H for search and replace
  2. Enable regex and type in (.*)\(\d*\ and possibly punch in another space after it

There's about 25% CPU activity after it hangs, but I've been waiting a couple of minutes now. A regex shouldn't ever hang the whole editor.

Does this issue occur when all extensions are disabled?: Yes

stefanJi commented 5 years ago

Input order:

  1. input \s19:.*
  2. insert . at the start
  3. VS Code froze

VS Code version: 1.31.1, OS: macOS 10.14.3

jtakalai commented 5 years ago

Curious that it's such a problem since other editors like Sublime don't have a problem with highlighting regex matches... Why is newline such an issue? Shouldn't it be just another character?

hobpet commented 5 years ago

> Curious that it's such a problem since other editors like Sublime don't have a problem with highlighting regex matches... Why is newline such an issue? Shouldn't it be just another character?

Actually I've tested my search in Sublime and there I have the following messages (but no crash!):

ERROR: Regex exhausted stack searching file
ERROR: Regex complexity too high searching file

emilygdavis commented 4 years ago

I am running into this, too, for Find in a single file (not the global find/replace). I'm encountering the issue simply by using a pipe character (|) in my regex when I have not yet typed any characters after the pipe. If I try to enter something like

element|

(eventually intending to type element|getElement), VS Code freezes up before I've even hit "Enter" to do the Find.

This is on VS Code version 1.39.2, on macOS Mojave (v10.14.6).

Fadavvi commented 4 years ago

Version: 1.41.1 (user setup)
Commit: 26076a4de974ead31f97692a0d32f90d735645c0
Date: 2019-12-18T14:58:56.166Z
Electron: 6.1.5
Chrome: 76.0.3809.146
Node.js: 12.4.0
V8: 7.6.303.31-electron.0
OS: Windows_NT x64 10.0.18363

and still

VSCodeProblem

roblourens commented 4 years ago

@Fadavvi please open a new issue, I can't repro it. I'm curious whether it happens without the "whole word" toggle enabled, or in a shorter file.

Fadavvi commented 4 years ago

> @Fadavvi please open a new issue, I can't repro it. I'm curious whether it happens without the "whole word" toggle enabled, or in a shorter file.

OK, I'll open a new issue. It happens only on some particular files. I can't find what those files have in common!

jmlowenthal commented 4 years ago

> I am running into this, too, for Find in a single file (not the global find/replace). I'm encountering the issue simply by using a pipe character (|) in my regex when I have not yet typed any characters after the pipe. If I try to enter something like
>
> element|
>
> (eventually intending to type element|getElement), VS Code freezes up before I've even hit "Enter" to do the Find.
>
> This is on VS Code version 1.39.2, on macOS Mojave (v10.14.6).

+1 to this on VSCode 1.45.0 on Linux x64 5.6.0-1-amd64 (Debian)

Artemis21 commented 4 years ago

Very simple regex to reproduce: (x|)

jmlowenthal commented 4 years ago

Looks to be fixed in 1.45.1 for me. @Artemis21 can you repro in 1.45.1?

ndbroadbent commented 4 years ago

I'm using VS Code 1.45.1, and I've just found another regex that will cause VS Code to freeze: "([^]+\n)+"

I've reproduced this with some minimal examples: https://gist.github.com/ndbroadbent/bb5e246b05488296576ee3dc39d40e38

(I've tested these with code --disable-extensions to disable all extensions.)

One example freezes my computer for about 1-2 seconds. The other example just has a few extra lines, but it looks like it causes a huge exponential increase in complexity and freezes VS Code forever.

It would be a really good idea to add a simple timeout so that we don't accidentally freeze the editor.

These error messages from Sublime would be even better:

ERROR: Regex exhausted stack searching file
ERROR: Regex complexity too high searching file

Thanks!

IllusionMH commented 3 years ago

This looks promising for the current case, but I wonder whether it will be applicable: https://twitter.com/schuay/status/1325715124930420736

Details in Chromium bug tracker: Issue 10765: Tracking bug: Non-backtracking regexp execution

tombogle commented 3 years ago

I wrote a fairly complex regex specific to my data, and also got this hang. Even if the regex is ill-formed or just really nasty to where it executes inefficiently, there needs to be a way to cancel the find (which kicks off automatically in the background as you're entering the search string). There is a red X that seems to be offering this. But clicking it doesn't seem to work. It looks as though it briefly pauses the background search, but then it goes on.

gustavoteixeirah commented 2 years ago

I am having this issue when applying the following regex: (added )+(\w+)+( output/)+(\w+)+(.png) in a file that contains 2000 lines with this content:

added Qmb46Sdqb8JjMEGMXbQa2QTDukEz8Sa5jTLRFYqqxAxqn4 C:\\repositories\\kryptodevelopers\\output\\chunk1/995.png
added QmVXeSf9Matf4FCNkDcAFpUDQNUC4sDKnoJH3KJWUZTTWM C:\\repositories\\kryptodevelopers\\output\\chunk1/996.png
added QmQznUdyBTwMrn1ZhAGz7nK7mQVEKy7TDFT2aycDKEgLNb C:\\repositories\\kryptodevelopers\\output\\chunk1/997.png
added QmWx7ZtP7Gm4ASe7J4HVNQqmXRCuyvU6qNkBC77x59dVtA C:\\repositories\\kryptodevelopers\\output\\chunk1/998.png
added QmcFWFLo1ysNhKQeGBRUrhkmdHQ1heozGR1G4WSczGGLbV C:\\repositories\\kryptodevelopers\\output\\chunk1/999.png

These are the versions: image

Pressing CTRL+F and inputting the regex freezes the IDE, and the only way to get it working again is to restart it.

IllusionMH commented 2 years ago

That's because you shouldn't use (\w+)+, for the reasons explained above. Keep only one + (I think the one inside the parentheses).
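
A minimal TypeScript sketch of that suggestion follows. The names risky, safer, and firstMatch, and the two short sample strings, are assumptions made up for illustration. Only the nested quantifiers are changed, so both patterns accept the same text; the difference is how many ways the engine can split a word when the overall match fails. The risky pattern is deliberately run only on a short string that matches.

```typescript
// Pattern from the comment above, with nested quantifiers: (\w+)+
const risky = /(added )+(\w+)+( output\/)+(\w+)+(.png)/;
// Same pattern with the outer + removed from each word group.
const safer = /(added )+(\w+)( output\/)+(\w+)(.png)/;

// Hypothetical helper: returns the matched text, or null when nothing matches.
function firstMatch(re: RegExp, text: string): string | null {
  const m = re.exec(text);
  return m === null ? null : m[0];
}

// Short made-up inputs, kept small so that even the risky pattern stays cheap.
const hit = "added abc123 output/995.png";
const miss = "added abc123 C:\\output\\chunk1/995.png";

// Both patterns find the same text in the matching line.
console.log(firstMatch(risky, hit) === firstMatch(safer, hit));
// The safer pattern rejects the non-matching line without nested retries.
console.log(firstMatch(safer, miss));
```

With the sample lines from the report, " output/" never appears, so every line is a failing case, which is where the nested form is expected to be slowest.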

awidjaja commented 2 years ago

> Another case:
>
> Steps:
>
>   1. Have a file with content
>     t({phrase: 'mail_text_q', locale: 'ru-RU'}, mailCtx)+'\n'+
>     t({phrase: 'mail_text_m', locale: 'ru-RU'}, mailCtx)+'\n'+
>   2. call search widget
>   3. set to RegEx
>   4. input (.*+'\n'+
>   5. try to add ) after the third character
>     (.*
>        ^

I wonder if after 3 years there is a solution for the possessive operator like .*+ ?

Jason3S commented 2 years ago

@awidjaja, Please try your expression on a site like https://regex101.com/ . If you want to match against ( or +, they need to be escaped.

Like this:

\(.*?\)\+'\\n'\+

awidjaja commented 2 years ago

Hi @Jason3S, VS Code crashes each time a multi-line search fails to find a match, possibly because of regex backtracking hell.

Following the example from the included link, if there is no matching <body> tag, VS Code will stop functioning and you have to close it and manually kill the VS Code process in Task Manager, risking the loss of unsaved work.

> How Possessive Quantifiers Work
>
> Like a greedy quantifier, a possessive quantifier repeats the token as many times as possible. Unlike a greedy quantifier, it does not give up matches as the engine backtracks. With a possessive quantifier, the deal is all or nothing. You can make a quantifier possessive by placing an extra + after it. * is greedy, *? is lazy, and *+ is possessive. ++, ?+ and {n,m}+ are all possessive as well. Possessive quantifiers are supported by JGsoft, Java, and PCRE.
>
> Possessive quantifiers allow your regex to fail faster. In the above example, when the closing quote fails to match. Technically, possessive quantifiers are a notational convenience to place an atomic group around a single quantifier. All regex flavors that support possessive quantifiers also support atomic grouping. But not all regex flavors that support atomic grouping support possessive quantifiers. With those flavors, you can achieve the exact same results using an atomic group. Atomic grouping is supported by most modern regular expression flavors, including the JGsoft flavor, Java, PCRE, .NET, Perl, Boost, and Ruby. Most of these also support possessive quantifiers, which are essentially a notational convenience for atomic grouping.

Currently, neither the atomic group (?>X*) nor the possessive quantifier X*+ works in VS Code search.

After 3 years, I was hoping there would already be a backlog item to make multi-line search work without crashing VS Code. Hope this clarifies.
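
For engines without atomic groups, one commonly used workaround is a capturing lookahead followed by a backreference: (?=(X+))\1. In ECMAScript a lookahead is not re-entered on backtracking, so the captured text is consumed all or nothing. The TypeScript sketch below assumes an illustrative pattern and inputs that are not taken from this thread, and buildWord is a hypothetical helper. Note that this changes what can match whenever the group would have needed to give characters back.

```typescript
// Emulates the atomic form ^(?>\w+)+: using a lookahead plus backreference.
// (?=(\w+)) captures the longest run of word characters without consuming it,
// and \1 then consumes exactly that text with no alternatives to retry.
const emulatedAtomic = /^(?:(?=(\w+))\1)+:/;

// Hypothetical helper: builds a word of the given length.
function buildWord(length: number): string {
  let word = "";
  for (let i = 0; i < length; i++) {
    word += "a";
  }
  return word;
}

// A word followed by a colon matches as usual.
console.log(emulatedAtomic.test("key: value"));

// A long word with no colon fails after a single pass over the input,
// because the engine cannot re-split the captured run.
console.log(emulatedAtomic.test(buildWord(40)));
```

The same 40-character input against the plain form ^(?:\w+)+: would be expected to take a very long time on a backtracking engine, so it is not run here.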

Jason3S commented 2 years ago

@awidjaja, That would be a feature request. It might be worth it to open a new feature request and mention that it could be a solution to this issue.

My guess is that VS Code uses the built-in RegExp engine from JavaScript, which does not support possessive quantifiers. One suggestion would be to use the oniguruma engine that VS Code uses to colorize code.

awidjaja commented 2 years ago

Indeed, JavaScript supports neither possessive quantifiers nor atomic grouping.

I thought vscode regex was based on PCRE or ripgrep engine?

I tested some regex using named group capture that is supported by javascript. vscode allows it but there is no way to reference the group in the replace box, neither ${name} nor \k work. Are you sure the regex was based on javascript?

IllusionMH commented 2 years ago

The editor's content is stored in JS, so the JS (V8) regexp engine is used for search in the editor.

If you use Find in Files, ripgrep (PCRE2 compliant) is used for unopened files (opened files currently always use the JS engine).

As for replace, the correct placeholder syntax would be $<name>, but it might simply be escaped before being passed to replace; that should be investigated separately.

\k<name> is used only to reference a match inside the regexp itself. (?<start>\d+).+\k<start> will find 123 asd 123 in 123 asd 123 asd as expected.
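
The two syntaxes can be checked in plain JavaScript/TypeScript with a short sketch. It exercises only String.prototype.replace and RegExp.prototype.exec, so it says nothing about how the VS Code replace box treats the placeholder; the bracketed replacement string is an assumption chosen for illustration.

```typescript
// \k<start> refers back to the named group inside the pattern itself.
// The pattern is built from a string so it is parsed only at run time.
const pattern = new RegExp("(?<start>\\d+).+\\k<start>");
const sample = "123 asd 123 asd";

// Finds "123 asd 123": the digits, any text, then the same digits again.
const found = pattern.exec(sample);
console.log(found === null ? null : found[0]);

// $<start> refers to the named group inside a replacement string.
const replaced = sample.replace(pattern, "[$<start>]");
console.log(replaced);
```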

awidjaja commented 2 years ago

> The editor's content is stored in JS, so the JS (V8) regexp engine is used for search in the editor.
>
> If you use Find in Files, ripgrep (PCRE2 compliant) is used for unopened files (opened files currently always use the JS engine).
>
> As for replace, the correct placeholder syntax would be $<name>, but it might simply be escaped before being passed to replace; that should be investigated separately.
>
> \k<name> is used only to reference a match inside the regexp itself. (?<start>\d+).+\k<start> will find 123 asd 123 in 123 asd 123 asd as expected.

@IllusionMH thanks, Andrii, for the clear information, which is very difficult to get elsewhere. I guess the ball is in the JavaScript engine's court to catch up with other languages. For now I will just use workarounds and tricks, and/or split the search into multiple regexes, to prevent VS Code crashes.

pbaksa commented 2 years ago

The problem is that I want to edit the regex of a previous search in the search field and can't guarantee that it will be valid while I'm editing, but VS Code tries to evaluate it as soon as it has some time, for example because I stopped typing to think. An edited regex should be re-evaluated only after Enter (not the insert-line-break kind of Enter; a button will possibly be needed).

Probmkr commented 2 years ago

This bug is still happening in VSCode 1.67.1. Is this bug unresolvable?

Gloix commented 2 years ago

Regex search (and normal search too) can take a considerable amount of time for large files, so it should never block the UI.

Can the regex search be done in a secondary thread so that it never freezes the UI? This might not help in the case that the regex search never finishes, but at least the user can keep editing the regex or copy it and then paste it after restarting the IDE. The other option is to save the regex each time it gets executed so that it can be recovered after restarting the editor if anything fails, but I don't know if this option is too heavy on the secondary memory.

I'm usually happy with the implementation of regex search in the editor, but it has frozen the IDE more than once, and I can't say that I can safely rely on this feature.

sancarn commented 1 year ago

After a discussion with a friend, regexes can also freeze the syntax highlighter!

image

Below shows that VS Code has a timeout for syntax highlighting:

https://github.com/microsoft/vscode/assets/7938900/87dc3004-9b9d-474d-8858-c9b275a42108

The timeout on the syntax highlighter is about 1s. The only real solution to this is to use a DFA-based implementation of regex. Rust-lang does this. It's possible that a Rust implementation could be compiled to wasm and optionally used for regex searches and syntax highlighting as an alternative to the base JavaScript regex. Given that a DFA runs in linear time, most of these performance concerns would be gone. The only issue is that it won't support lookahead and lookbehind, but for a faster experience it might be worth it.
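
The linear-time idea can be sketched without any library by tracking the set of automaton states that are reachable after each character, instead of trying one path at a time and backtracking. The TypeScript below is a toy under stated assumptions: the NFA is hand-built for the illustrative pattern (a|aa)+b matched against the whole input, and Edge, edges, and simulate are hypothetical names. It is not how the Rust regex crate is implemented.

```typescript
// Hand-built NFA for (a|aa)+b, matched against the whole input.
// State 0: start. State 1: just finished "a" or "aa".
// State 2: halfway through "aa". State 3: saw the final "b" (accepting).
interface Edge {
  from: number;
  symbol: string;
  to: number;
}

const edges: Edge[] = [
  { from: 0, symbol: "a", to: 1 },
  { from: 0, symbol: "a", to: 2 },
  { from: 2, symbol: "a", to: 1 },
  { from: 1, symbol: "a", to: 1 },
  { from: 1, symbol: "a", to: 2 },
  { from: 1, symbol: "b", to: 3 },
];
const STATE_COUNT = 4;
const ACCEPTING = 3;

// Advances every reachable state at once, one input character at a time.
// Work per character is bounded by the number of edges, so the total is
// proportional to input length times automaton size.
function simulate(input: string): boolean {
  let current: boolean[] = [];
  for (let s = 0; s < STATE_COUNT; s++) {
    current.push(s === 0);
  }
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    const next: boolean[] = [];
    for (let s = 0; s < STATE_COUNT; s++) {
      next.push(false);
    }
    for (const e of edges) {
      if (current[e.from] && e.symbol === ch) {
        next[e.to] = true;
      }
    }
    current = next;
  }
  return current[ACCEPTING];
}

console.log(simulate("aaab"));
console.log(simulate("aaaa"));
```

Because no path is ever retried, a long run of "a" with no final "b" costs one pass over the input rather than a search over every way of splitting it.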

krisutofu commented 1 year ago

I also have this issue with the recent VSCode version.

Version: 1.80.1
Commit: 74f6148eb9ea00507ec113ec51c489d6ffb4b771
Date: 2023-07-12T17:22:25.257Z
Electron: 22.3.14
ElectronBuildId: 21893604
Chromium: 108.0.5359.215
Node.js: 16.17.1
V8: 10.8.168.25-electron.0
OS: Linux x64 5.4.0-153-generic snap

I wanted to search for a JSON key in the last object of a JSON array, this way "...name...(?=([^{]|\n)*\}\])". I first had (?=[^\{]*\}\]) which was not working for no good reason (because it is working in any modern reasonable regex engine). But the attempt with |\n causes VSCode to freeze without timeout.

Obviously, an exponential backtracking problem of a stupid regex engine implementation, caused by the empty alternative in the inner pair of parentheses.

It is time for VSCode to get a reasonable regex engine instead of using JavaScript regex which feels so limiting in many ways. I don't even find a useful extension for better regex. Isn't there any regex engine available with cycle detection?

An intermediate solution could be to make a button to click when the regex pattern is finished so that it doesn't execute unfinished regexes.

EDIT: The dire thing is, the query with (?=[^\{]*\}\]) actually works in the browser (in regex101.com with EcmaScript regex) but not in VSCode and I don't know how I could make VSCode match that pattern at all.

kasperk81 commented 1 week ago

> The dire thing is, the query with (?=[^\{]*\}\]) actually works in the browser (in regex101.com with EcmaScript regex) but not in VSCode and I don't know how I could make VSCode match that pattern at all.

Multiline matching requires an explicit \n in the pattern. Add [\n]* after [^\{]*, giving (?=[^\{]*[\n]*\}\]). See https://stackoverflow.com/questions/52647894/multiline-regular-expression-search-in-visual-studio-code

obviously wasn't the brightest idea to mess with javascript/typescript regex support in vscode