microsoft / vscode

Visual Studio Code
https://code.visualstudio.com
MIT License
163.16k stars 28.84k forks source link

Find in files: Inconsistent search results when using regex and whole word matching #175794

Open Terrance opened 1 year ago

Terrance commented 1 year ago

Does this issue occur when all extensions are disabled?: Yes

Steps to Reproduce:

As an example, looking for method calls with an argument:

def one():
    target()

def two():
    target(arg=1)

def three():
    target(arg=2)

Initially I did a whole-word regex search for target\([^)], which matched all the calls providing at least one argument (i.e. two and three here).

Spread across multiple files, I found that after selecting a match in the search result list to open one file, and then selecting one in a different file, the first result disappeared from the search results. This is presumably because with whole-word enabled, these shouldn't actually match unless the argument is a single character, which for arg here is not the case.

Testing again now on a different machine, it seems the incorrect matches don't show with multiple candidate files open. If no files are open, candidates from all files are shown. With one file open, candidates from that file are shown but not the others.

Whole-word matching does appear to apply correctly in other cases (e.g. here, target is correctly matched whole-word so that the likes of othertarget don't match).

andreamah commented 1 year ago

Testing again now on a different machine, it seems the incorrect matches don't show with multiple candidate files open. If no files are open, candidates from all files are shown. With one file open, candidates from that file are shown but not the others.

What version of vscode does this other machine use? Did this only start happening today with the update?

Terrance commented 1 year ago

I can check in the morning, but I did try on this computer before updating and it was still happening then, so probably safe to say it wasn't introduced in the latest update.

Terrance commented 1 year ago

The other machine is on 1.75.1, so hasn't gained the update yet.

cben commented 1 year ago

Here is another reproducer.

  1. Create a FooBar.txt with this content:
    Foo
    PrefixFoo
    FooSuffix
    Bar
    PrefixBar
    BarSuffix
  2. Find in files (Ctrl+Shift+F) for Foo|Bar with Match Whole Word & Use Regular Expression options.
  3. If FooBar.txt is open in an editor when search was run, works correctly, apparently equivalent to \b(Foo|Bar)\b regexp: image
  4. However, if FooBar.txt was not open when search is run, it gives unexpected matches, apparently equivalent to \bFoo|Bar\b regexp where Foo is only constrained to start the word, and Bar only to end it: image
  5. Similarly, FooBar.txt was not open when Bar|Foo search is run, it gives different unexpected matches, apparently equivalent to \bBar|Foo\b regexp: image
  6. Opening or closing FooBar.txt doesn't toggle the results between (4) vs (5)/(6) immediately; some other action is needed to re-run search (by changing search string, pressing Enter in search field, or editing any file). That makes sense, because search results are not supposed to depend on files being open :grin:

(In all cases, there were other clean "foo" and "bar" matches in other files; I didn't test the more complex interactions between files that Terrance reports...)

1.74.3 on linux (fedora 37) ``` Version: 1.74.3 Commit: 97dec172d3256f8ca4bfb2143f3f76b503ca0534 Date: 2023-01-09T16:57:40.428Z Electron: 19.1.8 Chromium: 102.0.5005.167 Node.js: 16.14.2 V8: 10.2.154.15-electron.0 OS: Linux x64 6.1.8-200.fc37.x86_64 Sandboxed: No ```
cben commented 1 year ago

This also reproduced on 1.76.0. I now re-tested #112401 (originally reported in 2020) and found that presently its behavior too depends on whether the candidate file has an open editor.

Weirdly, the theory on #112401 suggests the Whole Word constraint being applied after the regexp engine is done searching, resulting in behavior that has no equivalent regexp — whereas my theory of \bFoo|Bar\b vs. \b(Foo|Bar)\b here relies on it translating to a regexp. :man_shrugging:

andreamah commented 1 year ago

Looking more into this, I don't believe this is a new issue. The find when files are open seems to handle matching a whole word differently than how the ripgrep engine does when files are closed.

In-file find uses a more complicated way of using word separators when we want a whole-word match: https://github.com/microsoft/vscode/blob/3059063b805ed0ac10a6d9539e213386bfcfb852/src/vs/editor/common/model/pieceTreeTextBuffer/pieceTreeBase.ts#L799

Whereas our parsing into ripgrep has a simpler approach" https://github.com/microsoft/vscode/blob/5ca7a333e12b313eda4092bf4d1ca60e365ec416/src/vs/base/common/strings.ts#L183-L188

andreamah commented 10 months ago

Might be related to https://github.com/microsoft/vscode/issues/170529 (but here, the whole world toggle over-matches instead)