microsoft / vscode

Visual Studio Code
https://code.visualstudio.com
MIT License

Find Regex for only non-basic Latin characters matches "S" and "s" #141057

Closed artsyhugh closed 1 year ago

artsyhugh commented 2 years ago

Does this issue occur when all extensions are disabled?: Yes

Issue Type: Bug

A Find regex meant for non-Basic Latin characters also matches "S" and "s" on their own. I tested the same pattern on regex101.com without problems. The regex is supposed to match characters with diacritical marks, but for some reason in VS Code it matches the letters "S" and "s", while not matching "P" or "a", for example.

VS Code version: Code 1.63.2 (899d46d82c4c95423fb7e10e68eba52050e30ba3, 2021-12-15T09:40:02.816Z)
OS version: Windows_NT x64 10.0.19044
Restricted Mode: No

System Info

|Item|Value|
|---|---|
|CPUs|Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz (4 x 2592)|
|GPU Status|2d_canvas: enabled<br>gpu_compositing: enabled<br>multiple_raster_threads: enabled_on<br>oop_rasterization: enabled<br>opengl: enabled_on<br>rasterization: enabled<br>skia_renderer: enabled_on<br>video_decode: enabled<br>vulkan: disabled_off<br>webgl: enabled<br>webgl2: enabled|
|Load (avg)|undefined|
|Memory (System)|15.89GB (9.24GB free)|
|Process Argv|--crash-reporter-id ce2503b0-8560-41ba-9434-424948b2a74a|
|Screen Reader|no|
|VM|0%|

Extensions: none
A/B Experiments

```
vsliv368cf:30146710
vsreu685:30147344
python383:30185418
vspor879:30202332
vspor708:30202333
vspor363:30204092
pythonvspyl392cf:30425750
pythontb:30283811
pythonvspyt551:30345470
pythonptprofiler:30281270
vshan820:30294714
vstes263:30335439
pythondataviewer:30285071
vscod805cf:30301675
pythonvspyt200:30340761
binariesv615:30325510
bridge0708:30335490
bridge0723:30353136
vsaa593:30376534
vsc1dsc:30424895
pythonvs932:30410667
vscop804:30404766
vscop453:30404998
vsrem710cf:30416617
vsbas813:30426126
```

Steps to Reproduce:

1. Paste this text into a new file: 【ADAPSO】アダプソ 【Neuquén】ネウケン
2. Find with the regex: [À-ÿĀ-ſƀ-ɏɐ-ʯḀ-ỿⱠ-ⱿꜢ-ꟿꬰ-ꭤﬀ-ﬆ]
3. It should only match é in Neuquén, but it also matches S in ADAPSO.
4. Do the same test on regex101.com or Notepad++, where it works without problems.

ArturoDent commented 2 years ago

I'll just note that enabling the Match Case option results in matching only the é in Neuquén. I don't know why that makes a difference, but the info could help to diagnose the problem.
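
The effect of the Match Case toggle can be reproduced outside the editor. Below is a minimal TypeScript sketch. It assumes the Find widget compiles the pattern as a JavaScript regular expression with the `u` flag and adds the `i` flag when Match Case is off; that is an assumption about the editor's internals, not something confirmed in this thread. `CLASS_SOURCE` and `findMatches` are names invented for the illustration, and the character class is the one from the report written with `\u` escapes.

```typescript
// Sample text from the report; é is written as the precomposed U+00E9.
const sample: string = "【ADAPSO】アダプソ 【Neuqu\u00E9n】ネウケン";

// Character class from the report, spelled with \u escapes:
// À-ÿ  Ā-ſ  ƀ-ɏ  ɐ-ʯ  Ḁ-ỿ  Ⱡ-Ɀ  Ꜣ-ꟿ  ꬰ-ꭤ  ﬀ-ﬆ
const CLASS_SOURCE: string =
  "[\\u00C0-\\u00FF\\u0100-\\u017F\\u0180-\\u024F\\u0250-\\u02AF" +
  "\\u1E00-\\u1EFF\\u2C60-\\u2C7F\\uA722-\\uA7FF\\uAB30-\\uAB64\\uFB00-\\uFB06]";

// Hypothetical helper: collect every match of the class for the given flags.
function findMatches(text: string, flags: string): string[] {
  const re = new RegExp(CLASS_SOURCE, flags);
  return text.match(re) || [];
}

// Assumed flags with Match Case off: global + ignore case + unicode.
const caseInsensitive: string[] = findMatches(sample, "giu");
// Assumed flags with Match Case on: global + unicode.
const caseSensitive: string[] = findMatches(sample, "gu");

console.log(caseInsensitive); // "S" and "é"
console.log(caseSensitive);   // only "é"
```

Under these assumptions the extra "S" appears only when the `i` flag is present, which is consistent with the observation above.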

artsyhugh commented 2 years ago

> I'll just note that enabling the Match Case option results in matching only the é in Neuquén. I don't know why that makes a difference, but the info could help to diagnose the problem.

é is merely an example here. I'm working on text with lots of uppercase letters too. What's strange about this problem is that the regex only matches the letters "S" and "s" and no other Basic Latin letters, which is why I think this must be a bug.

RedCMD commented 2 years ago

S and s are simply being matched by ſ (LATIN SMALL LETTER LONG S, U+017F), the character that ends the Ā-ſ range: https://apps.timwhitlock.info/unicode/inspect?s=%C5%BF
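
To isolate the long s behaviour, here is a small TypeScript sketch. It relies on the fact that, with the `i` and `u` flags together, JavaScript compares characters by Unicode simple case folding, and U+017F folds to `s`. The last check shows one possible workaround: ending the range at ž (U+017E) so the long s is left out. This is an illustration rather than an official recommendation, and it also stops the pattern from matching a literal ſ. All constant names are made up for the example.

```typescript
const LONG_S: string = "\u017F"; // ſ LATIN SMALL LETTER LONG S

// With both i and u, ſ, s and S fold to the same character.
const folded = new RegExp(LONG_S, "iu");
const foldedMatchesLower: boolean = folded.test("s"); // true
const foldedMatchesUpper: boolean = folded.test("S"); // true

// Case-sensitive: ſ only matches itself.
const exact = new RegExp(LONG_S, "u");
const exactMatchesLower: boolean = exact.test("s"); // false

// Ignore-case without u: the legacy rule does not map a
// non-ASCII character onto an ASCII one, so there is no match.
const legacy = new RegExp(LONG_S, "i");
const legacyMatchesLower: boolean = legacy.test("s"); // false

// The Ā-ſ range from the report (U+0100 to U+017F) inherits the fold,
// but only for s/S, not for other Basic Latin letters.
const rangeWithLongS = new RegExp("[\\u0100-\\u017F]", "iu");
const rangeMatchesS: boolean = rangeWithLongS.test("S"); // true
const rangeMatchesA: boolean = rangeWithLongS.test("a"); // false

// Possible workaround: stop the range at ž (U+017E).
const rangeWithoutLongS = new RegExp("[\\u0100-\\u017E]", "iu");
const trimmedMatchesS: boolean = rangeWithoutLongS.test("S"); // false

console.log({
  foldedMatchesLower, foldedMatchesUpper, exactMatchesLower,
  legacyMatchesLower, rangeMatchesS, rangeMatchesA, trimmedMatchesS,
});
```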

artsyhugh commented 2 years ago

> S and s are simply being matched by ſ (LATIN SMALL LETTER LONG S, U+017F), the character that ends the Ā-ſ range: https://apps.timwhitlock.info/unicode/inspect?s=%C5%BF

So is it a bug or a feature? Notepad++ is very loose with its regex and even it doesn't consider "long s" and regular s the same. This seems to be a Unicode canonical equivalence thing and it's a huge pitfall in my opinion. It's fine when I search with web browsers but it's not fine at all for accurate searches in an advanced text editor.

vscodenpa commented 1 year ago

This feature request is now a candidate for our backlog. The community has 60 days to upvote the issue. If it receives 20 upvotes we will move it to our backlog. If not, we will close it. To learn more about how we handle feature requests, please see our documentation.

Happy Coding!

vscodenpa commented 1 year ago

This feature request has not yet received the 20 community upvotes it takes to make it to our backlog. 10 days to go. To learn more about how we handle feature requests, please see our documentation.

Happy Coding!

vscodenpa commented 1 year ago

🙁 In the last 60 days, this feature request has received fewer than 20 community upvotes, so we closed it. Still, a big Thank You to you for taking the time to create this issue! To learn more about how we handle feature requests, please see our documentation.

Happy Coding!