microsoft / vscode

Visual Studio Code
https://code.visualstudio.com
MIT License

Search with non-standard encodings not supported #68237

Open mleduque opened 5 years ago

mleduque commented 5 years ago

Issue Type: Bug

Set the workspace encoding to cp437. Do a workspace search for anything.

The search box is surrounded in red, a popup appears underneath it saying "Unknown encoding: cp437".

I had the same problem once and found that I had to unset the search.useRipgrep option to get it working. That worked. But now this preference carries a "deprecated" warning that says to use pcre instead (which doesn't work).

That's a regression.

VS Code version: Code 1.31.0 (7c66f58312b48ed8ca4e387ebd9ffe9605332caa, 2019-02-06T08:51:24.856Z)
OS version: Linux x64 4.15.0-1032-oem

System Info

|Item|Value|
|---|---|
|CPUs|Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz (8 x 2874)|
|GPU Status|2d_canvas: enabled; checker_imaging: disabled_off; flash_3d: enabled; flash_stage3d: enabled; flash_stage3d_baseline: enabled; gpu_compositing: enabled; multiple_raster_threads: enabled_on; native_gpu_memory_buffers: disabled_software; rasterization: disabled_software; surface_synchronization: enabled_on; video_decode: unavailable_off; webgl: enabled; webgl2: enabled|
|Load (avg)|2, 3, 3|
|Memory (System)|15.39GB (1.47GB free)|
|Process Argv||
|Screen Reader|no|
|VM|0%|

Extensions (20)

Extension|Author (truncated)|Version
---|---|---
project-manager|ale|10.3.2
quitcontrol-vscode|art|3.0.0
better-toml|bun|0.3.2
whitespace-plus|dav|0.0.5
mustache|daw|1.1.1
gitlens|eam|9.5.0
EditorConfig|Edi|0.12.8
githd|hui|2.1.0
rpm-spec|Lau|0.2.3
vscode-duplicate|mrm|1.2.1
indent-rainbow|ode|7.2.4
vscode-subword-navigation|ow|1.2.0
vscode-docker|Pet|0.5.2
rust|rus|0.5.3
whitespace|san|0.0.5
crates|ser|0.3.6
code-settings-sync|Sha|3.2.4
local-history|xyz|1.7.0
plsql-language|xyz|1.7.0
markdown-all-in-one|yzh|2.0.1

roblourens commented 5 years ago

Yes, searching this encoding is no longer supported. We could default to searching it as utf8 which would work for a-z at least, if that helps.

mleduque commented 5 years ago

Yes, but I can't change the workspace encoding to utf8 because file contents would be mangled. So only the search should change. But then, without a collation (I think that's the term) to flatten, for example, é, è, ê and ë to e (and likewise for capitals), the search would fail in most cases, so I don't know to what extent it would be useful.
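To make the accent-folding concern concrete, here is a minimal Python sketch. Python is used purely for illustration; this is not how VS Code or ripgrep search works, and the helper name `fold_accents` is hypothetical. It approximates the "collation" idea by decomposing characters and dropping combining marks. It also shows that folding only helps once the bytes have been decoded with the right codec: read as UTF-8, the cp437 accented bytes are already lost before any folding can happen.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Hypothetical helper: map 'é', 'è', 'ê', 'ë' (and capitals) to 'e'."""
    # NFD splits 'é' into 'e' plus a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, then fold case.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


# Bytes as they would sit in a cp437-encoded file.
raw = "Échéance".encode("cp437")

# Decoded with the right codec, folding gives a plain ASCII key.
folded = fold_accents(raw.decode("cp437"))

# Treated as UTF-8, the accented bytes are invalid and become U+FFFD.
misread = raw.decode("utf-8", errors="replace")
```

Here `folded` is the accent-free, lower-case form of the word, while `misread` contains replacement characters where the accented letters used to be, so a UTF-8 search can only ever match the unaccented parts.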

JSchiffmacher commented 5 years ago

Saying "it is no longer supported" is not a solution. What do we do now? That's the question! The proposed workaround (utf8) is not a solution. We have accented characters in our code, and searching with the cp437 encoding is a real need.

roblourens commented 5 years ago

> Yes I can't change the workspace encoding to utf8 because file contents would be mangled

I mean that you could convert the actual encodings of the files. Unfortunately I don't have another solution for this right now.
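As a rough idea of what converting the files themselves could look like, here is a minimal Python sketch. The helper name `convert_to_utf8` and the sample file are hypothetical, and cp437 as the source encoding is an assumption taken from this issue. It rewrites the file in place, so it changes the byte value of every non-ASCII character and only suits projects where nothing depends on the original bytes.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(path: Path, source_encoding: str = "cp437") -> None:
    """Hypothetical helper: re-encode one file in place as UTF-8."""
    # Work on raw bytes so line endings are left untouched.
    text = path.read_bytes().decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))


# Small demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "legacy.txt"
    sample.write_bytes("café\r\n".encode("cp437"))
    convert_to_utf8(sample)
    converted = sample.read_bytes()
```

After the call, `converted` holds the same text encoded as UTF-8, with the original line ending preserved.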

JSchiffmacher commented 5 years ago

If it could help:

When I type a letter and then press ENTER, it displays the results for less than one second: image

And then the message "Unknown encoding : cp437" appears:

image

For information: search with the cp437 encoding worked until the last release.

roblourens commented 5 years ago

You are seeing some results from files that you have open because there we just search in the open buffer. But it is not able to search the full workspace.

aaaasmile commented 5 years ago

It would be great if this feature were added. I need it for CP 852. Anyway, VS Code is awesome!

DOHere commented 5 years ago

Why is it no longer supported when it's needed very often? How are we expected to go around it? The VS Search function is basically useless atm

bardware commented 5 years ago

I use Windows-1252 encoding in some projects. The files are being displayed and saved correctly. The search shows broken characters every once in a while.

Trucoto commented 5 years ago

I am having this issue as well. Changing the encoding of the source files is not an option because, among other reasons, they contain strings that work with certain legacy devices (serial printers, displays, etc.) that only understand a particular encoding. Sure, I could add translation functions for those cases and convert everything to UTF8 to preserve comments, but that is considerable additional work just to accommodate a feature that stopped working in the text editor we were using.

Raydir commented 5 years ago

Same here for CP850. This is really bad, since I have to search for regex patterns and can't search for my language-specific characters due to this issue.

JSchiffmacher commented 5 years ago

I've just updated my VS Code to "April 2019 (version 1.34)", and the issue is still there... Can we hope for a solution?

2019-05-17_10-08-31

MaiGuybrush commented 5 years ago

Setting files.encoding per language works for me. Just add the settings below to .vscode/settings.json:

"[c]": { "files.encoding": "cp950" },
"[cpp]": { "files.encoding": "cp950" }

Somnium7 commented 5 years ago

@MaiGuybrush That way it works only partially. Git diff support and multi-file searching are broken for files with those encodings then.

JSchiffmacher commented 5 years ago

@Somnium7 I confirm !

The issue is still there

gjans commented 5 years ago

I had the same problem. For me maccenteuro was my non-standard encoding. It is used in specific language files where the encoding just can't be changed no matter what.

Solution (in settings.json):

  1. I had default/global parameter "files.encoding": "maccenteuro", - changed it back to default "files.encoding": "utf8",
  2. I had "search.useRipgrep": false, - commented that out
  3. and under [myspecificlanguage] add that "files.encoding": "maccenteuro"

Multi-file search works, open-file search works, the file is still in maccenteuro and special characters didn't break, and searching files in other languages (.js, for example) isn't affected.

-- Make sure you don't have an extra .vscode/settings.json in your source code folder. If you do, then either remove it or apply/merge the changes from the global settings.json (user folder) to this .vscode/settings.json file as well. I didn't have anything meaningful in my .vscode/settings.json so I just removed it.

JSchiffmacher commented 5 years ago

Hi @gjans !

It works!

My problem, in the end, was that I have two settings.json files. The first is located in [Drive]:\USERS\[User]\AppData\Roaming\Code\User. The second is in the .vscode folder at the root of the git directory.

The modification given by @gjans must be done in both files!

gjans commented 5 years ago

@JSchiffmacher, happy this worked for you. Good point about folder-specific settings. I've updated my previous message with extra info about that as well.

Trucoto commented 5 years ago

The problem for me is that I don't have an encoding per language but per project. Some legacy C++98 projects are in CP437 because their target platforms are tied to that encoding (serial printers, segment displays, etc.); new platforms use UTF8 and C++17, for example. So per-project encoding settings are mandatory for me.

gjans commented 5 years ago

@Trucoto, if your projects can be referenced as folders, then you should try creating a separate .vscode/settings.json file for each folder (project). Don't use a [language] sub-group, just put a single parameter "files.encoding": "CP437". As I understand it, this should override your default encoding setting, but only inside the scope of that specific folder.

Trucoto commented 5 years ago

@gjans, that's how I have it. It works when searching within a file, but not for find in files (it reports "Unknown encoding: cp437"). I tried commenting out "search.useRipgrep": false and "search.useLegacySearch": true, but to no avail.

Somnium7 commented 5 years ago

@gjans That does not work entirely. Search only works for strings without accented characters. Also, git diff incorrectly shows rows with accented characters as changed. Earlier, "search.useRipgrep": false worked, but not now.

JSchiffmacher commented 5 years ago

@Somnium7 : search with accented characters is partially functional: it does not return all the files containing search string with accented characters. It's still better than before because before that did not work.

image

In the search above, I should have more than 1000 results ... and I only have 4!

Somnium7 commented 5 years ago

People, please put "thumbs up" on this issue, so VS Code team can see it's important!

DOHere commented 5 years ago

> I had the same problem. For me maccenteuro was my non-standard encoding. It is used in specific language files where the encoding just can't be changed no matter what.
>
> Solution (in settings.json):
>
>   1. I had default/global parameter "files.encoding": "maccenteuro", - changed it back to default "files.encoding": "utf8",
>   2. I had "search.useRipgrep": false, - commented that out
>   3. and under [myspecificlanguage] add that "files.encoding": "maccenteuro"
>
> multi-file search works, open file search works, file is still in maccenteuro and special characters didn't break other language file searching (like .js for example) isn't affected
>
> -- Make sure you don't have extra .vscode/settings.json in your source code folder. If you do, then either remove it or apply/merge the changes from global settings.json(user folder) to this .vscode/settings.json file as well. I didn't have anything meaningful in my .vscode/settings.json so i just removed it.

I tried that, but it only gives results from the files that I've opened. Is there any other setting to change? I mainly use java and javascript:

{
    "workbench.startupEditor": "newUntitledFile",
    "workbench.colorTheme": "Oceanic Next",
    "explorer.confirmDragAndDrop": false,
    "explorer.confirmDelete": false,
    "window.zoomLevel": 0,
    // "search.useRipgrep": true,
    "files.encoding": "utf8",
    "java.errors.incompleteClasspath.severity": "ignore",
    "terminal.integrated.rendererType": "dom",
    "editor.suggestSelection": "first",
    "vsintellicode.modify.editor.suggestSelection": "automaticallyOverrodeDefaultValue",
    "java.configuration.checkProjectSettingsExclusions": false,
    // "editor.formatOnSave": true,
    // "prettier.trailingComma": "es5"
}
Somnium7 commented 5 years ago

@DOHere You cannot do anything to make it work completely (aside from converting all your files to a standard encoding, which is not possible in many cases). That's the point of this issue.

DOHere commented 5 years ago

@Somnium7 thanks for letting me know. As for converting them to a standard encoding: all of my scripts should already be UTF-8 encoded, so shouldn't it work?

Somnium7 commented 5 years ago

@DOHere I meant UTF8 as the standard encoding. When dealing with legacy software (and hardware), it's often not possible to convert your codebase to UTF8 because then it won't work.

bardware commented 5 years ago

Just had it today. I commit files with Windows-1252 encoding to a git repo. The left side is the version in git, the right is the local version. I did not touch that line; the left pane just uses the standard UTF-8 encoding when reading the file from history, not the one I set in settings.json, which reads:

{
    "[cfml]": {
        "files.encoding": "windows1252"
    }
}

grafik

martinussuherman commented 2 years ago

Thanks @MaiGuybrush for your solution. I have to maintain some old code, written in FoxPro for DOS v2.5, and had the same issue. I tried your solution and added this to .vscode/settings.json:

{ "[prg]": { "files.encoding": "cp437" } }

and the global search now works again ... thank you ...

edit: unfortunately, it isn't working properly ... now some files are detected as UTF-8 and some as CP437 ... so the solution works only partially ...

sjw1980 commented 1 year ago

In my case..

Find your settings.json (C:\Users\{your account}\AppData\Roaming\Code\User\settings.json) and delete "files.encoding": "any encoding"

Scorg commented 1 year ago

4 years and no progress. The search should at least respect per language or workspace settings for encoding.

JulioNobre commented 11 months ago

> Thanks @MaiGuybrush for your solution. I have to maintain some old code, written in FoxPro for DOS v2.5, and had the same issue. I try to use your solution, so I add this to .vscode/settings.json :
>
> { "[prg]": { "files.encoding": "cp437" } }
>
> and the global search now works again ... thank you ...
>
> edit: unfortunately, it isn't working properly ... since now some files are detected as UTF-8 and some as CP437 ... so the solution only work partially ...

This pointed me to a solution: set the Visual Studio Code default encoding to a supported encoding (e.g. utf-8), and set specific encodings (e.g. "cp850") per file extension.

My .vscode/settings.json became the following.

{
    "editor.unicodeHighlight.nonBasicASCII": false,
    "[foxpro]": { "files.encoding": "cp850" },
    "[prg]": { "files.encoding": "cp850" },
    "[spr]": { "files.encoding": "cp850" },
    "[mpr]": { "files.encoding": "cp850" },
    "[txt]": { "files.encoding": "cp850" },
    "[csv]": { "files.encoding": "cp850" },
    "[tsv]": { "files.encoding": "cp850" },
    "editor.codeLens": false,
    "editor.stickyScroll.enabled": true,
    "window.zoomLevel": 1,
    "extensions.autoCheckUpdates": false,
    "update.mode": "manual",
    "update.enableWindowsBackgroundUpdates": false,
    "settingsSync.keybindingsPerPlatform": false,
    "settingsSync.ignoredExtensions": []
}

programmvilli commented 9 months ago

Why has this not been resolved???

GitMensch commented 2 months ago

Wouldn't it be possible to convert the search input with iconv-lite to a byte array (or escaped characters), use that for the search, and do the reverse for displaying the results?

Sadly \x3e does work in regex mode, but \xa3 doesn't - seems that the engine converts the character to the "underlying" character, then searches for that...
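One way to picture this proposal is the small Python sketch below. It uses Python's built-in codecs in place of iconv-lite, so the function names `to_byte_escapes` and `find_lines` and the choice of cp850 are illustrative assumptions, not VS Code or ripgrep internals. The search term is encoded to the file's encoding, matching is done on raw bytes, and the matching lines are decoded again for display.

```python
def to_byte_escapes(term: str, encoding: str) -> str:
    """Hypothetical helper: spell a term as \\xNN escapes of its encoded bytes."""
    return "".join(f"\\x{byte:02x}" for byte in term.encode(encoding))


def find_lines(data: bytes, term: str, encoding: str) -> list[str]:
    """Search raw bytes for the encoded term, then decode matching lines."""
    needle = term.encode(encoding)
    return [line.decode(encoding) for line in data.splitlines() if needle in line]


# In cp850, '>' is byte 0x3e and 'ú' is byte 0xa3.
escaped = to_byte_escapes(">ú", "cp850")

# A cp850 "file" held in memory.
contents = "menú del día\nmenu\n".encode("cp850")
hits = find_lines(contents, "menú", "cp850")
```

Here `escaped` is the backslash-escaped byte form of the term, and `hits` contains only the line with the accented word, decoded back to readable text. Whether the search engine accepts such escapes as raw bytes is exactly the open question raised above.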

GitMensch commented 2 months ago

Note: Depending on the encoding used, you may find a "good enough match" that is supported and which you can use for the global pattern, for example in the case of cp850 you can use "files.encoding": "iso885915" which provides you with both the possibility to search for accented characters and see them correctly in the search output.