Open jekitf opened 3 years ago
Is this the problem?
\extensions\git\src\git.ts
Value if specified in settings should be used
@jekitf I think that utf8 is ok, but the one 2 lines above it might be the issue. But not sure. I won't really be able to look at this this iteration at least, so any testing/PR that you can provide would be greatly appreciated.
Facing a similar issue. Related to #36219?
Edit: I suppose it is relevant that the file encoding differs from the OS default encoding. I specify the encoding in settings:
"sas.files.encoding": "windows1252",
This problem also occurs with Shift-JIS files.
Steps to Reproduce:
Set up the user settings as follows:
{
"files.encoding": "shiftjis"
}
Initialize a git repository
Create a file encoded in Shift-JIS
Open the diff of the created file
Select some lines and run "Stage Selected Ranges"
I'm facing the same problem with Shift-JIS.
Therefore, I tried to verify it.
As eamodio says, this line seems to be the cause of the bug.
child.stdin!.end(data, 'utf8');
\extensions\git\src\git.ts
Current Stable
The Shift-JIS file is converted to UTF-8 by "Stage Selected Ranges".
Verification code result
Shift-JIS files are staged as is.
This verification code is an ad hoc fix and I'm not sure if it's the right way to fix the bug.
Isn't this one supposed to have already been fixed by #84130 (as reported on #55110 --> #36219)? Maybe that fix is the root cause?
https://github.com/microsoft/vscode/pull/84130 has addressed the problem related to encoding when files are being shown in the diff editor. Unfortunately it did not address the encoding related issue when it comes to the "Stage Selected Ranges" command. Fixing the "Stage Selected Ranges" command is currently blocked on https://github.com/microsoft/vscode/issues/824 as at the moment extensions do not have access to the encoding of the text document.
I also ran into this today. Why does this issue still exist?
The viciousness of this bug is that the corrupted files manifest themselves very late in git history, e.g. when bisecting old commits and discovering differences in irrelevant but huge ranges of text due to EOLs. I've been bitten by this bug, and only discovered it roughly a year later!
Is working-tree-encoding of .gitattributes a workaround? With that, Git is expected to store UTF-8 internally while converting to a different encoding in the working tree. In that case it might be OK if VSCode forwards UTF-8 at some point.
* text=auto
*.bat text eol=crlf working-tree-encoding=cp850
*.c text eol=crlf working-tree-encoding=windows-1252
I tried to create an extension to add a "Stage Selected Range (ANSI)" command for ANSI to work around this problem, but it was not possible because the registerDiffInformationCommand is still in the proposed stage. (#84899)
We hope this will be corrected as soon as possible.
This issue requires an API for encoding, but the issue for adding this API (#824) has remained open for 6 years. Also, adding such a core API would be difficult for the average contributor. Until the API is added, how about reading the encoding from the configuration and staging based on that, instead of on the actual encoding?
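A minimal sketch of that fallback, assuming the settings are available as a plain object shaped like settings.json. The helper name resolveEncoding and the Settings type are hypothetical and not part of the git extension; inside an extension the value would presumably come from the workspace configuration instead.

```typescript
// Hypothetical shape of a parsed settings.json; this is not a VS Code type.
type Settings = { [key: string]: unknown };

// Returns the configured encoding for a language. A language-specific
// override such as "[vbs]": { "files.encoding": ... } wins over the global
// "files.encoding"; if neither is set, "utf8" is returned.
function resolveEncoding(settings: Settings, languageId: string): string {
  const override = settings[`[${languageId}]`];
  if (typeof override === "object" && override !== null) {
    const value = (override as Settings)["files.encoding"];
    if (typeof value === "string") {
      return value;
    }
  }
  const fallback = settings["files.encoding"];
  return typeof fallback === "string" ? fallback : "utf8";
}

// Example settings combining the values mentioned in this thread.
const settings: Settings = {
  "files.encoding": "shiftjis",
  "[vbs]": { "files.encoding": "windows1252" },
};

console.log(resolveEncoding(settings, "vbs"));
console.log(resolveEncoding(settings, "plaintext"));
```

Note that this only reflects the configured encoding. A file opened with a different encoding, or one picked up through files.autoGuessEncoding, would still be staged with the wrong bytes, which is the limitation the question above already accepts.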
It is true that GIT always stores files in UTF-8 encoding on a remote server. With the setting given in the comments a year ago, GitHub Desktop works fine, but VS Code doesn't use the setting when comparing commits. The solution was not very obvious. https://code.visualstudio.com/updates/v1_48#_browser-support https://learn.microsoft.com/en-us/powershell/scripting/dev-cross-plat/vscode/understanding-file-encoding#configuring-vs-code
It is true that GIT always stores files in UTF-8 encoding on a remote server.
Don't be misleading: that is not "always" the case, it depends on the settings in .gitattributes. Git itself is fine storing files in an arbitrary encoding as-is, depending on various client settings. Remember that it can store binary files as well.
Is there an update to this issue? I also opened an issue on VSCodium's GitHub: https://github.com/VSCodium/vscodium/issues/1418
I have reproduced and documented the current problem in two examples. The difference between the two use cases is the use of a .gitattributes. The post has become a little longer than planned.
mkdir .\test-repo
cd test-repo
git init
git config --local --list
core.repositoryformatversion=0
core.filemode=false
core.bare=false
core.logallrefupdates=true
core.symlinks=false
core.ignorecase=true
# Test file
Msg "Hello"
# Test file
Msg "Hello"
Msg "Änderungen"
# Test file
Msg "Hello"
Msg "Überlegung"
Msg "Änderungen"
# Test file
Msg "Hello"
Msg "ÄÖ first commit"
Msg "Überlegung"
Msg "ÜÖ later commit"
Msg "Änderungen"
Msg "ÄÖ first commit"
and select "Stage selected Ranges".
mkdir .\test-repo2
cd test-repo2
git init
git config --local --list
core.repositoryformatversion=0
core.filemode=false
core.bare=false
core.logallrefupdates=true
core.symlinks=false
core.ignorecase=true
# Test file
Msg "Hello"
* text=auto
*.vbs text eol=crlf working-tree-encoding=windows-1252
# Test file
Msg "Hello"
Msg "Änderungen"
# Test file
Msg "Hello"
Msg "Überlegung"
Msg "Änderungen"
# Test file
Msg "Hello"
Msg "ÄÖ first commit"
Msg "Überlegung"
Msg "ÜÖ later commit"
Msg "Änderungen"
Msg "ÄÖ first commit"
and select "Stage selected Ranges".
settings.json
"files.autoGuessEncoding": false,
"[vbs]": {
"files.encoding": "windows1252"
}
I had this problem today. Running version 1.85.1. Thought I'd make a comment to give this issue a kick, since this issue is three years old. :)
Can this bug be prioritized higher? Silently changing files is a significant bug. Many users are likely running into this bug without realizing it. If it breaks something later, they might not find the original cause.
Issue Type: Bug
Add the file to a git repo
Git source control: Select the file and "Discard changes" (to restore the file from git)
The file has the content: C3 98 C3 86 C3 85 0D 0A 2D
Expected content: D8 C6 C5 0D 0A 2D
The file has been changed from ANSI to UTF-8 in git.
In C++ using "Use Multi-Byte Character Set" this is a fatal bug
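The two byte sequences above are consistent with the text ØÆÅ, followed by CRLF and a hyphen, being written back as UTF-8 instead of a single-byte ANSI code page. A small sketch that reproduces both sequences; the helper names are made up for illustration, and Latin-1 stands in for the ANSI code page (an assumption; it agrees with Windows-1252 for these three characters).

```typescript
// Formats bytes as upper-case hex pairs separated by spaces.
function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => (b < 16 ? "0" : "") + b.toString(16).toUpperCase()).join(" ");
}

// Encodes text as Latin-1: one byte per character. Characters above U+00FF
// cannot be represented and are rejected.
function encodeLatin1(text: string): Uint8Array {
  const out = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code > 0xff) {
      throw new RangeError(`U+${code.toString(16)} is not representable in Latin-1`);
    }
    out[i] = code;
  }
  return out;
}

const text = "\u00D8\u00C6\u00C5\r\n-"; // "ØÆÅ", CRLF, "-"

const asUtf8 = toHex(new TextEncoder().encode(text));
const asLatin1 = toHex(encodeLatin1(text));

console.log(asUtf8);   // C3 98 C3 86 C3 85 0D 0A 2D
console.log(asLatin1); // D8 C6 C5 0D 0A 2D
```

Each of the three non-ASCII characters grows from one byte to two, while the CRLF and the hyphen are unchanged, which matches the reported and the expected content.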
VS Code version: Code 1.51.1 (e5a624b788d92b8d34d1392e4c4d9789406efe8f, 2020-11-10T23:34:32.027Z) OS version: Windows_NT x64 10.0.19041
System Info
|Item|Value|
|---|---|
|CPUs|Intel(R) Core(TM) i7-3820 CPU @ 3.60GHz (8 x 3602)|
|GPU Status|2d_canvas: enabled; flash_3d: enabled; flash_stage3d: enabled; flash_stage3d_baseline: enabled; gpu_compositing: enabled; multiple_raster_threads: enabled_on; oop_rasterization: disabled_off; opengl: enabled_on; protected_video_decode: unavailable_off; rasterization: enabled; skia_renderer: disabled_off_ok; video_decode: enabled; vulkan: disabled_off; webgl: enabled; webgl2: enabled|
|Load (avg)|undefined|
|Memory (System)|15.98GB (4.12GB free)|
|Process Argv|--crash-reporter-id 48361099-29ee-4112-bdbe-983a488188b4|
|Screen Reader|no|
|VM|0%|
Extensions (9)
Extension|Author (truncated)|Version
---|---|---
xml|Dot|2.5.1
gc-excelviewer|Gra|3.0.40
csharp|ms-|1.23.6
cpptools|ms-|1.1.3
hexeditor|ms-|1.3.0
powershell|ms-|2020.6.0
vetur|oct|0.30.3
vscode-zipexplorer|sle|0.3.1
pdf|tom|1.1.0