microsoft / vscode

Visual Studio Code
https://code.visualstudio.com
MIT License

Git source control: "Stage Selected Ranges" creates a "Staged Changes" file of type UTF-8 when the original file is ANSI (Windows 1252) #111915

Open jekitf opened 3 years ago

jekitf commented 3 years ago

Issue Type: Bug

  1. Create a file of type ANSI (Windows 1252), e.g. in Notepad++ (Encoding-ANSI, Encoding-Character sets-Western European-Windows 1252).
  2. Add the Scandinavian letters ØÆÅ to the file (hex D8 C6 C5).
  3. Add the file to a git repo

  4. Edit the file and add "\r\n-" (hex 0D 0A 2D)
  5. Git source control: "Stage Selected Ranges" for the one line containing "-" (hex 2D)
  6. Commit the file from "Staged Changes"
  7. Delete the file on disk
  8. Git source control: Select the file and "Discard changes" (to restore the file from git)

    The file has the content: C3 98 C3 86 C3 85 0D 0A 2D
    Expected content: D8 C6 C5 0D 0A 2D

The file has been changed from ANSI to UTF-8 in git.
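
The two byte sequences above are consistent with the content being decoded as Windows-1252 and then written back as UTF-8. A minimal sketch that reproduces the conversion on the bytes from the steps; `decodeWestern` and `toHex` are hypothetical helpers (not VS Code code), and the decoder only handles byte values where Windows-1252 and ISO-8859-1 agree, i.e. everything outside 0x80-0x9F:

```typescript
// Hypothetical helper: decode Windows-1252 bytes, limited to the values that
// Windows-1252 shares with ISO-8859-1 (everything outside 0x80-0x9F).
function decodeWestern(bytes: Uint8Array): string {
  let result = "";
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if (b >= 0x80 && b <= 0x9f) {
      throw new RangeError("byte needs a full Windows-1252 table: 0x" + b.toString(16));
    }
    result += String.fromCharCode(b);
  }
  return result;
}

// Hypothetical helper: format bytes the way the report writes them ("D8 C6 C5").
function toHex(bytes: Uint8Array): string {
  const parts: string[] = [];
  for (let i = 0; i < bytes.length; i++) {
    parts.push(("0" + bytes[i].toString(16)).slice(-2).toUpperCase());
  }
  return parts.join(" ");
}

// File content after step 4, as stored on disk in Windows-1252.
const ansiBytes = new Uint8Array([0xd8, 0xc6, 0xc5, 0x0d, 0x0a, 0x2d]);

// Decode to a string, then encode that string as UTF-8
// (TextEncoder always emits UTF-8).
const utf8Bytes = new TextEncoder().encode(decodeWestern(ansiBytes));

console.log(toHex(ansiBytes)); // D8 C6 C5 0D 0A 2D
console.log(toHex(utf8Bytes)); // C3 98 C3 86 C3 85 0D 0A 2D
```

Each of the three letters grows from one byte to two, while the ASCII bytes 0D 0A 2D are unchanged, which matches the observed and expected content listed above.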

In C++ projects using "Use Multi-Byte Character Set", this is a fatal bug.

VS Code version: Code 1.51.1 (e5a624b788d92b8d34d1392e4c4d9789406efe8f, 2020-11-10T23:34:32.027Z)
OS version: Windows_NT x64 10.0.19041

System Info

|Item|Value|
|---|---|
|CPUs|Intel(R) Core(TM) i7-3820 CPU @ 3.60GHz (8 x 3602)|
|GPU Status|2d_canvas: enabled; flash_3d: enabled; flash_stage3d: enabled; flash_stage3d_baseline: enabled; gpu_compositing: enabled; multiple_raster_threads: enabled_on; oop_rasterization: disabled_off; opengl: enabled_on; protected_video_decode: unavailable_off; rasterization: enabled; skia_renderer: disabled_off_ok; video_decode: enabled; vulkan: disabled_off; webgl: enabled; webgl2: enabled|
|Load (avg)|undefined|
|Memory (System)|15.98GB (4.12GB free)|
|Process Argv|--crash-reporter-id 48361099-29ee-4112-bdbe-983a488188b4|
|Screen Reader|no|
|VM|0%|

Extensions (9)

Extension|Author (truncated)|Version
---|---|---
xml|Dot|2.5.1
gc-excelviewer|Gra|3.0.40
csharp|ms-|1.23.6
cpptools|ms-|1.1.3
hexeditor|ms-|1.3.0
powershell|ms-|2020.6.0
vetur|oct|0.30.3
vscode-zipexplorer|sle|0.3.1
pdf|tom|1.1.0

jekitf commented 3 years ago

Is this the problem?

\extensions\git\src\git.ts

image

The value, if specified in settings, should be used.

image

eamodio commented 3 years ago

@jekitf I think that utf8 is OK, but the line two lines above it might be the issue. I'm not sure, though. I won't be able to look at this during this iteration at least, so any testing/PR that you can provide would be greatly appreciated.

MadsAdrian commented 3 years ago

Facing a similar issue. Related to #36219?

Edit: I suppose it is relevant that the file encoding differs from the OS default. I specify the encoding in settings:

"sas.files.encoding": "windows1252",

hkcomori commented 3 years ago

This problem also occurs with Shift-JIS files.

Steps to Reproduce:

  1. Set up user settings as follows:

    {
        "files.encoding": "shiftjis"
    }
  2. Initialize a git repository

  3. Create a file encoded in Shift-JIS

  4. Open the diff of the created file

  5. Select some lines and run "Stage Selected Ranges"

Dub1shu commented 3 years ago

I'm facing the same problem with Shift-JIS.
Therefore, I tried to verify it.
As eamodio says, this line seems to be the cause of the bug.

child.stdin!.end(data, 'utf8');
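
If the string handed to `end` is always serialized as UTF-8, one possible direction is to encode the text first and pass the stream raw bytes, which a Node.js writable stream writes unchanged. The following is only a sketch under stated assumptions, not the verification code referred to in this comment: `StdinLike`, `encodeWestern`, and `writePatch` are hypothetical names, the encoder covers only characters up to U+00FF outside the 0x80-0x9F range, and a real fix would presumably need a full encoding library to handle Shift-JIS and similar encodings.

```typescript
// Hypothetical stand-in for the single stream method used here; Node.js
// writable streams accept a byte chunk in end() and write it as-is.
interface StdinLike {
  end(chunk: Uint8Array): void;
}

// Hypothetical encoder for the subset of Windows-1252 shared with ISO-8859-1.
function encodeWestern(text: string): Uint8Array {
  const out = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code > 0xff || (code >= 0x80 && code <= 0x9f)) {
      throw new RangeError("character not covered by this sketch: U+" + code.toString(16));
    }
    out[i] = code;
  }
  return out;
}

// Encode first, then write bytes, so no implicit UTF-8 conversion can happen.
function writePatch(stdin: StdinLike, text: string): void {
  stdin.end(encodeWestern(text));
}

// Usage with a fake stream that records what it receives.
let written: Uint8Array = new Uint8Array(0);
const fakeStdin: StdinLike = {
  end: (chunk: Uint8Array) => {
    written = chunk;
  },
};
writePatch(fakeStdin, "\u00D8\u00C6\u00C5\r\n-"); // "ØÆÅ" followed by CR LF "-"
console.log(written.join(",")); // 216,198,197,13,10,45
```

The recorded bytes correspond to D8 C6 C5 0D 0A 2D, i.e. the ANSI content from the original report rather than its UTF-8 form.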

Verification Code

\extensions\git\src\git.ts verification

Verification Process

  1. Create a non-UTF-8 file (Shift-JIS in this case), add some text and commit it.
  2. Set the workspace encoding setting to Shift-JIS.
  3. Add new line.
  4. In source control view, select the new line and execute "Stage Selected Ranges".

Current Stable
The Shift-JIS file is converted to UTF-8 by "Stage Selected Ranges".
Before

Verification code result
Shift-JIS files are staged as is.
After

This verification code is an ad hoc fix and I'm not sure if it's the right way to fix the bug.

ankostis commented 2 years ago

Isn't this one supposed to have already been fixed by #84130 (as reported on #55110 --> #36219)? Maybe that fix is the root cause?

lszomoru commented 2 years ago

https://github.com/microsoft/vscode/pull/84130 addressed the encoding problem when files are shown in the diff editor. Unfortunately, it did not address the encoding-related issue in the "Stage Selected Ranges" command. Fixing the "Stage Selected Ranges" command is currently blocked on https://github.com/microsoft/vscode/issues/824, as at the moment extensions do not have access to the encoding of the text document.

gdh1995 commented 2 years ago

I also ran into this today. Why does this issue still exist?

ankostis commented 2 years ago

The viciousness of this bug is that the corrupted files manifest themselves very late in git history, e.g. when bisecting old commits and discovering differences in irrelevant but huge ranges of text due to EOLs. I've been bitten by this bug, and discovered it roughly a year later!
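
Because the conversion is silent, a small check on file or blob content might help catch it earlier. The sketch below is a heuristic, not a reliable detector: `isValidUtf8`, `isAscii`, and `looksConverted` are hypothetical names, and the idea assumes a repository whose text files are all supposed to be in a legacy single-byte encoding such as Windows-1252, where non-ASCII content that happens to be valid UTF-8 is suspicious.

```typescript
// Returns true when the bytes form valid UTF-8. With { fatal: true } the
// decoder throws on malformed input instead of substituting U+FFFD.
function isValidUtf8(bytes: Uint8Array): boolean {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

// Returns true when every byte is below 0x80. Such content is identical in
// UTF-8 and Windows-1252, so it cannot reveal a conversion.
function isAscii(bytes: Uint8Array): boolean {
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] >= 0x80) {
      return false;
    }
  }
  return true;
}

// Hypothetical heuristic: non-ASCII content that decodes cleanly as UTF-8
// was probably not written in a legacy single-byte encoding.
function looksConverted(bytes: Uint8Array): boolean {
  return !isAscii(bytes) && isValidUtf8(bytes);
}

console.log(looksConverted(new Uint8Array([0xc3, 0x98, 0xc3, 0x86, 0xc3, 0x85]))); // true
console.log(looksConverted(new Uint8Array([0xd8, 0xc6, 0xc5]))); // false
```

Run over the blobs of each new commit, a check like this could flag a file the first time it is stored as UTF-8, although multi-byte encodings such as Shift-JIS would need a different test.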

ams-tschoening commented 2 years ago

Is working-tree-encoding in .gitattributes a workaround? With that, Git is expected to store UTF-8 internally while converting to a different encoding in the working tree. In that case it might be OK if VS Code forwards UTF-8 at some point.

*           text=auto
*.bat       text eol=crlf   working-tree-encoding=cp850
*.c         text eol=crlf   working-tree-encoding=windows-1252

Kiuchi commented 2 years ago

I tried to create an extension that adds a "Stage Selected Range (ANSI)" command to work around this problem, but it was not possible because registerDiffInformationCommand is still in the proposed stage (#84899). We hope this will be corrected as soon as possible.

Dub1shu commented 2 years ago

This issue requires an API for encoding, but the issue for adding this API (#824) has remained open for 6 years. Also, adding such a core API would be difficult for the average contributor. Until the API is added, how about getting the encoding from the config and staging based on that encoding instead of the actual encoding?
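
A rough sketch of what "getting the encoding from the config" could look like. It works on a plain settings object so that it runs on its own; inside an extension the values would presumably come from `vscode.workspace.getConfiguration` instead. `Settings`, `resolveEncoding`, and `sampleSettings` are hypothetical names, and the precedence shown (a language-specific `"[lang]"` block over the top-level value, falling back to `utf8`) is a simplification of how VS Code layers settings.

```typescript
// Hypothetical shape for a parsed settings.json fragment.
type Settings = { [key: string]: unknown };

// Hypothetical resolver: a language-specific "[lang]" block wins over the
// top-level "files.encoding"; "utf8" is the default value of that setting.
function resolveEncoding(settings: Settings, languageId: string): string {
  const scoped = settings["[" + languageId + "]"];
  if (typeof scoped === "object" && scoped !== null) {
    const value = (scoped as Settings)["files.encoding"];
    if (typeof value === "string") {
      return value;
    }
  }
  const topLevel = settings["files.encoding"];
  return typeof topLevel === "string" ? topLevel : "utf8";
}

// Sample values chosen for illustration only.
const sampleSettings: Settings = {
  "files.encoding": "shiftjis",
  "[vbs]": { "files.encoding": "windows1252" },
};

console.log(resolveEncoding(sampleSettings, "vbs")); // windows1252
console.log(resolveEncoding(sampleSettings, "plaintext")); // shiftjis
console.log(resolveEncoding({}, "plaintext")); // utf8
```

As the comment notes, the configured encoding is only a stand-in: a file opened with a different encoding than the one configured would still be staged incorrectly.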

h8nor commented 1 year ago

It is true that GIT always stores files in UTF-8 encoding on a remote server. With the setting given in the comments a year ago, GitHub Desktop works fine, but VS Code doesn't use the setting when comparing commits. The solution was not very obvious.
https://code.visualstudio.com/updates/v1_48#_browser-support
https://learn.microsoft.com/en-us/powershell/scripting/dev-cross-plat/vscode/understanding-file-encoding#configuring-vs-code

ams-tschoening commented 1 year ago

It is true that GIT always stores files in UTF-8 encoding on a remote server.

That's misleading: it's not "always" the case, but depends on settings in .gitattributes. Git itself is fine storing files in an arbitrary encoding as-is, depending on various client settings. Remember that it's able to store binary files as well.
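
To illustrate that Git works on bytes rather than characters: in a SHA-1 repository, a blob's object id is the SHA-1 of the header `blob <byte length>\0` followed by the raw content. The sketch below computes that id for the same three letters in both encodings; `gitBlobId` is a hypothetical helper name, and the example covers only the plain case where no `working-tree-encoding` attribute or other conversion filter applies before the content is hashed.

```typescript
import { createHash } from "node:crypto";

// Git's object id for a blob in a SHA-1 repository: SHA-1 over the header
// "blob <byte length>\0" followed by the raw content bytes.
function gitBlobId(content: Uint8Array): string {
  const header = new TextEncoder().encode("blob " + content.length + "\0");
  return createHash("sha1").update(header).update(content).digest("hex");
}

// "ØÆÅ" as Windows-1252 bytes and as UTF-8 bytes.
const asWindows1252 = new Uint8Array([0xd8, 0xc6, 0xc5]);
const asUtf8 = new Uint8Array([0xc3, 0x98, 0xc3, 0x86, 0xc3, 0x85]);

console.log(gitBlobId(asWindows1252));
console.log(gitBlobId(asUtf8));
console.log(gitBlobId(asWindows1252) === gitBlobId(asUtf8)); // false
```

The two ids differ, so from Git's point of view a re-encoded file is simply different content, which is why the conversion shows up as a change to every affected line.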

ErikSteiner commented 12 months ago

Is there an update to this issue? I also opened an issue on VSCodium's GitHub: https://github.com/VSCodium/vscodium/issues/1418

I have reproduced and documented the current problem in two examples. The difference between the two use cases is the use of a .gitattributes. The post has become a little longer than planned.

Case 1

  1. mkdir .\test-repo
  2. cd test-repo
  3. git init
  4. git config --local --list
    core.repositoryformatversion=0
    core.filemode=false
    core.bare=false
    core.logallrefupdates=true
    core.symlinks=false
    core.ignorecase=true
  5. Create script.vbs with content
    # Test file
    Msg "Hello"
  6. commit script.vbs in VSCode with Stage Changes
  7. edit the script.vbs and add some umlauts
    # Test file
    Msg "Hello"
    Msg "Änderungen"
  8. commit script.vbs in VSCode with Stage Changes
  9. edit the script.vbs and add some umlauts
    # Test file
    Msg "Hello"
    Msg "Überlegung"
    Msg "Änderungen"
  10. commit script.vbs in VSCode with Stage Changes
  11. edit the script.vbs and add some umlauts
    # Test file
    Msg "Hello"
    Msg "ÄÖ first commit"
    Msg "Überlegung"
    Msg "ÜÖ later commit"
    Msg "Änderungen"
  12. in VSCode under Source Control highlight the row with Msg "ÄÖ first commit" and select "Stage Selected Ranges".
  13. Commit the Staged Change
  14. After that:
    • Source Control shows the following state: image
    • The explorer shows the following state (just ignore the file name). Note the lines marked in blue: image

Case 2

  1. mkdir .\test-repo2
  2. cd test-repo2
  3. git init
  4. git config --local --list
    core.repositoryformatversion=0
    core.filemode=false
    core.bare=false
    core.logallrefupdates=true
    core.symlinks=false
    core.ignorecase=true
  5. Create script.vbs with content
    # Test file
    Msg "Hello"
  6. Create .gitattributes with content
    *           text=auto
    *.vbs       text eol=crlf   working-tree-encoding=windows-1252
  7. commit .gitattributes in VSCode with Stage Changes
  8. commit script.vbs in VSCode with Stage Changes
  9. edit the script.vbs and add some umlauts
    # Test file
    Msg "Hello"
    Msg "Änderungen"
  10. commit script.vbs in VSCode with Stage Changes
  11. from now on, the explorer shows a change in line three, even if Source Control shows no changes: image
  12. edit the script.vbs and add some umlauts
    # Test file
    Msg "Hello"
    Msg "Überlegung"
    Msg "Änderungen"
  13. commit script.vbs in VSCode with Stage Changes
  14. the explorer shows a change in line three, even if Source Control shows no changes. Interestingly, the text shows "1 of 1 change": image
  15. edit the script.vbs and add some umlauts
    # Test file
    Msg "Hello"
    Msg "ÄÖ first commit"
    Msg "Überlegung"
    Msg "ÜÖ later commit"
    Msg "Änderungen"
  16. in VSCode under Source Control highlight the row with Msg "ÄÖ first commit" and select "Stage Selected Ranges".
  17. Commit the Staged Change
  18. After that:
    • Source Control shows the following state: image
    • The explorer shows the following state (just ignore the file name). Note the lines marked in blue: image

Details

settings.json

"files.autoGuessEncoding": false,
"[vbs]": {
    "files.encoding": "windows1252"
}

rasmussehlin commented 10 months ago

I had this problem today. Running version 1.85.1. Thought I'd make a comment to give this issue a kick, since this issue is three years old. :)

michaelmesser commented 2 months ago

Can this bug be prioritized higher? Silently changing files is a significant bug. Many users are likely running into this bug without realizing it. If it breaks something later, they might not find the original cause.