icsharpcode / SharpZipLib

#ziplib is a Zip, GZip, Tar and BZip2 library written entirely in C# for the .NET platform.
http://icsharpcode.github.io/SharpZipLib/
MIT License

Files and folders with non-English characters in their names have gibberish names after unpacking #777

Closed TuTAH1 closed 2 years ago

TuTAH1 commented 2 years ago

Steps to reproduce

  1. Pack "слово.txt" with any archiver, e.g. Bandizip
  2. Check that any program unpacks it correctly (e.g. Windows Explorer or the same program you packed it with)
  3. Unpack it via `new FastZip().ExtractZip()`

Expected behavior

Files and folders have the same names as when they were packed.

Actual behavior

File and folder names look like ����� �ணࠬ��

Version of SharpZipLib

1.4.0

Tried actions:

ZipStrings.CodePage = 866;    // No data is available for encoding 866. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method.
ZipStrings.CodePage = 1251;   // same error
ZipStrings.CodePage = 65001;  // gibberish
ZipStrings.UseUnicode = true; // gibberish

new FastZip { 
  EntryFactory = new ZipEntryFactory { 
    IsUnicodeText = true 
  } 
}.ExtractZip(filepath, (TempFolder? "Temp\\" : ""), null);

This was used instead of the plain call:

new FastZip().ExtractZip(filepath, (TempFolder? "Temp\\" : ""), null);

It does not work either: IsUnicodeText = true gives the same result as IsUnicodeText = false.

piksel commented 2 years ago

> IsUnicodeText = true gives same result as IsUnicodeText = false

This is because it's only used for creating entries, not when reading.

> No data is available for encoding 866. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method.

This is because .NET Core and .NET 5+ ship with only a very limited set of encodings by default. To add support for all the encodings present in .NET Framework, call this:


using System.Text;

// ...

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

In fact, if that is called, FastZip should automatically pick that encoding (as your OS is set to it).

piksel commented 2 years ago

Actually, I am going to reopen this, because the automatic encoding only works if

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

is called before any instance of ZipCodec has been accessed, and only on .NET Framework. On .NET Core / 5+ it still only returns UTF-8. This should be fixed in an upcoming release.
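
To illustrate why code page 866 is unavailable until the provider is registered, and why decoding the same bytes as UTF-8 yields gibberish, here is a minimal sketch that uses only the .NET base class library. It assumes .NET Core 3.0+ or .NET 5+ and a UTF-8 source file; `CodePageDemo` and `DecodeName` are hypothetical names for illustration and are not part of SharpZipLib.

```csharp
using System;
using System.Text;

public static class CodePageDemo
{
    // Hypothetical helper (not a SharpZipLib API): decodes raw entry-name
    // bytes with the given legacy code page.
    public static string DecodeName(byte[] raw, int codePage)
    {
        // Must run before the first request for a legacy code page;
        // without it, Encoding.GetEncoding(866) throws on .NET Core / 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(codePage).GetString(raw);
    }

    public static void Main()
    {
        // "слово" encoded in code page 866 (DOS Cyrillic).
        byte[] raw = { 0xE1, 0xAB, 0xAE, 0xA2, 0xAE };

        // Decoding with the matching code page restores the name.
        Console.WriteLine(DecodeName(raw, 866) == "слово");           // True

        // Decoding the same bytes as UTF-8 does not.
        Console.WriteLine(Encoding.UTF8.GetString(raw) == "слово");   // False
    }
}
```

In the scenario from this issue, the registration call would go at program start, before `ZipStrings.CodePage = 866;` and before `new FastZip().ExtractZip(...)`. The thread does not confirm whether that is enough to get correct names on .NET Core / 5+ with 1.4.0.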