Closed lz990377023 closed 4 months ago
Hi!
Is there a problem with the string decoding settings?
As far as I know, there shouldn't be any problem.
7-Zip always uses wide strings internally, and that bstrValue is exactly the wide string that 7-Zip reports to bit7z, without any modification.
On Linux and macOS, bit7z then needs to convert these wide strings into narrow strings; to do this, the library uses the C++ standard facility for narrow string conversion, i.e.:
std::wstring_convert< std::codecvt_utf8< wchar_t >, wchar_t > converter;
return converter.to_bytes( wideString, wideString + size ); // std::wstring to std::string
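For readers who want to try the conversion in isolation, here is a minimal, self-contained sketch of the same call. The function name narrow is hypothetical and is not bit7z's actual API; the sketch also assumes a platform where wchar_t is 32 bits wide (as on Linux and macOS). Note that std::wstring_convert has been deprecated since C++17, so compilers may emit a warning.

```cpp
#include <codecvt>
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical helper (not bit7z's real function): converts a wide string
// of `size` characters to a UTF-8 encoded narrow string.
std::string narrow( const wchar_t* wideString, std::size_t size ) {
    std::wstring_convert< std::codecvt_utf8< wchar_t >, wchar_t > converter;
    return converter.to_bytes( wideString, wideString + size );
}
```

This conversion only produces the right bytes if the wide string it receives already holds the right code points; it cannot repair a name that was decoded incorrectly earlier on.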
So I don't think that the problem is on bit7z's side, but I'll try to investigate it anyway! By the way, what is the expected string value in those screenshots?
Thanks for your answer. I have tried to convert wstring to string, but it is still garbled. The correct text is shown in the picture below.
I have tried to convert wstring to string, but it is still garbled
Sorry, I meant that bit7z already performs the conversion, so there's no need to do a string -> wstring -> string conversion.
Anyway, I did some tests, and I was able to replicate the issue. It seems to happen only with Zip archives created using macOS's native compression tool (the one from the right-click context menu).
If, however, the Zip archive is created via 7-Zip's CLI 7zz, the name of the item is correctly decoded:
As I said, I don't think this is a bug on bit7z's side: it is the 7z.so that reports the item name differently in these cases, for some reason.
However, 7-Zip's CLI seems to perform some further string decoding compared to the shared library, as the 7zz tool always displays the correct name:
I'll investigate what decoding is performed by 7zz and try to implement it also in bit7z.
Thanks again for your reply. I will also try to look at how 7zz handles this, and I look forward to the updated string decoding in bit7z.
So, I investigated the problem further, and I found that this is a known issue with Zip archives created with the compression tool of macOS:
https://github.com/weichsel/ZIPFoundation/issues/63
https://github.com/gildas-lormeau/zip.js/issues/131
https://stackoverflow.com/questions/13261347/correctly-decoding-zip-entry-file-names-cp437-utf-8-or
In short, the macOS zip utility uses UTF-8 for filenames, but it doesn't set the UTF-8 bit flag in the Zip file.
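To make the flag concrete: in the Zip format, bit 11 (0x0800) of the "general purpose bit flag" field marks the file name and comment as UTF-8, and in a local file header that 16-bit little-endian field sits at byte offset 6, right after the 4-byte signature and the 2-byte "version needed" field. The sketch below checks that bit on a raw header buffer. The helper name hasUtf8Flag is hypothetical; this is neither bit7z nor 7-Zip code, only an illustration of what the macOS tool leaves unset.

```cpp
#include <cstddef>
#include <cstdint>

// Bit 11 of the general purpose bit flag: names and comments are UTF-8.
constexpr std::uint16_t kUtf8Flag = 1u << 11; // 0x0800

// Hypothetical helper: `header` points to the start of a Zip local file header.
// Returns true only if the signature matches and the UTF-8 flag is set.
bool hasUtf8Flag( const std::uint8_t* header, std::size_t size ) {
    if ( size < 8 ) {
        return false; // not enough bytes to reach the flag field
    }
    // Local file header signature: 'P' 'K' 0x03 0x04
    if ( header[0] != 0x50 || header[1] != 0x4B || header[2] != 0x03 || header[3] != 0x04 ) {
        return false;
    }
    // The flag field is a little-endian 16-bit value at offset 6.
    const std::uint16_t flags = static_cast< std::uint16_t >( header[6] | ( header[7] << 8 ) );
    return ( flags & kUtf8Flag ) != 0;
}
```

For an archive produced by the macOS tool, a check like this would be expected to return false even though the name bytes are UTF-8, which is why a reader has to guess the encoding.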
I've found a possible fix to make 7-Zip correctly handle such Zip archives: use a UTF-8 locale.
// Before calling .path() or .name() on the item object
std::locale::global(std::locale("en_US.UTF-8"));
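One caveat with the snippet above: the std::locale constructor throws std::runtime_error when the named locale is not installed on the system. A slightly more defensive variant might look like the following sketch; the helper name trySetUtf8Locale is hypothetical, and which locale names are available depends on the machine.

```cpp
#include <locale>
#include <stdexcept>

// Hypothetical helper: tries to install a UTF-8 locale as the global locale.
// Returns false (leaving the global locale unchanged) if the named locale
// is not available on this system.
bool trySetUtf8Locale( const char* name = "en_US.UTF-8" ) {
    try {
        // For a named locale, std::locale::global also updates the C locale,
        // as if by std::setlocale(LC_ALL, name).
        std::locale::global( std::locale( name ) );
        return true;
    } catch ( const std::runtime_error& ) {
        return false;
    }
}
```

A caller could try a few candidate names in turn (for example "en_US.UTF-8", then "C.UTF-8") and only then open the archive; whether those names exist is an assumption to verify on the target system.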
My guess is that since 7-Zip does not find the UTF-8 flag set, it interprets the filenames using the current locale's encoding, which may not be UTF-8.
Since 7-Zip uses wide strings internally, and these are usually UTF-32 encoded on macOS/Linux, 7-Zip does a conversion from the locale's encoding (possibly not UTF-8) to UTF-32, causing the garbled characters since the original encoding was actually UTF-8.
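The effect described here can be reproduced without 7-Zip at all. The sketch below decodes the same bytes twice: once as UTF-8, and once as a single-byte encoding where every byte becomes one character. Latin-1 is used purely as an illustrative assumption; the encoding actually in effect in the reporter's environment is not known. Both function names are hypothetical, and the UTF-8 decoder assumes well-formed input and performs no validation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Wrong interpretation: every byte is treated as one code point,
// which is what a single-byte encoding such as Latin-1 does.
std::vector< std::uint32_t > decodeAsLatin1( const std::string& bytes ) {
    std::vector< std::uint32_t > out;
    for ( unsigned char b : bytes ) {
        out.push_back( b );
    }
    return out;
}

// Correct interpretation: minimal UTF-8 decoder (no validation).
std::vector< std::uint32_t > decodeAsUtf8( const std::string& bytes ) {
    std::vector< std::uint32_t > out;
    std::size_t i = 0;
    while ( i < bytes.size() ) {
        const unsigned char lead = static_cast< unsigned char >( bytes[i] );
        std::size_t extra = 0;      // number of continuation bytes expected
        std::uint32_t cp = lead;    // ASCII bytes map to themselves
        if ( lead >= 0xF0 ) { extra = 3; cp = lead & 0x07; }
        else if ( lead >= 0xE0 ) { extra = 2; cp = lead & 0x0F; }
        else if ( lead >= 0xC0 ) { extra = 1; cp = lead & 0x1F; }
        ++i;
        for ( std::size_t k = 0; k < extra && i < bytes.size(); ++k, ++i ) {
            cp = ( cp << 6 ) | ( static_cast< unsigned char >( bytes[i] ) & 0x3F );
        }
        out.push_back( cp );
    }
    return out;
}
```

For the three bytes E4 B8 AD, the UTF-8 decoder yields the single code point U+4E2D (a Chinese character), while the byte-per-character decoder yields three unrelated code points (U+00E4, U+00B8, U+00AD). That mismatch is the kind of garbling seen in the item names.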
The 7zz tool solves this by setting the locale to en_US.UTF-8, without any special string decoding as I originally thought: it simply converts from UTF-8 to UTF-32.
Unfortunately, I don't think there is a clean workaround that can be implemented within bit7z.
bit7z version
4.0.x
Compilation options
BIT7Z_7ZIP_VERSION
7-zip version
v23.01
7-zip shared library used
7z.dll / 7z.so
Compilers
Clang
Compiler versions
No response
Architecture
x86_64
Operating system
macOS
Operating system versions
No response
Bug description
Hello, I used a Mac to decompress a zip and found that some Chinese characters in the archive are garbled. I don't know what the cause is. Is there a problem with the string decoding settings?
Steps to reproduce
No response
Expected behavior
No response
Relevant compilation output
No response