rikyoz / bit7z

A C++ static library offering a clean and simple interface to the 7-zip shared libraries.
https://rikyoz.github.io/bit7z
Mozilla Public License 2.0

[Feature Request]: Can I use UTF-8 in compression? #91

Closed wihn2021 closed 9 months ago

wihn2021 commented 2 years ago

Feature description

Is there a way to compress a directory using UTF-8? I have checked the header files, but didn't find a method to change the encoding.

rikyoz commented 2 years ago

Hi!

Is there a way to compress a directory using UTF-8? I have checked the header files, but didn't find a method to change the encoding.

Do you mean the encoding of the strings passed to bit7z methods? In this case, there's no way in the current stable version.

Natively, Windows uses the UTF-16 encoding for Unicode strings, so you usually need to use std::wstring or wchar_t C strings. This is the approach of 7-zip, and hence of bit7z.

This will change starting with the next major version of bit7z (v4.0). Since it will become a cross-platform library, it will by default follow the so-called UTF-8 Everywhere manifesto: all string parameters will be treated as UTF-8 encoded std::strings. This behavior will be customizable, so that you can, for example, keep using std::wstrings on Windows and std::strings elsewhere.
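To make the UTF-8 versus UTF-16 difference concrete, here is a minimal sketch of converting a UTF-8 std::string into UTF-16 code units. The helper utf8_to_utf16 is hypothetical (it is not part of bit7z or 7-zip), and it is simplified: it rejects truncated or malformed sequences but does not check for overlong encodings or encoded surrogates.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper, not a bit7z API: decodes UTF-8 bytes into UTF-16 code units.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;   // decoded Unicode code point
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra >= in.size() - i)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3Fu);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");

        if (cp >= 0x10000) {
            // Code points outside the BMP become a surrogate pair in UTF-16.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}
```

On Windows, where wchar_t is 16 bits wide, the result can be copied into a std::wstring, e.g. std::wstring( u16.begin(), u16.end() ).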

wihn2021 commented 2 years ago

I mean the directory names and file names inside the zip archive. I want them to be encoded in UTF-8 so that I can use the archive on Android. With 7z.exe, I would add the argument 'cu=on' to get a zip archive with UTF-8 encoded file names. Is there a way to get the same result as 'cu=on' using bit7z functions?

rikyoz commented 2 years ago

I see! I actually didn't know about this specific option for zip archives. So no, I'm sorry, there's currently no way to do this in bit7z, since it doesn't set the cu parameter when calling the 7-zip API.

According to the 7-zip help documentation: "By default (if cl and cu switches are not specified), 7-Zip uses UTF-8 encoding only for file names that contain symbols unsupported by local code page." I don't know if this can help you.

I'll need to evaluate how to add this feature to the library, and whether to do it in the current stable version or in the next one.
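To check which encoding an existing archive declares for a name, one option is to look at the general-purpose bit flag in the ZIP local file header: the ZIP format specification defines bit 11 (0x0800) as the "language encoding flag", meaning the file name is UTF-8. The sketch below only inspects the first entry of an in-memory buffer. The function name is hypothetical, and whether a given tool sets the flag for every entry is an assumption to verify (ASCII-only names are byte-identical in both encodings, so a tool may leave the flag clear for them).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true if the buffer starts with a ZIP local file
// header whose general-purpose flag has bit 11 (UTF-8 file name) set.
bool first_entry_declares_utf8(const std::vector<std::uint8_t>& zip) {
    // Need at least the signature (4 bytes), version (2 bytes) and flags (2 bytes).
    if (zip.size() < 8) return false;

    // Local file header signature 0x04034b50, stored little-endian: "PK\x03\x04".
    if (zip[0] != 0x50 || zip[1] != 0x4B || zip[2] != 0x03 || zip[3] != 0x04)
        return false;

    // The general-purpose bit flag is a little-endian 16-bit value at offset 6.
    const std::uint16_t flags =
        static_cast<std::uint16_t>(zip[6] | (zip[7] << 8));

    return (flags & 0x0800) != 0;
}
```

A fuller check would walk the central directory and test the same flag for every entry instead of only the first one.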

wihn2021 commented 2 years ago

Thank you for your reply!

rikyoz commented 9 months ago

Implemented in v4.0.0.

Now, you can use 7-zip format settings like cu using the setFormatProperty method. E.g.:

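// Assumes `using namespace bit7z;` and an existing Bit7zLibrary, e.g. Bit7zLibrary lib{ "7z.dll" };
BitFileCompressor compressor{ lib, BitFormat::Zip };
compressor.setFormatProperty( L"cu", true ); // counterpart of 7-zip's cu=on: UTF-8 file names
compressor.compressDirectory( "dir/path/", "dir_archive.zip" );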

Please note that the first argument of setFormatProperty must be a wide-character string literal (e.g. L"cu"); narrow string literals are not accepted.
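One common way such a restriction arises in C++ is a function template that takes its first parameter as a reference to a wchar_t array. The sketch below illustrates that technique with a hypothetical stand-in, property_name; it is not bit7z's actual declaration, and whether the library is declared exactly this way is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in, NOT bit7z's declaration: binding the parameter to a
// wchar_t array by reference means only wide string literals (and other
// wchar_t arrays) match; N is deduced from the array size.
template <std::size_t N>
std::wstring property_name(const wchar_t (&name)[N]) {
    return std::wstring(name, N - 1); // N includes the terminating L'\0'
}

// property_name( L"cu" );      // OK: L"cu" has type const wchar_t[3]
// property_name( "cu" );       // does not compile: const char[3] is not a wchar_t array
// std::wstring s = L"cu";
// property_name( s.c_str() );  // does not compile: const wchar_t* is a pointer, not an array
```

With a signature of this kind, a property name held in a std::wstring variable cannot be passed directly, which matches the literal-only behavior described above.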