Closed Luosiyuan closed 9 months ago
If you need a sample archive for testing, leave a comment and I will upload it to the project.
@rikyoz
Hi! Thank you for your pull request! I took the liberty of pushing some changes to your original proposal. Apart from some code formatting, I made the normalization use standard algorithms rather than std::regex, since the latter has awful performance (e.g., https://quick-bench.com/q/jkVrve2vYOhbLLpvj7jZI6RBcIs - this benchmark uses GCC, but MSVC is not much different, as far as I know). I also made the "sanitization" happen only on Windows, since the character limitations apply only on this platform.
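To illustrate the change described above, here is a minimal sketch of the same character replacement done once with std::regex and once with a standard algorithm. The function names are hypothetical and are not bit7z's actual functions; the character set is the one from the original proposal.

```cpp
#include <algorithm>
#include <regex>
#include <string>

// Hypothetical names for illustration; not bit7z's actual functions.

// Regex-based version, along the lines of the original proposal.
inline std::wstring sanitize_with_regex(const std::wstring& name) {
    static const std::wregex illegal(L"[<>:\"/|?*]");
    return std::regex_replace(name, illegal, L"_");
}

// Same replacement using std::replace_if, with no regex engine involved.
inline std::wstring sanitize_with_algorithm(std::wstring name) {
    const std::wstring illegal = L"<>:\"/|?*";
    std::replace_if(name.begin(), name.end(),
                    [&illegal](wchar_t c) { return illegal.find(c) != std::wstring::npos; },
                    L'_');
    return name;
}
```

Both functions should return the same string for any input; the second one only makes a single pass over the characters.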
As for the remaining issues:
Currently, only the wide character version is available
I think it's best to use only wide characters, which is the native string type on Windows.
Special file names are not handled; only special characters are. The restrictions on special file names are: a file name cannot start or end with a space, nor can it be one of the following device names: CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9
Yeah, I'm evaluating what is the best approach in the case of forbidden names. Probably, it's better to simply throw an exception if a forbidden name appears in any component of the path of the file.
I'm also evaluating whether to add an option to enable/disable the path "sanitization".
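The exception-based approach for forbidden names could look roughly like the sketch below. It is only an illustration: the function names are hypothetical, the device name list is the one quoted above, and treating a name with an extension (such as NUL.txt) as reserved is an assumption based on how Windows documents these names.

```cpp
#include <algorithm>
#include <cwctype>
#include <stdexcept>
#include <string>

// Hypothetical helper: true if `component` is one of the reserved device names.
inline bool is_reserved_name(std::wstring component) {
    // Assumption: the extension is ignored, so "NUL.txt" counts as reserved too.
    component = component.substr(0, component.find(L'.'));
    std::transform(component.begin(), component.end(), component.begin(),
                   [](wchar_t c) { return static_cast<wchar_t>(std::towupper(c)); });
    if (component == L"CON" || component == L"PRN" ||
        component == L"AUX" || component == L"NUL") {
        return true;
    }
    // COM1..COM9 and LPT1..LPT9, as listed in the quoted comment.
    const bool is_port = component.size() == 4 &&
        (component.compare(0, 3, L"COM") == 0 || component.compare(0, 3, L"LPT") == 0);
    return is_port && component[3] >= L'1' && component[3] <= L'9';
}

// Hypothetical helper: throws if any backslash-separated component is reserved.
inline void throw_if_reserved(const std::wstring& path) {
    std::size_t start = 0;
    while (start <= path.size()) {
        std::size_t end = path.find(L'\\', start);
        if (end == std::wstring::npos) {
            end = path.size();
        }
        if (is_reserved_name(path.substr(start, end - start))) {
            throw std::invalid_argument("path contains a reserved Windows device name");
        }
        start = end + 1;
    }
}
```

Checking every component rather than only the file name is what catches a path like C:\abc\NUL\def.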
I strongly agree with your approach.
As for the file name issue: I have seen some software handle special file names by prefixing them with the "_" character. Handling file names is fine. However, if a directory in the middle of the path has a special name, as in C:\abc\NUL\def, it is much harder to handle, since the path has to be split. In short, such archives cannot be created on Windows, but oddly enough, archives created on other platforms still end up being extracted on Windows.
There is another issue: since the special-string problem is not fixed in the underlying 7z layer, normalizing the strings during extraction would make the callback return different file names.
Hi!
As for the file name issue: I have seen some software handle special file names by prefixing them with the "_" character. Handling file names is fine. However, if a directory in the middle of the path has a special name, as in C:\abc\NUL\def, it is much harder to handle, since the path has to be split. In short, such archives cannot be created on Windows, but oddly enough, archives created on other platforms still end up being extracted on Windows.
Between yesterday and today, I pushed some commits that improve the path sanitization:
- Forbidden names are now prefixed with the _ character, as you suggested (I've seen similar implementations too);
- For example, Test\\COM0\\hello?world<.txt becomes Test\\_COM0\\hello_world_.txt.
- The sanitization is enabled by setting BIT7Z_PATH_SANITIZATION to ON (available only on Windows).
There is another issue: since the special-string problem is not fixed in the underlying 7z layer, normalizing the strings during extraction would make the callback return different file names.
If I understand it correctly, you mean that the FileCallback should report the original "unsanitized" path, right? If this is the case, I also fixed this.
As a side note, I moved all the sanitization functions to fsutil.hpp/fsutil.cpp.
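For readers following along, the sanitization described in this comment can be sketched as below. This is not the code in fsutil.cpp, which may differ; all function names are hypothetical. The sketch assumes relative in-archive paths separated by backslashes (a drive prefix like C: would have its colon replaced), and it treats COM0 and LPT0 as reserved so that it reproduces the COM0 example given above.

```cpp
#include <algorithm>
#include <cwctype>
#include <string>

// Hypothetical helpers for illustration; bit7z's real implementation may differ.

inline bool is_reserved(const std::wstring& name) {
    // Assumption: the extension is ignored when matching device names.
    std::wstring stem = name.substr(0, name.find(L'.'));
    std::transform(stem.begin(), stem.end(), stem.begin(),
                   [](wchar_t c) { return static_cast<wchar_t>(std::towupper(c)); });
    if (stem == L"CON" || stem == L"PRN" || stem == L"AUX" || stem == L"NUL") {
        return true;
    }
    // COM0..COM9 and LPT0..LPT9; digit 0 is included to match the COM0 example.
    const bool is_port = stem.size() == 4 &&
        (stem.compare(0, 3, L"COM") == 0 || stem.compare(0, 3, L"LPT") == 0);
    return is_port && stem[3] >= L'0' && stem[3] <= L'9';
}

inline std::wstring sanitize_component(std::wstring name) {
    const std::wstring forbidden = L"<>:\"/|?*";
    // Replace each forbidden character with an underscore.
    std::replace_if(name.begin(), name.end(),
                    [&forbidden](wchar_t c) { return forbidden.find(c) != std::wstring::npos; },
                    L'_');
    // Prefix reserved device names with an underscore.
    if (is_reserved(name)) {
        name.insert(name.begin(), L'_');
    }
    return name;
}

inline std::wstring sanitize_path(const std::wstring& path) {
    std::wstring result;
    std::size_t start = 0;
    while (start <= path.size()) {
        std::size_t end = path.find(L'\\', start);
        if (end == std::wstring::npos) {
            end = path.size();
        }
        result += sanitize_component(path.substr(start, end - start));
        if (end < path.size()) {
            result += L'\\';  // keep the original separator
        }
        start = end + 1;
    }
    return result;
}
```

With these assumptions, sanitize_path(L"Test\\COM0\\hello?world<.txt") yields L"Test\\_COM0\\hello_world_.txt". A caller that also needs to report the original name to a callback would keep the unsanitized path alongside the sanitized one.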
Just so you know, I've pushed some commits that should fix and improve the path sanitization; I also added some unit tests. Please let me know if I missed anything and if everything is okay for your use case so I can merge the pull request.
After testing with the code optimized by the author, everything now works normally without any issues. Thanks!
Perfect, you're welcome!
Actually, I did not test thoroughly enough earlier, and I have just discovered a small problem: the handling of Windows special characters is incompatible with '[' and ']'.
Description
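Fix a bug where an archive containing Windows special characters in its directory or file names fails to extract because the files cannot be created. The bug occurs in the fs::create_directories call in bit7z's src/internal/fileextractcallback.cpp. Fix: use a regular expression to replace the special characters with the "_" character, which Windows supports.

```cpp
#include <regex>
#include <string>

std::wstring CharacterStandard(const std::wstring& src) {
    std::wstring destChar = src;
    // Define the rule: characters that Windows forbids in names
    std::wregex illegalCharRegex(L"[<>:\"/|?*]");
    // Replace illegal characters with underscores using the regular expression
    destChar = std::regex_replace(destChar, illegalCharRegex, L"_");
    return destChar;
}
```

Remaining issues:
- Currently, only the wide character version is available.
- Special file names are not handled; only special characters are. A file name cannot start or end with a space, nor can it be one of the following device names: CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9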
Motivation and Context
Some users' archives contain entries whose file names include special characters, which causes an exception to be thrown when creating the file and prevents normal extraction.
How Has This Been Tested?
You can test this by creating, on macOS, archives whose directory or file names contain the following Windows special characters, and then extracting them: < (less-than sign)
Types of changes
Checklist: