rikyoz / bit7z

A C++ static library offering a clean and simple interface to the 7-zip shared libraries.
https://rikyoz.github.io/bit7z
Mozilla Public License 2.0

Fix special characters causing file creation failure #168

Closed. Luosiyuan closed this pull request 9 months ago.

Luosiyuan commented 9 months ago

Description

Fix a bug where file creation fails when an archive contains directory or file names with characters that are special (reserved) on Windows. The bug occurs in the fs::create_directories call in bit7z's src/internal/fileextractcallback.cpp file. Proposed fix: use a regular expression to replace the special characters with the "_" character, which Windows supports:

    #include <regex>
    #include <string>

    std::wstring CharacterStandard(const std::wstring& src) {
        // Define the rule: characters that are not allowed in Windows file names
        const std::wregex illegalCharRegex(L"[<>:\"/|?*]");
        // Replace the illegal characters with underscores using the regular expression
        return std::regex_replace(src, illegalCharRegex, L"_");
    }

Remaining issues:

  1. Currently, only a wide-character version is available.
  2. Special file names are not handled; only special characters are. On Windows, a file name cannot start or end with a space, nor can it be one of the following reserved device names: CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9.

Motivation and Context

Some users' archives contain files whose names include special characters, which causes an exception to be thrown when the file is created and prevents normal extraction.

How Has This Been Tested?

To test, create on macOS an archive containing directories or files whose names include the following characters, which are special on Windows, then extract it on Windows: < (less than sign), > (greater than sign), : (colon), " (double quote), / (forward slash), \ (backslash), | (vertical bar), ? (question mark), * (asterisk).


Luosiyuan commented 9 months ago

If you need a sample archive for testing, leave a message and I will upload it to the project.

Luosiyuan commented 9 months ago

@rikyoz

rikyoz commented 9 months ago

Hi! Thank you for your pull request! I took the liberty of pushing some changes to your original proposal. Apart from some code formatting, I made the normalization use standard algorithms rather than std::regex, since the latter has awful performance (e.g., https://quick-bench.com/q/jkVrve2vYOhbLLpvj7jZI6RBcIs; this benchmark uses GCC, but MSVC is not much different, as far as I know). I also made the "sanitization" happen only on Windows, since the character limitations apply only to this platform.
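A minimal sketch of the idea, assuming the sanitization is applied to a single path component (so path separators are deliberately left out of the reserved set, and ':' can be replaced safely): std::replace_if substitutes each reserved character with '_' in one linear pass, and a _WIN32 guard restricts the change to Windows. The names replace_reserved_chars and sanitize_for_platform are hypothetical and not bit7z's actual functions.

```cpp
#include <algorithm>
#include <string>
#include <string_view>

// Characters Windows does not allow in a file or directory name.
// Assumption: input is a single path component, so '/' and '\\' are not listed.
constexpr std::wstring_view kReservedChars = L"<>:\"|?*";

// Hypothetical helper: replaces every reserved character with '_' using
// std::replace_if instead of std::regex_replace.
std::wstring replace_reserved_chars(std::wstring component) {
    std::replace_if(component.begin(), component.end(),
                    [](wchar_t c) { return kReservedChars.find(c) != std::wstring_view::npos; },
                    L'_');
    return component;
}

// Hypothetical wrapper: sanitizes only on Windows; on other platforms
// these characters are valid in file names, so the input is returned as-is.
std::wstring sanitize_for_platform(const std::wstring& component) {
#ifdef _WIN32
    return replace_reserved_chars(component);
#else
    return component;
#endif
}
```

Unlike std::regex, this performs no pattern compilation and no allocation beyond the copy of the input string.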

As for the remaining issues:

> Currently, only a wide-character version is available.

I think it's best to use only wide characters, which is the native string type on Windows.

> Special file names are not handled; only special characters are. On Windows, a file name cannot start or end with a space, nor can it be one of the following reserved device names: CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9.

Yeah, I'm evaluating the best approach for forbidden names. It's probably better to simply throw an exception if a forbidden name appears in any component of the file's path.
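A sketch of the "throw an exception" option, assuming the check runs on each extracted item's path before the file is created: iterating a std::filesystem::path yields its components, and each one is tested against the device names listed in the discussion (case-insensitively) and against the leading/trailing space rule. Ignoring the extension (treating "NUL.txt" like "NUL") is an assumption, and is_forbidden_name and check_path_components are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cwctype>
#include <filesystem>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

// Reserved device names from the discussion.
constexpr std::array<const wchar_t*, 22> kDeviceNames = {
    L"CON", L"PRN", L"AUX", L"NUL",
    L"COM1", L"COM2", L"COM3", L"COM4", L"COM5", L"COM6", L"COM7", L"COM8", L"COM9",
    L"LPT1", L"LPT2", L"LPT3", L"LPT4", L"LPT5", L"LPT6", L"LPT7", L"LPT8", L"LPT9"};

// Hypothetical check: true if the component starts/ends with a space or if
// the part before the first '.' is a device name (case-insensitive).
bool is_forbidden_name(const std::wstring& name) {
    if (name.empty()) {
        return false;
    }
    if (name.front() == L' ' || name.back() == L' ') {
        return true;
    }
    std::wstring stem = name.substr(0, name.find(L'.'));
    std::transform(stem.begin(), stem.end(), stem.begin(),
                   [](wchar_t c) { return static_cast<wchar_t>(std::towupper(c)); });
    return std::find(kDeviceNames.begin(), kDeviceNames.end(), stem) != kDeviceNames.end();
}

// Hypothetical validation step: throws if any path component is forbidden.
void check_path_components(const fs::path& itemPath) {
    for (const auto& component : itemPath) {
        if (is_forbidden_name(component.wstring())) {
            throw std::runtime_error("forbidden name in path: " + component.string());
        }
    }
}
```

Because every component is checked, a reserved directory in the middle of the path (e.g., abc/NUL/def) is rejected just like a reserved file name.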

I'm also evaluating whether to add an option to enable/disable the path "sanitization".

Luosiyuan commented 9 months ago

I strongly agree with your approach.

As for the file name issue: I have seen some software handle special file names by prepending a "_" character to them. Handling the file name itself this way is fine. However, if a directory in the path (i.e., one of the components obtained by splitting the path) has a special name, as NUL does in C:\abc\NUL\def, it becomes very difficult to handle. In short, such archives cannot be created on Windows, yet, strangely, archives created on other platforms still end up being extracted on Windows.
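The "_" prefix workaround can also be applied per component, which covers a reserved directory in the middle of a path. This is a hypothetical sketch (prefix_reserved_components and is_device_name are illustrative names, not bit7z functions), assuming the same device-name list and that any extension is ignored when matching.

```cpp
#include <algorithm>
#include <cwctype>
#include <filesystem>
#include <set>
#include <string>

namespace fs = std::filesystem;

// Hypothetical check: true if the part before the first '.' is a device name.
bool is_device_name(const std::wstring& name) {
    static const std::set<std::wstring> kDeviceNames = {
        L"CON", L"PRN", L"AUX", L"NUL",
        L"COM1", L"COM2", L"COM3", L"COM4", L"COM5", L"COM6", L"COM7", L"COM8", L"COM9",
        L"LPT1", L"LPT2", L"LPT3", L"LPT4", L"LPT5", L"LPT6", L"LPT7", L"LPT8", L"LPT9"};
    std::wstring stem = name.substr(0, name.find(L'.'));
    std::transform(stem.begin(), stem.end(), stem.begin(),
                   [](wchar_t c) { return static_cast<wchar_t>(std::towupper(c)); });
    return kDeviceNames.count(stem) > 0;
}

// Rebuilds the path component by component, prefixing '_' to every reserved
// name, so abc/NUL/def becomes abc/_NUL/def instead of failing.
fs::path prefix_reserved_components(const fs::path& itemPath) {
    fs::path result;
    for (const auto& component : itemPath) {
        std::wstring name = component.wstring();
        if (is_device_name(name)) {
            name.insert(0, L"_");
        }
        result /= name;
    }
    return result;
}
```

The trade-off versus throwing is that extraction succeeds, but the on-disk name no longer matches the name stored in the archive.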

Luosiyuan commented 9 months ago

There is another issue: since the special-name problem is not fixed at the lowest level (in 7-Zip itself), normalizing the names during extraction would make the callback report file names that differ from the original ones.

rikyoz commented 9 months ago

Hi!

> As for the file name issue: I have seen some software handle special file names by prepending a "_" character to them. Handling the file name itself this way is fine. However, if a directory in the path (i.e., one of the components obtained by splitting the path) has a special name, as NUL does in C:\abc\NUL\def, it becomes very difficult to handle. In short, such archives cannot be created on Windows, yet, strangely, archives created on other platforms still end up being extracted on Windows.

Between yesterday and today, I pushed some commits that improve the path sanitization:

> There is another issue: since the special-name problem is not fixed at the lowest level (in 7-Zip itself), normalizing the names during extraction would make the callback report file names that differ from the original ones.

If I understand correctly, you mean that the FileCallback should report the original, unsanitized path, right? If so, I fixed this too.
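The separation described here, with the sanitized path used on disk and the original path reported to the user, can be sketched as follows. ReportCallback is a hypothetical stand-in for bit7z's FileCallback, and sanitize_item_path and extract_item are illustrative names; actual file creation is omitted.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <string_view>

// Hypothetical stand-in for the callback that receives each extracted item's path.
using ReportCallback = std::function<void(const std::wstring&)>;

// Hypothetical helper: replaces Windows-reserved characters (path separators
// excluded) with '_' so the item can be created on disk.
std::wstring sanitize_item_path(std::wstring path) {
    constexpr std::wstring_view reserved = L"<>:\"|?*";
    std::replace_if(path.begin(), path.end(),
                    [&](wchar_t c) { return reserved.find(c) != std::wstring_view::npos; },
                    L'_');
    return path;
}

// Hypothetical extraction step: the file would be created at the sanitized
// path, but the callback receives the original, unsanitized item path.
std::wstring extract_item(const std::wstring& originalPath, const ReportCallback& onFile) {
    const std::wstring pathOnDisk = sanitize_item_path(originalPath);
    // (Creating the file at pathOnDisk is omitted in this sketch.)
    if (onFile) {
        onFile(originalPath);
    }
    return pathOnDisk;
}
```

Keeping the two paths distinct means users of the callback still see the names exactly as stored in the archive.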

As a side note, I moved all the sanitization functions to fsutil.hpp/fsutil.cpp.

rikyoz commented 9 months ago

Just so you know, I've pushed some commits that should fix and improve the path sanitization; I also added some unit tests. Please let me know if I missed anything and if everything is okay for your use case so I can merge the pull request.

Luosiyuan commented 9 months ago

I've tested the code as you optimized it, and everything currently works normally without any issues. Thanks!

rikyoz commented 9 months ago

Perfect, you're welcome!

Luosiyuan commented 9 months ago

Actually, I didn't examine it thoroughly enough; I just discovered a small problem: the Windows special-character handling was incompatible with '[' and ']'. (Screenshot attached.)

rikyoz commented 9 months ago

That is not the final code merged into the master branch, as this was a bug that I fixed in the PR with this commit.