rikyoz / bit7z

A C++ static library offering a clean and simple interface to the 7-zip shared libraries.
https://rikyoz.github.io/bit7z
Mozilla Public License 2.0
623 stars 113 forks source link

[Feature Request]: When using the same compression parameters as 7-zip, the compression process can be 10% - 20% slower. #224

Open yuzhengjun1204 opened 1 month ago

yuzhengjun1204 commented 1 month ago

Feature description

For a 10GB folder, using 7zFM.exe for compression takes about 80 seconds, while the test code takes an average of around 90 seconds. The goal is to achieve the same compression speed as 7-zip. 7-zip: 24.05 OS: Win10, MSVC2019, x86. flags: BIT7Z_AUTO_FORMAT、BIT7Z_REGEX_MATCHING、BIT7Z_AUTO_PREFIX_LONG_PATHS、BIT7Z_PATH_SANITIZATION

try { // bit7z classes can throw BitException objects
        using namespace bit7z;
        Bit7zLibrary lib{ "7z.dll" };
        BitFileCompressor compressor{ lib, BitFormat::Zip };
        compressor.setProgressCallback([&](uint64_t in_size) -> bool {
            //std::cout << "setProgressCallback: " << in_size << std::endl;
            return true;
            });
        compressor.setFileCallback([&](std::string file_path) {
            std::cout << "filecallback: " << file_path << std::endl;
            });
        std::vector<std::string> file_list;
        file_list.push_back("D:\\testzip\\66");
        compressor.setOverwriteMode(OverwriteMode::Overwrite);
        compressor.setDictionarySize(32 * 1024);
        compressor.setWordSize(32);
        compressor.setThreadsCount(12);
        compressor.setCompressionMethod(BitCompressionMethod::Deflate);
        compressor.compress(file_list, "d:\\66.zip");
    }
    catch (const bit7z::BitException& ex) {
        std::cout << "error:1" << ex.code() << ex.what();
    }

Additional context

No response

Code of Conduct

rikyoz commented 1 month ago

Hi! Sorry for the late reply.

I've been doing some benchmarking and profiling with a similar use case and have indeed noticed some bit7z overhead. My current understanding is that the overhead is mainly due to the fact that bit7z internally uses the standard C++ file streams for reading/writing files, whereas 7-Zip uses the Windows APIs low-level functions. C++'s std::fstreams are known to be particularly slow on MSVC (https://stackoverflow.com/questions/26095160/why-are-stdfstreams-so-slow); bit7z implements some workarounds to mitigate the problem, but it's possible that this may not be enough in some use cases, so I'm still investigating the issue. I'm also looking into getting bit7z to use more low-level functions for the IO, but any transition will take some time.

Having said that, there are a few factors that can increase bit7z's overhead over 7-Zip's compression:

All of these factors can add overhead to your application, but the actual overhead will likely depend on your use case, and in many cases it will be negligible.

rikyoz commented 1 month ago

Hi! Just an update on the issue.

I have done some extensive benchmarking and profiling. I found that disabling std::fstream's buffering seems to improve bit7z's performance a bit. In my tests, bit7z was on average 7% faster than before, and the improvement was fairly consistent on each run; however, the actual performance improvement will likely depend on each use case. I've pushed a commit to the hotfix/v4.0.8 branch with the fix, in case you want to test it.

Despite this improvement, bit7z is still slower than 7-Zip, and my hunch that std::fstream was the cause seems to be confirmed. I've reimplemented bit7z file streams using low-level IO functions, and in my tests it ran as fast as 7-Zip, at least on Windows. This implementation still needs some polishing and extensive testing, so I won't include it in the next v4.0.8, but more likely in the next v4.1 beta.

yuzhengjun1204 commented 1 month ago

Hi! Just an update on the issue.

I have done some extensive benchmarking and profiling. I found that disabling std::fstream's buffering seems to improve bit7z's performance a bit. In my tests, bit7z was on average 7% faster than before, and the improvement was fairly consistent on each run; however, the actual performance improvement will likely depend on each use case. I've pushed a commit to the hotfix/v4.0.8 branch with the fix, in case you want to test it.

Despite this improvement, bit7z is still slower than 7-Zip, and my hunch that std::fstream was the cause seems to be confirmed. I've reimplemented bit7z file streams using low-level IO functions, and in my tests it ran as fast as 7-Zip, at least on Windows. This implementation still needs some polishing and extensive testing, so I won't include it in the next v4.0.8, but more likely in the next v4.1 beta.

Thank you for your reply. Following the v4.0.8 code, I have also achieved an approximate 10% increase in speed. However, it is still somewhat inferior compared to 7zip. I look forward to the improved performance in the future.

yuzhengjun1204 commented 1 month ago

I monitored the reading details using procmon.exe on Windows and found that this might be the reason for the slow compression speed of bit7z. Can you upload the test code? I would also like to give it a try.

企业微信截图_17225849231023 企业微信截图_17225850632568
rikyoz commented 1 month ago

I have just pushed a commit (27abf0c49151e61e03ed3d98226005fb3e511cc2) with the re-implementation of the file streams using low-level IO functions (feature/lowlevel-io branch). This branch is based on develop, not master (like the hotfix/v4.0.8). Also, as I said, the implementation still needs some polishing, and there may still be some bugs as I'm still improving and testing it.

Anyway, thanks for all the tests you're performing! :pray:

yuzhengjun1204 commented 1 month ago

Hi! I modified my local code according to the commit 27abf0c, and after testing, I found it is still far behind 7-zip. Using bit7z to compress takes 50s, while 7-zip only takes 40s. I would like to know what the possible reasons might be and whether there are any other changes that need to be made.

rikyoz commented 1 month ago

Hi! This is strange, as in all my tests the low-level implementation basically runs as fast as 7-Zip:

image

Actually, I've seen bit7z performing faster (in the tests that produced the graph above, it was 1s faster on average; in other tests, even 10s seconds faster).

Also, I've noticed that you are building for x86 instead of x86_64. Are you also testing the x86 version of 7-zip? In my tests, I've always used x86_64, so I'll need to check if x86 bit7z behaves differently.

I would like to know what the possible reasons might be and whether there are any other changes that need to be made.

Even with the low-level optimizations, the same considerations I reported in my first comment about BIT7Z_USE_NATIVE_STRING, std::cout, antiviruses, and so on still apply. For example, in my tests I have to either disable my antivirus, or put my test application to the antivirus' exclusion list.

Also, the low-level implementation I wrote is not actually the lowest level achievable, as it uses Microsoft's implementation of POSIX functions like open, write, read, etc., and not the native Win32 APIs (e.g., WriteFile). It is possible that this might be the cause of the overhead you're experiencing, so I'll try to test that.

yuzhengjun1204 commented 1 month ago

Thank you for your reply! Yes,I am using the x86. I am also using the x86 version of 7-zip.

yuzhengjun1204 commented 1 month ago

It might be due to the thread count settings. After setting the thread count to be the same as 7-zip, the test results here show that it is 2 seconds slower than 7-zip. But in a QT interface program, using the same code, it is about 8 seconds slower. Have you encountered this situation? I have already removed the callback for testing.

rikyoz commented 4 weeks ago

Hi! Just another update. I tried out some optimizations, in particular a mix of low-level IO and native IO functions (basically I use the former for opening/closing the files, while directly using the native Win32 APIs for reading/writing/searching the files). After some benchmarking, it seems to have improved performance as expected, bringing bit7z closer to 7-zip:

image

There's still some overhead, though. I guess this is due to the remaining non-native functions, so I'll need to test a native-only implementation. It'll probably take some time, as benchmarking is quite slow.

Yes,I am using the x86. I am also using the x86 version of 7-zip.

Thanks for the further details.

It might be due to the thread count settings. After setting the thread count to be the same as 7-zip, the test results here show that it is 2 seconds slower than 7-zip.

I see.

But in a QT interface program, using the same code, it is about 8 seconds slower. Have you encountered this situation? I have already removed the callback for testing.

This is new and quite strange, given that you also removed the callback.

Edit: just to be clear, you're not using any callback in the QT program, right?

yuzhengjun1204 commented 4 weeks ago

Edit: just to be clear, you're not using any callback in the QT program, right?

I added code in the QT process to prevent DLL injection, which caused the compression time to be longer. After removing it, the compression time is normal. All my problems have now been resolved. Thank you very much.