stachenov / quazip

Qt/C++ wrapper over minizip
Other
580 stars 233 forks source link

New compression functions for JlCompress. #193

Closed lazyone233 closed 1 month ago

lazyone233 commented 8 months ago

https://github.com/stachenov/quazip/issues/192

New compression functions have been added, which allow users to set compression strategy.

All compression functions have been overloaded, and each function has been equipped with an additional enum variable named "strategy" of the type CompressionStrategy. Through this variable, six different compression strategies can be set:

Storage - only packs data without compression Fastest - the fastest compression speed but with low compression ratio Faster - a relatively fast compression speed but with a not-so-high compression ratio Standard - a moderate compression speed and compression ratio Better - a higher compression ratio but with a not-so-fast compression speed Best - the highest compression ratio but with a slow compression speed

lazyone233 commented 4 months ago

@cen1 I have made the changes to the code. You can check it now

cen1 commented 4 months ago

I think adding a default parameter CompressionStrategy strategy = Standard breaks the ABI guidelines so just to be on the safe side, keep two separate method signatures as in the previous version and construct CompressionStrategy inside the body of the old method.

lazyone233 commented 4 months ago

I think adding a default parameter CompressionStrategy strategy = Standard breaks the ABI guidelines so just to be on the safe side, keep two separate method signatures as in the previous version and construct CompressionStrategy inside the body of the old method.

I'll try modifying the code again.

lazyone233 commented 4 months ago

@cen1 Based on my understanding of your comments, I have made revisions, and you can review them.

cen1 commented 4 months ago

Did some work adjacent to this, if I fix file timestamps when compressing I can get consistent hashes for final archives. This should allow unit tests to verify that default compression level produces the same result as code before this MR.

Will take a bit longer due to my schedule being sporadic but getting there.

lazyone233 commented 4 months ago

Did some work adjacent to this, if I fix file timestamps when compressing I can get consistent hashes for final archives. This should allow unit tests to verify that default compression level produces the same result as code before this MR.

Will take a bit longer due to my schedule being sporadic but getting there.

Got it. Please inform me once you have completed your tasks, and I will attempt to implement the necessary adjustments accordingly.

cen1 commented 1 month ago

@lazyone233 sorry for leaving you hanging for a bit, my work schedule has been sporadic as of late :)

I have extended all compression methods with an Options class. Initially only contains a timestamp field so you can set fixed timestamp to zipped files but you can now rework your patch and put the compression level into these Options.

I think this will be more sane as more features could be tacked on the JlCompress (maybe encryption?) and method overloading would go out of hand.

Also added two new *Options unit tests which can now create reproducible archives (final result is exactly the same on each run). You can extend those two tests after adding compression level and the default level should produce exact same content as the current code.

The only thing to watch is whether compression level will even do anything to the current unit test files which are quite small content wise, if each compression level produces the same output we might need to add some bigger files.

If you lack time currently I am also happy to integrate your code as my next task.

Will merge this soon but can wait for any comments you might have: https://github.com/stachenov/quazip/pull/203

lazyone233 commented 1 month ago

@cen1 Sorry for seeing your reply late, I'm glad to adjust my code to fit your proposal. However, I'm not very familiar with the overall process of pull requests. Should I wait for your code to be merged into the existing master branch before making corresponding code modifications?

cen1 commented 1 month ago

Indeed, wait for the merge which should be done soon (I am checking why the windows tests are failing atm).

lazyone233 commented 1 month ago

Okay, please inform me once the code merge is completed, I'll take a look at how to modify the code in the meantime

cen1 commented 1 month ago

It's in.

lazyone233 commented 1 month ago

@cen1 I have completed the corresponding code modifications, but there are some issues that may require your help to resolve:

  1. When the total file size is too small, the sha256sum values of the compressed files obtained using the Faster, Standard, and Better compression strategies are the same. Only when the file size is large enough do differences appear. Therefore, it is necessary to add test code related to large files.
  2. In the simple test of this testing method compressFileOptions_data, the sha256sum_unixvalue obtained differs from the one you provided, being b197b734220f78d9e253a88a03503325724bca0ab5d575913c6df31602450ef3. The testing environment is (Config: Using QtTest library 5.15.2, Qt 5.15.2 (x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 5.3.1 20160406 (Red Hat 5.3.1-6)), Ubuntu 22.04).
cen1 commented 1 month ago

Weird, the hash I hardcoded seems to have passed all CI tests which use Ubuntu 22.04 and different Qt versions.

Please attach your jlsimplefile.zip here so I can try to binary diff it. It could be something environment specific.

The rest of the code seems ok, I will do some small cleanups and merge it after we figure out why the difference.

lazyone233 commented 1 month ago

jlsimplefile.zip, sha256sum_unix value is b197b734220f78d9e253a88a03503325724bca0ab5d575913c6df31602450ef3 jldir.zip, sha256sum_unix value is 9035d8e192fc6a850921202299f16f175f5f229dc6c485d0527c88a5f15c0352

cen1 commented 1 month ago

It was because of file permissions, fixed in split branch.

I have question about "Standard": According to zlib doc, Default is lvl 6, while Standard as is now is 5. I think this could be confusing to end user, to me Standard==Default. I would either bump Standard to lvl 6 or get rid of it.

lazyone233 commented 1 month ago

Choose the first one. Should I make the change, or do you want to modify it and then merge?

lazyone233 commented 1 month ago

The hash value for this test section is still incorrect. jldir.zip

1: FAIL!  : TestJlCompress::compressDirOptions(simple) Compared values are not the same
1:    Actual   (hash)          : "c7f73bafecd4a03e3015e72bc232896288943c73d62cd98adf3bc1bb15f4a381"
1:    Expected (sha256sum_unix): "ed0d5921b2fc11b6b4cb214b3e43ea3ea28987d6ff8080faab54c4756de30af6"
1:    Loc: [/home/lazy/Desktop/quazip-feature-strategyOptions/qztest/testjlcompress.cpp(392)]
cen1 commented 1 month ago

I think it's because we also need to explicitely define subdirectory permissions to make it fully reproducible. Try to run unit tests locally on feature/strategyOptions and if that matches I think we're good.

If not, you probably have different default directory perms than me locally and/or CI runners.

In that case, try to fiddle with: https://github.com/stachenov/quazip/blob/feature/strategyOptions/qztest/qztest.cpp#L60 and https://github.com/stachenov/quazip/blob/feature/strategyOptions/qztest/qztest.cpp#L76 to get it to match.

lazyone233 commented 1 month ago

I've tested, and all tests have been successfully passed. It seems there are no issues at the moment.

cen1 commented 1 month ago

Closing in favor of #206