M-Gonzalo opened this issue 5 years ago
This is similar to the old zlib behaviour (e.g. #21): recompression isn't identical. Solving this completely would need a more advanced bZip2 recompression algorithm (similar to what preflate does for zlib), which I guess won't happen soon.
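To make "recompression isn't identical" concrete, here is a minimal sketch of that kind of check. It uses Python's standard `bz2` module (which wraps libbzip2), not precomp's own C++ code, and the function names are hypothetical. It also assumes that "identical recompressed bytes" means the length of the common prefix between the original stream and its recompression, so a stream produced by a different bZip2 encoder can diverge early and yield only a partial match like the one reported below.

```python
import bz2
from typing import Tuple


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Return how many leading bytes a and b have in common."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def check_bzip2_recompression(stream: bytes) -> Tuple[int, int]:
    """Decompress a bZip2 stream, recompress it at every level (1-9) and
    return (best_level, identical_prefix_bytes).

    A prefix length equal to len(stream) means the stream can be rebuilt
    bit-for-bit; anything shorter is at best a partial match.
    """
    raw = bz2.decompress(stream)
    best_level, best_len = 0, -1
    for level in range(1, 10):
        recompressed = bz2.compress(raw, compresslevel=level)
        matched = common_prefix_len(stream, recompressed)
        if matched > best_len:
            best_level, best_len = level, matched
    return best_level, best_len
```

For a stream that libbzip2 itself wrote at level 9, this reports level 9 and a full-length match; streams from other encoders are where the prefix stops short.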
Anyway, there's another remaining issue I'd like to point out: the partial match found here is hurting the compression ratio. When using -v:
Compressed size: 1390169
Can be decompressed to 6285312 bytes
Identical recompressed bytes: 52 of 1390169
Identical decompressed bytes: 997888 of 6285312
Best match: 52 bytes, decompressed to 997888 bytes
Using -cl, this leads to 1,629,242 bytes (instead of 1,390,354 bytes using -t+), so it would be useful to apply the partial match mechanism introduced in https://github.com/schnaader/precomp-cpp/commit/cfa602c1ce2e1abb3eef6c5013defff103756ede to bZip2 streams, too.
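The figures reported above can be put side by side with a little arithmetic. This sketch only reuses numbers from this thread; the variable names are illustrative. The -t+ size equals the "New size" that precomp later reports once the partial match is discarded.

```python
# Figures copied from the -v output and the -cl / -t+ results above.
compressed_size = 1_390_169
identical_recompressed = 52
decompressed_size = 6_285_312
identical_decompressed = 997_888
size_with_partial_match = 1_629_242  # -cl
size_without_match = 1_390_354       # -t+

recompressed_fraction = identical_recompressed / compressed_size
decompressed_fraction = identical_decompressed / decompressed_size
penalty = size_with_partial_match - size_without_match

print(f"identical recompressed bytes: {recompressed_fraction:.4%}")
print(f"identical decompressed bytes: {decompressed_fraction:.1%}")
print(f"partial match costs {penalty} bytes ({penalty / size_without_match:.1%})")
```

Only about 0.0037% of the compressed stream is reproduced, even though roughly 15.9% of the decompressed data is covered, and using the match makes the output 238,888 bytes (about 17.2%) larger.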
Insufficient partial matches like the one described above are now discarded. New output of -v:
(0.00%) Possible bZip2-Stream found at position 0, compression level = 9
Compressed size: 1390169
Can be decompressed to 6285312 bytes
Identical recompressed bytes: 52 of 1390169
Identical decompressed bytes: 997888 of 6285312
Not enough identical recompressed bytes
No matches
New size: 1390354 instead of 1390169
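The discard rule can be pictured as a simple ratio check. This is a minimal sketch, not precomp's implementation: the `PartialMatch` type, the `is_sufficient` function and the 1% threshold are all hypothetical, because the thread does not state which criterion precomp uses to print "Not enough identical recompressed bytes".

```python
from dataclasses import dataclass

# Hypothetical cut-off; precomp's real threshold is not given in this thread.
MIN_IDENTICAL_FRACTION = 0.01


@dataclass
class PartialMatch:
    compressed_size: int         # size of the original bZip2 stream in bytes
    identical_recompressed: int  # leading bytes reproduced exactly


def is_sufficient(match: PartialMatch,
                  min_fraction: float = MIN_IDENTICAL_FRACTION) -> bool:
    """Return True if enough of the stream is reproduced to be worth keeping."""
    if match.compressed_size <= 0:
        return False
    return match.identical_recompressed / match.compressed_size >= min_fraction


# The stream from this issue: 52 of 1,390,169 bytes, so the match is discarded
# and the original bytes are stored unchanged.
print(is_sufficient(PartialMatch(compressed_size=1_390_169, identical_recompressed=52)))
```

A fixed ratio is the simplest choice; an alternative would be to estimate both output sizes and keep whichever is smaller, which directly targets the size regression measured above.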
The issue will stay open as a known issue; I changed the title to make it clearer what the issue is about.
The file at https://web.archive.org/web/20150319192112/http://freearc.org/download/testing/FreeArc-0.67-alpha-sources.tar.bz2 decompresses to 5855141 bytes, but precomp -cn yields a file of only 2388084 bytes.