silversquirl closed this issue 2 years ago.
Hi! Could I ask what block sizes you selected? BZip3 offers good performance on reasonably big blocks (16M, 32M), while the maximum BZip2 block size you can select via the CLI is just 900K. For a fair comparison, one should tweak BZip2 to use a bigger block size. Speaking of benchmarks:
https://github.com/kspalaiologos/bzip3/blob/master/etc/BENCHMARKS.md
BZip3 is usually only 14-15s slower than BZip2 on big files (roughly > 1.2 GiB).
Finally, to achieve ratios comparable to BZip3, the reference BZip2 implementation would have to slow down a lot, hence I claim that BZip3 is faster. BZip3 can sometimes compress to half the size that the competing BZip2 produces.
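To illustrate the block-size cap being discussed: Python's standard `bz2` module wraps the reference libbz2, and its `compresslevel` parameter (1-9) maps directly to bzip2's 100 kB-900 kB block sizes, so even "level 9" operates on sub-megabyte blocks. A minimal sketch (sample data is made up for illustration):

```python
import bz2

# Highly compressible sample data, several blocks' worth.
data = b"the quick brown fox jumps over the lazy dog\n" * 100_000

# compresslevel=N selects an N*100 kB block size; 9 (900 kB) is the
# maximum the bzip2 format allows, via CLI and library alike.
small_blocks = bz2.compress(data, compresslevel=1)  # 100 kB blocks
large_blocks = bz2.compress(data, compresslevel=9)  # 900 kB blocks

# Bigger blocks give the BWT more context, so level 9 compresses
# this repetitive input into fewer, denser blocks.
assert bz2.decompress(large_blocks) == data  # round-trips losslessly
```

This is why comparing bzip3 at 16M blocks against bzip2 at its 900K maximum measures two quite different working-set sizes.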
I had a go with a variety of block sizes but can't seem to get it to run faster than bzip2 on the Calgary corpus, though it definitely does produce a better compression ratio. I'm not sure how one would make bzip2 achieve a similar ratio; afaik it's not possible to push it beyond -9? Perhaps I'm wrong.
You have to use the C API, not the CLI.
By the way, BZip3 supports parallel compression, while BZip2 doesn't. This could also be an argument for better (though not single-threaded) performance.
Parallel compression definitely sounds like a benefit! Is that implemented in the CLI tool or library in this repo, or is it just a theoretical thing at the moment?
It's implemented in the library, but not yet in the CLI.
./bzip3 -e -b 16 -j 4 corpus/linux.tar corpus/linux.bz3
./bzip3 -d -j 4 corpus/linux.bz3 corpus/linux2.tar
First command takes 29s of wall clock time, the second command takes 20s of wall clock time.
> By the way, BZip3 supports parallel compression, while BZip2 doesn't
pbzip2 (http://compression.ca/pbzip2/) and lbzip2 (https://lbzip2.org/) speak an unchanged bzip2 file format but can compress and decompress in parallel.
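The property pbzip2 and lbzip2 exploit is that a concatenation of independent bzip2 streams is itself a valid bzip2 file. A hypothetical sketch of that idea using Python's standard `bz2` and `concurrent.futures` (not pbzip2's actual implementation; the function name and chunk size are my own):

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

def parallel_bz2_compress(data: bytes, chunk_size: int = 900_000,
                          workers: int = 4) -> bytes:
    """Compress fixed-size chunks as independent bzip2 streams in
    parallel, then concatenate them.  Stock bzip2 (and Python's
    bz2.decompress) accept such multi-stream files -- the property
    pbzip2/lbzip2 rely on.  Threads suffice here because libbz2
    releases the GIL while compressing."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(bz2.compress, chunks))

payload = b"example payload for parallel bzip2 compression\n" * 50_000
packed = parallel_bz2_compress(payload)
assert bz2.decompress(packed) == payload  # multi-stream decompress round-trips
```

The cost of this scheme is a small ratio loss, since each chunk is compressed without context from its neighbours, which is the usual trade-off these parallel wrappers make.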
According to some quick tests of my own, as well as the image in the readme, bzip3 is actually noticeably slower than bzip2. If bzip3 is going to claim to be faster than bzip2, it'd be nice to have some benchmarks to back that up.