lichess-org / database

Public exports of all rated games, puzzles, and computer evaluations.
https://database.lichess.org
GNU Affero General Public License v3.0

switch to pbzip2 #19

Closed niklasf closed 4 years ago

niklasf commented 4 years ago

pbzip2 - parallel bzip2 file compressor, v1.1.6 https://linux.die.net/man/1/pbzip2

The goal is not so much to improve compression speed as to speed up decompression. From the man page:

Files that are compressed with pbzip2 are broken up into pieces and each individual piece is compressed. This is how pbzip2 runs faster on multiple CPUs since the pieces can be compressed simultaneously. The final .bz2 file may be slightly larger than if it was compressed with the regular bzip2 program due to this file splitting (usually less than 0.2% larger). Files that are compressed with pbzip2 will also gain considerable speedup when decompressed using pbzip2.

Files that were compressed using bzip2 will not see speedup since bzip2 packages the data into a single chunk that cannot be split between processors.
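To make the quoted description concrete, here is a minimal sketch of the piece-wise idea using only Python's standard `bz2` module. It is not pbzip2's actual implementation: the names `compress_in_pieces` and `decompress_pieces` and the `PIECE_SIZE` value are hypothetical, and the sketch assumes each piece becomes its own complete bzip2 stream.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

# Hypothetical piece size chosen for illustration; pbzip2's real default may differ.
PIECE_SIZE = 900_000


def compress_in_pieces(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """Split data into pieces and compress each one as an independent bzip2 stream."""
    pieces = [data[i:i + piece_size] for i in range(0, len(data), piece_size)] or [b""]
    # The pieces do not depend on each other, so they can be compressed concurrently.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(bz2.compress, pieces))


def decompress_pieces(streams: list[bytes]) -> bytes:
    """Decompress independent streams concurrently and join them in order."""
    with ThreadPoolExecutor() as pool:
        return b"".join(pool.map(bz2.decompress, streams))


if __name__ == "__main__":
    sample = b"1. e4 e5 2. Nf3 Nc6 3. Bb5 a6\n" * 200_000
    streams = compress_in_pieces(sample)
    print(len(streams), "independent streams")
    print("round trip ok:", decompress_pieces(streams) == sample)
```

A file produced by plain bzip2 is a single stream, so a decompressor has no independent pieces to hand out to several cores, which is the limitation the quote describes.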

lakinwecker commented 4 years ago

Also important:

The output of this version is fully compatible with bzip2 v1.0.2 or newer (ie: anything compressed with pbzip2 can be decompressed with bzip2).

This means it should be safe to use, at a small cost in file size.
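A rough way to see both points (compatibility and the size cost) is the sketch below. It assumes the split output is simply a concatenation of ordinary bzip2 streams, which Python's `bz2.decompress` accepts. The helpers `multi_stream` and `size_overhead` are hypothetical names, and the measured overhead depends entirely on the data and the piece size, so it says nothing about the real database files.

```python
import bz2


def multi_stream(data: bytes, piece_size: int) -> bytes:
    """Compress each piece separately and concatenate the resulting bzip2 streams."""
    return b"".join(
        bz2.compress(data[i:i + piece_size])
        for i in range(0, len(data), piece_size)
    )


def size_overhead(data: bytes, piece_size: int) -> float:
    """Return the relative size increase of multi-stream over single-stream output."""
    single = bz2.compress(data)
    multi = multi_stream(data, piece_size)
    # A standard, non-parallel decoder reads the concatenated streams unchanged.
    if bz2.decompress(multi) != data:
        raise ValueError("multi-stream output did not round-trip")
    return len(multi) / len(single) - 1.0


if __name__ == "__main__":
    sample = b"1. e4 e5 2. Nf3 Nc6 3. Bb5 a6\n" * 200_000
    print(f"relative overhead: {size_overhead(sample, 900_000):.4%}")
```

Smaller pieces mean more per-stream headers and less context per piece, so the overhead grows as the piece size shrinks.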

lakinwecker commented 4 years ago

Comparison of the speedups: https://gist.github.com/lakinwecker/3066214a00eb8136f8d1e56fb839c4ba

niklasf commented 4 years ago

Recompression completed.