Not clear how to trade most resources for most compression

ulikunitz / xz

Pure golang package for reading and writing xz-compressed files

Not clear how to trade most resources for most compression #26

Open ribasushi opened 4 years ago

ribasushi commented 4 years ago

The documentation of WriterConfig is somewhat sparse. I would like to emulate (in spirit; I understand the algorithm is not identical) the result of xz --lzma2=preset=9,dict=128MiB

Could you please point me to a "starting point"?

Thanks!

ulikunitz commented 4 years ago

The DictCap field defines the dictionary size, so DictCap = 128 * 1024 * 1024 would define a larger dictionary. Note that the actual memory consumption is a multiple of the dictionary size, because the dictionary needs to be hashed. The default size is already 8 MByte, so a larger dictionary only has an effect on files bigger than that.

The LZMA properties (lc, lp, pb) are the same as described in the xz manual.

A larger buffer size might increase compression speed a little bit. It has almost no effect on the compression ratio.

mfischr commented 4 years ago

Surprisingly, with xz, the values are always lc=3,lp=0,pb=2 no matter what preset you choose. According to the manual, the preset affects other settings like dictionary size, match finder, 'nice', and 'depth'.

Btw, the values of LC, LP, PB are stored in a single byte encoded according to this formula, and in one .xz file I tried, that byte appeared at position 0x1d.
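
The link behind "this formula" is not preserved here. Assuming it refers to the standard LZMA properties byte, the encoding is (pb * 5 + lp) * 9 + lc. The sketch below, with the hypothetical helpers encodeProps and decodeProps, shows the round trip for the xz defaults:

```go
package main

import (
	"errors"
	"fmt"
)

// encodeProps packs lc, lp and pb into a single LZMA properties byte
// using (pb*5 + lp)*9 + lc.
func encodeProps(lc, lp, pb int) (byte, error) {
	if lc < 0 || lc > 8 || lp < 0 || lp > 4 || pb < 0 || pb > 4 {
		return 0, errors.New("lc must be in 0..8, lp and pb in 0..4")
	}
	return byte((pb*5+lp)*9 + lc), nil
}

// decodeProps reverses encodeProps.
func decodeProps(b byte) (lc, lp, pb int, err error) {
	if int(b) >= 9*5*5 {
		return 0, 0, 0, errors.New("properties byte out of range")
	}
	lc = int(b) % 9
	rest := int(b) / 9
	lp = rest % 5
	pb = rest / 5
	return lc, lp, pb, nil
}

func main() {
	b, err := encodeProps(3, 0, 2)
	if err != nil {
		panic(err)
	}
	fmt.Printf("props byte: 0x%02x\n", b) // prints "props byte: 0x5d"

	lc, lp, pb, err := decodeProps(b)
	if err != nil {
		panic(err)
	}
	fmt.Printf("lc=%d lp=%d pb=%d\n", lc, lp, pb) // prints "lc=3 lp=0 pb=2"
}
```

So for lc=3,lp=0,pb=2 the byte to look for in the file is 0x5d.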

That said, I can't get this package to replicate the results I'm getting from xz with preset=2 (the output of this package is about 20% larger).

ulikunitz commented 4 years ago

The package doesn't implement the same algorithm as xz, so the results and the compression ratio will be different.