inikep / lzbench

lzbench is an in-memory benchmark of open-source LZ77/LZSS/LZMA compressors

Adjust zstd parameters to better match zstd's cli levels and parameters #20

Closed chipturner closed 8 years ago

chipturner commented 8 years ago

This will effectively change the higher levels to match the command line tool which limits wlog to 23 (in zstd's fileio.c). It should make performance more similar to what the command line obtains at the highest compression levels.

Open to suggestions on how to do this; the previous zstd22/zstd24 options don't quite seem to be the same, and being closer to the command line tool seems ideal. I believe that zstd now matches the command line at all levels and zstd_ult matches the command line at levels 20-22 when --ultra is specified.

Also tweak some of the initialization functions to fit what (I believe) is the current API approach.

inikep commented 8 years ago

The modes with 22- and 24-bit windows (brotli22, brotli24, lzham22, lzham24, zstd22, zstd24) were created to compare the given compressors at the same window size. The "zstd" mode is equivalent to "zstd.exe --ultra", because that is the result people calling the zstd library will get. IMHO it's independent of the CLI.

chipturner commented 8 years ago

How would you feel about just adding a zstd_cli version where the levels map to the same windows used on the command line? As most people will use the command line rather than APIs, I'd like to present numbers similar to that in addition to what we currently have.

Cyan4973 commented 8 years ago

It's a valid point.

Currently, zstd enforces a maximum distance on the CLI side. So there is a discrepancy with the API, which doesn't enforce such a rule.

One reason is that there is no --ultra setting on the API side; it exists only in the CLI. The CLI uses the advanced interface to dynamically reduce the distance settings, so levels 20+ are no longer really levels 20+.

Maybe we need to think of a way to make both behaviors converge. A cheap possibility could be to change the meaning of --ultra: without it, the maximum allowed level would be 19 (which is the last level with a max distance of 8 MB), and --ultra would then "unlock" levels 20+. This way, levels would have the same meaning on both CLI and API.
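To make the proposal concrete, here is a minimal sketch of the level capping it describes. The function name and the two constants are hypothetical, chosen for illustration; they are not taken from zstd's sources, and the upper bound of 22 is only inferred from the levels mentioned in this thread.

```c
#include <stdbool.h>

/* Hypothetical constants for this sketch; not taken from zstd's headers. */
enum { LEVEL_CAP_WITHOUT_ULTRA = 19, LEVEL_CAP_WITH_ULTRA = 22 };

/* Returns the level that would actually be used under the proposal:
 * requests above 19 are honored only when ultra is set. */
int effective_level(int requested, bool ultra)
{
    int cap = ultra ? LEVEL_CAP_WITH_ULTRA : LEVEL_CAP_WITHOUT_ULTRA;
    return requested > cap ? cap : requested;
}
```

Under this scheme a given level number would select the same parameters whether it arrives through the CLI or the API; the flag only decides whether levels 20+ are reachable.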

chipturner commented 8 years ago

Another big difference is that the larger window significantly slows decompression, by almost 2x in some of my testing. I think there are valid use cases for level=22 without ultra (and maybe for ultra at level 20); it depends on how much memory the client can spare. I wonder if we should have a linear scale (1-22 being what it is now, and 23-25 being ultra plus 20-22)? Is there precedent for this in other compression algorithms?
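As an illustration of the linear scale floated above, the sketch below splits a single 1-25 value into a base level plus an ultra flag. The struct and function names are hypothetical, and clamping out-of-range input is an assumption of this sketch rather than anything decided in the thread.

```c
#include <stdbool.h>

/* Hypothetical pair: the base level and whether ultra is enabled. */
typedef struct {
    int level;
    bool ultra;
} level_setting;

/* Maps a linear 1..25 scale onto (level, ultra):
 * 1..22 stay as they are without ultra, 23..25 become ultra + 20..22. */
level_setting split_linear_level(int linear)
{
    level_setting s;
    if (linear < 1) linear = 1;    /* assumption: clamp instead of failing */
    if (linear > 25) linear = 25;
    if (linear >= 23) {
        s.level = linear - 3;      /* 23 -> 20, 24 -> 21, 25 -> 22 */
        s.ultra = true;
    } else {
        s.level = linear;
        s.ultra = false;
    }
    return s;
}
```

One consequence of this layout is that levels 20-22 without ultra remain selectable, which matches the use case of wanting level 22 with a bounded window.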

Regardless, it definitely is odd behavior in lzbench; it also is odd how we override chainLog. Is that still necessary? Can't we rely on ZSTD_adjustCParams to do that based on windowLog and compression level?

Cyan4973 commented 8 years ago

> it also is odd how we override chainLog. Is that still necessary? Can't we rely on ZSTD_adjustCParams to do that based on windowLog and compression level?

I haven't followed that part, but indeed, modifying chainLog should be unnecessary. Just change windowLog, and let ZSTD_adjustCParams() automatically adjust the rest.