zopfli-rs / zopfli

A Rust implementation of the Zopfli compression algorithm.
Apache License 2.0
37 stars 4 forks source link

Stop early if randomize_stat_freqs won't do anything #22

Closed Pr0methean closed 1 year ago

Pr0methean commented 1 year ago

randomize_stat_freqs can't have any effect in the case where all the symbols in beststats have the same frequency, nor does lz77_optimal have any way to improve the compression if that's the case and the last iteration had no effect (and I doubt any improvement is possible since this means they follow a maximum-entropy distribution). So this PR ends the deflation of the block early if that happens.

As well, since the above check means it can no longer get stuck in an infinite loop, this PR changes randomize_stat_freqs to run again if it hasn't made an actual change, which eliminates redundant LZ77 runs. Dead-code elimination could not have done this, because calling a PRNG mutates its state.

This PR also postpones running add_weighed_stat_freqs until we know its result won't be overwritten from best_stats before it's used.

Pr0methean commented 1 year ago

Have you confirmed with Trace-level logging that the repeated LZ77 runs are making the difference, and not just the slightly different PRNG state from having been invoked fewer times? (AFAIK the latter is why compiler optimizations can't already provide early stopping.) Because I'm under the impression that LZ77 is idempotent.

AlexTMjugador commented 1 year ago

I didn't check whether the root cause for the regressions was the different PRNG state or the fact that less LZ77 runs are done, but our results show that even if the cause is a different PRNG state, on average it's slightly worse, and thus the behavior change is a good candidate to be toggleable.

Pr0methean commented 1 year ago

It seems like the optimality is very sensitive to the PRNG state. (Compare the last two commits on eeyore.png!) I think instead of trying to micro-optimize Zopfli, I'll fork it and experiment with other methods of optimizing stats such as genetic algorithms.