image-rs / deflate-rs

An implementation of a DEFLATE encoder in rust
Apache License 2.0
53 stars 14 forks source link

Performance improvements #4

Open oyvindln opened 7 years ago

oyvindln commented 7 years ago

The performance is still not great compared to miniz/zlib on files with long runs of the same byte.

EDIT: See next post.

Profiling reveals that lz77::longest_match and lz77::get_match_length is where most time is spent.

get_match_length is particularly problematic for data where there is a lot of repetitions of one literal that causes a lot of calls to this function. (As there will be a large amount of entries in the hash chain for the 3-byte sequences of this byte.) Currently it uses two zipped iterators to compare the matches, which may not be ideal performance wise. C implementations of deflate seem to be checking multiple bytes at once by casting the bytes to larger data types. I've tested this, but it didn't seem to make a difference.

In the longest_match function, array lookups seems to be the main cause of the slowdown (maybe because further instructions depend on the array value?). If we can find a way to reduce the number of lookups, or length of the hash chains without impacting compression ratio, that would be helpful to improve performance.

For lower compression levels, other compressors simply hard-limit the length hash chains, and further adaptively reduces the hash chain length when there is a decent match.

oyvindln commented 7 years ago

So the main point where the current implementation (outside of RLE mode) spends much time (according to operf) is the comparison after loading the next value from the hash chain in the matching function: https://github.com/oyvindln/deflate-rs/blob/dd923fb9950b6fce37cec9ed6ab617a036c559b8/src/matching.rs#L103

Maybe this is the cpu stalling waiting for data or something.

Current plans for improvements that may or may not help: