madler / zlib

A massively spiffy yet delicately unobtrusive compression library.
http://zlib.net/
Other
5.6k stars 2.43k forks source link

A trick to compress faster #360

Open photopea opened 6 years ago

photopea commented 6 years ago

Hi. There is a trick, which requires additional 64kB of RAM, but can make the compression (search for repeating strings in LZ77) much faster.

You hash 3-bytes into chains, which help you find previous occurences of these 3 bytes. But you hash 3 bytes into 8 bits, and many collisions occur (you process very long chains while looking for the longest string match). I suggest to use 16 bits for hashing. There are much less collisions, shorter chains, so you find occurences faster. I suggest a following hash (since you don't use full Rabin-Karp, I guess hashes do not have to be "cyclic"):

hash = ((byte1 |  (byte2<<8)) + (byte3<<4)) & 0xffff;

I made my own implementation of Deflate (not in C) and 16 bit hashes work much better, than 8 bit hashes. I think it could help ZLIB, too, but I am not that comfortable with C to implement it myself :(

sunny-shu commented 4 years ago

Hi,If I change the hash function like you, what effect does it have on the compression ratio?

photopea commented 4 years ago

@361442342 This has absolutely no effect on the compression ratio. It only makes the compression much faster (which can be useful, when compressing huge files with Level 9).

minchai23 commented 3 years ago

The default hash bits of zlib is 15 , but can be increased to 16 bits by setting memlevel to 9.

deflateInit(strm,level); // set memlevel = 8 

deflateInit2(strm, level, method, windowBits, 9, strategy); // set memlevel = 9