ulikunitz / xz

Pure golang package for reading and writing xz-compressed files

Equivalent of FastBytes? #53

Closed zfLQ2qx2 closed 1 year ago

zfLQ2qx2 commented 1 year ago

The 7zip SDK has something called "FastBytes", which seems to be a mechanism to limit how much time is spent looking for the best sequence to add to the dictionary. I don't see an equivalent here. How do you get around that?

ulikunitz commented 1 year ago

In the current release I have a constant, maxMatches=16, that limits the number of positions searched. (I also add a number of short distances, which my recent experience suggests was misguided.) To investigate the implementation, look at the lzma/hashtable.go file. Note that I wrote the code without much experience in the field.
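To illustrate the idea of a fixed cap on the number of positions searched, here is a minimal sketch of a hash-chain match finder. It is not the code in lzma/hashtable.go: only the name maxMatches and its value 16 come from the reply above, while finder, hash4, matchLen, minMatchLen and hashBits are hypothetical names and values chosen for this example.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	minMatchLen = 4  // bytes hashed per position (assumption for this sketch)
	maxMatches  = 16 // cap on candidate positions examined per search
	hashBits    = 12 // table size exponent (assumption for this sketch)
)

// hash4 maps the first 4 bytes of p to a table slot.
func hash4(p []byte) uint32 {
	v := uint32(p[0]) | uint32(p[1])<<8 | uint32(p[2])<<16 | uint32(p[3])<<24
	return (v * 2654435761) >> (32 - hashBits)
}

// matchLen returns the length of the common prefix of data[a:] and data[b:], with a < b.
func matchLen(data []byte, a, b int) int {
	n := 0
	for b+n < len(data) && data[a+n] == data[b+n] {
		n++
	}
	return n
}

// finder is a hypothetical hash table whose slots head linear chains of positions.
type finder struct {
	head [1 << hashBits]int // most recent position per slot, -1 if empty
	prev []int              // previous position in the same chain, -1 at the end
}

func newFinder(n int) *finder {
	f := &finder{prev: make([]int, n)}
	for i := range f.head {
		f.head[i] = -1
	}
	return f
}

// insert puts pos at the front of the chain for its hash slot.
func (f *finder) insert(data []byte, pos int) {
	if pos+minMatchLen > len(data) {
		return
	}
	h := hash4(data[pos:])
	f.prev[pos] = f.head[h]
	f.head[h] = pos
}

// longest walks the chain for pos, examining at most maxMatches candidates,
// and returns the best distance and length plus the number of candidates tried.
func (f *finder) longest(data []byte, pos int) (dist, length, tried int) {
	if pos+minMatchLen > len(data) {
		return 0, 0, 0
	}
	c := f.head[hash4(data[pos:])]
	for c >= 0 && tried < maxMatches {
		tried++
		if n := matchLen(data, c, pos); n >= minMatchLen && n > length {
			dist, length = pos-c, n
		}
		c = f.prev[c]
	}
	return dist, length, tried
}

// demo indexes the first 120 bytes of a highly repetitive input and searches at position 120.
func demo() (dist, length, tried int) {
	data := []byte(strings.Repeat("abcd", 40))
	f := newFinder(len(data))
	for pos := 0; pos < 120; pos++ {
		f.insert(data, pos)
	}
	return f.longest(data, 120)
}

func main() {
	dist, length, tried := demo()
	fmt.Println(dist, length, tried) // prints "4 40 16"
}
```

In the demo the chain for "abcd" holds 30 earlier positions, but the search stops after 16 candidates, so the cost per position stays bounded even on repetitive input.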

I have done a lot of experiments on this recently. What I found is that a hash pointing to a linear list, as in the current implementation, doesn't compress much better than two hashes with different input lengths, but it is much slower. Right now I have very fast implementations in which the whole search is done in a single loop without function calls, but this code can't reach the compression rates of the original xz. I'm currently working on a tree implementation that can compete with the bt4 match finder of the original xz implementation. I have also added parallel compression and decompression modes, but I want to achieve the compression rates of xz before the next release.
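As a rough sketch of what "two hashes with different input lengths" could look like, the example below keeps only the most recent position per key in two tables, one keyed on 3 bytes and one on 6 bytes. This is an assumption-laden illustration and not the author's experimental code: the input lengths 3 and 6, the names dualHash, findAndInsert and commonPrefix, and the use of Go maps as a stand-in for fixed-size hash tables are all choices made for this example.

```go
package main

import "fmt"

const (
	shortLen = 3 // input length of the first hash (assumption)
	longLen  = 6 // input length of the second hash (assumption)
)

// dualHash is a hypothetical match finder that remembers only the most
// recent position for each 3-byte and each 6-byte key.
type dualHash struct {
	short map[string]int
	long  map[string]int
}

func newDualHash() *dualHash {
	return &dualHash{short: map[string]int{}, long: map[string]int{}}
}

// commonPrefix returns the length of the common prefix of data[a:] and data[b:], with a < b.
func commonPrefix(data []byte, a, b int) int {
	n := 0
	for b+n < len(data) && data[a+n] == data[b+n] {
		n++
	}
	return n
}

// findAndInsert looks up at most two candidates for pos, preferring the
// 6-byte table, and then records pos in both tables. For simplicity,
// positions with fewer than longLen bytes remaining are skipped.
func (d *dualHash) findAndInsert(data []byte, pos int) (dist, length int) {
	if pos+longLen > len(data) {
		return 0, 0
	}
	ks := string(data[pos : pos+shortLen])
	kl := string(data[pos : pos+longLen])
	if c, ok := d.long[kl]; ok {
		dist, length = pos-c, commonPrefix(data, c, pos)
	} else if c, ok := d.short[ks]; ok {
		dist, length = pos-c, commonPrefix(data, c, pos)
	}
	d.short[ks], d.long[kl] = pos, pos
	return dist, length
}

// demo returns the (distance, length) pairs found at positions 8 and 16.
func demo() (at8, at16 [2]int) {
	data := []byte("abcdef--abcxyz--abcdefgh")
	d := newDualHash()
	for pos := 0; pos <= 16; pos++ {
		dist, length := d.findAndInsert(data, pos)
		switch pos {
		case 8:
			at8 = [2]int{dist, length}
		case 16:
			at16 = [2]int{dist, length}
		}
	}
	return at8, at16
}

func main() {
	at8, at16 := demo()
	fmt.Println(at8, at16) // prints "[8 3] [16 6]"
}
```

At position 16 the most recent "abc" is at position 8 and would give only a 3-byte match, while the 6-byte table still points at position 0 and yields the longer match. Each search inspects at most two candidates, which is why such a scheme can be much faster than walking a chain.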

ulikunitz commented 1 year ago

I assume that the answer has been comprehensive.