rflauer opened this issue 8 years ago
feel free to send a PR
I did some preliminary investigation; this is a bit trickier for the general case than for the GIF case (because a code may refer to a position that is no longer in the current buffer whereas for a GIF image you can keep the whole image in memory) but I think it can be done.
I've investigated this question. I can't trace this idea in ncompress itself (the code is not readable), but LZW AB does it like this:
```
prev_code = prefixes[code]
last_symbol = terminators[code]
```
Given the current code, we can find both the previous code and the last symbol. This solution targets the smallest possible amount of memory; it could be called a "reversed trie on a unidirectional linked list". And of course it requires you to write the data out in the opposite direction.
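Just to illustrate, here is a minimal sketch of that decode step in C. The table names follow the snippet above; the `257` first-free-code constant, the fixed 16-bit sizing, and the back-to-front output buffer are my assumptions, not taken from LZW AB itself:

```c
#include <stddef.h>
#include <stdint.h>

#define FIRST_FREE 257            /* 0-255 literals + reserved code(s); assumption */
#define MAX_CODES  (1 << 16)

typedef uint16_t code_t;

static code_t  prefixes[MAX_CODES - FIRST_FREE];     /* prev_code of each created code */
static uint8_t terminators[MAX_CODES - FIRST_FREE];  /* last_symbol of each created code */

/* Expand `code` into the tail of `buf` (capacity `cap`), writing backwards,
 * because walking the prefix chain yields the symbols in reverse order.
 * Returns the string length; the bytes end up in buf[cap - len .. cap - 1]. */
static size_t expand_code(code_t code, uint8_t *buf, size_t cap)
{
    size_t len = 0;

    while (code >= FIRST_FREE && len < cap) {
        buf[cap - 1 - len] = terminators[code - FIRST_FREE];
        code = prefixes[code - FIRST_FREE];
        len++;
    }
    if (len < cap)
        buf[cap - 1 - len++] = (uint8_t)code;   /* chain ends at a literal byte */

    return len;
}
```

This is exactly why the data has to be written in the opposite direction (or staged and reversed): the chain naturally yields the last symbol first.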
It requires only `((2 ** n_bits) - 257) * (sizeof(code_t) + 2)` bytes, about 261 KB for 16-bit codes.
If we want to write the output in the same direction, we have to implement a "trie on a bidirectional linked list", a "trie on a bidirectional sparse array", or a "bidirectional hash map". It is possible, but I don't think it will speed up decompression.
I think the penalty of maintaining any bidirectional structure will be much greater than some buffer remapping.
I don't think it's doable because the whole decoded buffer isn't kept in memory.
https://github.com/vapier/ncompress/issues/27 has a decoder that basically stores each string in a flat array in its full form. This is doable, but my problem with it is that the worst-case memory usage is unknown. It has to be bounded, since codes are limited to 16 bits, but I'm not sure what that bound is, and it may turn out to be quite high.
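For context, a flat full-form dictionary along those lines could look something like this. It is my own illustration, not the code from issue #27; `flat_dict_t`, the growable pool, and the doubling strategy are all assumptions:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One entry: where this code's fully expanded string lives in the pool. */
typedef struct { size_t off, len; } entry_t;

typedef struct {
    uint8_t *pool;                /* every string stored in full, back to back */
    size_t   pool_len, pool_cap;
    entry_t  entries[1 << 16];
    unsigned n_entries;
} flat_dict_t;

/* New code = string of `prefix` plus one byte; the whole thing is copied
 * into the pool.  Lookup later is a single memcpy, but the pool only ever
 * grows, which is exactly why the worst-case memory bound matters. */
static int flat_dict_add(flat_dict_t *d, unsigned prefix, uint8_t last_byte)
{
    size_t need = d->entries[prefix].len + 1;

    if (d->pool_len + need > d->pool_cap) {
        size_t cap = d->pool_cap ? d->pool_cap : 4096;
        while (cap < d->pool_len + need)
            cap *= 2;
        uint8_t *p = realloc(d->pool, cap);
        if (p == NULL)
            return -1;
        d->pool = p;
        d->pool_cap = cap;
    }

    memcpy(d->pool + d->pool_len,
           d->pool + d->entries[prefix].off, d->entries[prefix].len);
    d->pool[d->pool_len + d->entries[prefix].len] = last_byte;

    d->entries[d->n_entries].off = d->pool_len;
    d->entries[d->n_entries].len = need;
    d->n_entries++;
    d->pool_len += need;
    return 0;
}
```

(Literal codes 0-255 would be pre-filled as one-byte strings; error handling and code-width resets are omitted.)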
Here's a mixed approach that uses bounded memory but is faster than the current dict, where strings are represented as a `<prefix_index, last_char>` pair: use `<prefix_index, last_chars[N]>` instead. Storing multiple chars per "node" in the linked list results in fewer pointer hops and a larger "payload" per hop.
With `N` set to 32, the decompression dict memory usage will be a bit over 2 MiB while being about 10x faster on long chains. Making this change in ncompress is a bit difficult, though; the code is quite... ahem, let's say "ancient" and difficult to modify.
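I don't know exactly how this would be laid out in practice, but a chunked node along those lines might look roughly like the sketch below; `CHUNK`, the field names, and the chunk-sharing rule are my assumptions:

```c
#include <stdint.h>

#define CHUNK 32    /* the "N" above */

/* Each node carries up to CHUNK trailing chars instead of just one, so
 * reconstructing a long string needs ~N times fewer pointer hops and
 * moves a larger payload per hop. */
typedef struct {
    uint16_t prefix_index;        /* node (or literal code) holding the earlier chars */
    uint8_t  used;                /* how many of last_chars[] are valid */
    uint8_t  last_chars[CHUNK];   /* the last `used` chars of this code's string */
} node_t;

/* New code = string of `prefix_code` plus `last_byte`.  If the prefix
 * node's chunk still has room, copy it and append in place; otherwise
 * start a new chunk that points back at the prefix. */
static void dict_add(node_t *dict, uint16_t new_code,
                     uint16_t prefix_code, uint8_t last_byte)
{
    node_t *n = &dict[new_code];

    if (prefix_code >= 256 && dict[prefix_code].used < CHUNK) {
        *n = dict[prefix_code];                 /* shares prefix_index and chars */
        n->last_chars[n->used++] = last_byte;
    } else {
        n->prefix_index = prefix_code;
        n->used = 1;
        n->last_chars[0] = last_byte;
    }
}
```

At roughly 36 bytes per node that is a bit over 2 MiB for 2^16 codes, which is consistent with the figure quoted above.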
> But my problem with this is that the worst-case memory usage is unknown.
I did a bit of math-ing today, and the TL;DR is that the worst case is roughly 2 gigs.

Longer explanation: the worst case keeps each new code's string one byte longer than the previous one, and the only way to do that is to use the previous code as the prefix plus one new byte. In other words, a long run of the same byte will do the trick.
So what we have here is basically an arithmetic progression where each term increases by 1. Using the sum formula we get `(2^16 * (1 + 2^16)) / 2` bytes, i.e. roughly 2 GiB. (Note that this is slightly larger than the actual worst case because it doesn't account for the first 255 pre-filled codes.)
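For anyone who wants to double-check the arithmetic, here is a quick sanity check of my own (not from the thread):

```c
#include <stdio.h>

int main(void)
{
    /* String lengths 1 + 2 + ... + 2^16: every new code's string is one
     * byte longer than the previous one (a run of identical bytes). */
    unsigned long long n = 1ULL << 16;
    unsigned long long bytes = n * (n + 1) / 2;

    printf("%llu bytes = %.3f GiB\n", bytes,
           (double)bytes / (1024.0 * 1024.0 * 1024.0));
    return 0;
}
```

This prints 2,147,516,416 bytes, just over 2 GiB.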
I also tested out the theory and the result came out as expected, just short of 2GiB:
```
$ </dev/zero head -c 4000000000 | ./compress -C | ./compress -d >/dev/null
Dict str usage: 1.984 GiB
```
So yeah, the strategy presented in https://github.com/vapier/ncompress/issues/27 is not really usable unless you're willing to let the worst-case decompression memory usage go up to 2 GiB.
Hi,
LZW decompression could be optimized to be somewhat faster by not using the classic build-a-stack-then-reverse-it technique but rather using references in the read buffer.
I implemented this in a GIF reader a while ago. If that's of interest, I could try working on a patch.
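To make the idea a bit more concrete, here is a rough sketch of what "using references in the read buffer" could look like. It assumes the entire decoded output stays in memory (as it does for a GIF frame), and the names and layout are mine, not taken from that GIF reader:

```c
#include <stddef.h>
#include <stdint.h>

/* Per-code record of where this code's string already exists in the
 * decoded output, so it can be re-emitted with a plain forward copy
 * instead of walking a prefix chain and reversing a stack. */
typedef struct {
    uint32_t pos;   /* offset of the string in the output buffer */
    uint32_t len;   /* its length */
} ref_t;

static ref_t refs[1 << 16];

/* Emit `code` by copying its bytes from where they were first produced.
 * The copy is byte-by-byte and forward: in the "KwKwK" case the source
 * range overlaps the bytes being written (like an LZ77 overlap copy),
 * so memcpy/memmove must not be used here. */
static void emit_code(uint8_t *out, size_t *out_len, uint16_t code)
{
    const size_t src = refs[code].pos;
    const size_t len = refs[code].len;

    for (size_t i = 0; i < len; i++)
        out[*out_len + i] = out[src + i];
    *out_len += len;
}

/* A new code's string is the previous code's string plus the one byte
 * that immediately follows it in the output, so only a position and a
 * length need to be recorded.  `prev_pos` is where the previous code's
 * string was just emitted. */
static void record_code(uint16_t new_code, uint16_t prev_code, size_t prev_pos)
{
    refs[new_code].pos = (uint32_t)prev_pos;
    refs[new_code].len = refs[prev_code].len + 1;
}
```

(Literal codes would get their position and length recorded when they are emitted; the surrounding decode loop, code-width handling, and dictionary resets are omitted.)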
Cheers,
Robert.