pfalcon / uzlib

Radically unbloated DEFLATE/zlib/gzip compression/decompression library. Can decompress any gzip/zlib data, and offers simplified compressor which produces gzip-compatible output, while requiring much less resources (and providing less compression ratio of course).
Other
302 stars 82 forks source link

Dealing with illegal symbols in literal/length and distance huffman encodings #16

Open pfalcon opened 6 years ago

pfalcon commented 6 years ago

It turns out that to simplify/make possible dealing with Huffman trees, zlib and uzlib have 288 total Huffman codes for literal/length, and 32 for distance. In both cases, last 2 codes are "illegal" and should not appear in the valid DEFLATE stream.

This ticket is open to consider the best way to deal with them in uzlib, where best way is defined as "requiring the least code overhead, while still returning an error for stream containing, and then the earlier the error reported th better".

pfalcon commented 6 years ago

There're basically 2 strategies:

  1. Make sure these codes aren't present in the Huffman tree, and thus decoding them leads to an error. There's a problem then: a) ensure that Huffman decoding ends at all (with an error) when decoding such codes; b) bother to check result of Huffman decoding.

  2. Don't specially handle these codes in the Huffman tree, let them be decoded as normal, and then do bound checking on decoded symbol values. This assumes that otherwise Huffman decoding can't fail, and we can decode any code to symbol, so can trade check of decoding status for bounds check on decoded symbol. But it's unclear if this assumption is correct.

pfalcon commented 5 years ago

286 vs 288, 30 vs 32 dichotomy is also mentioned in my old StackOverflow question, https://stackoverflow.com/questions/25431160/what-is-the-maximum-size-of-encoded-dynamic-huffman-tree-as-used-by-deflate-zli/25462620, which has useful further references on the matter.

pfalcon commented 5 years ago

(This comment was originally posted on the wrong ticket, on 2018-07-09.)

In https://github.com/pfalcon/uzlib/pull/10#issuecomment-403219312, I decide to go for p.2 for now, simply because it's more obvious how to patch in boundary check, and that satisfies AFL crash corpus - than to figure out how to patch Huffman tree correctly (well, to prove that intuitive patchings are formally correct). But again, it's unclear if that's good/enough.