square / gifencoder

A pure Java library implementing the GIF89a specification. Suitable for use on Android.
Apache License 2.0
666 stars 75 forks source link

Pad the LZW code table if less than minimum size #14

Closed rendaw closed 5 years ago

rendaw commented 5 years ago

I think this fixes #11.

My understanding is that while the color table has a minumum size of 2, the LZW minimum code size of 2 indicates that the code table has 4 colors, so the code table needs to be padded with two empty colors before the cc and eoi codes. I believe decoders were failing because they expected cc and eoi codes to have different code indexes and were thus misinterpreting the whole stream (first code in the stream is always cc).

I think this is indicated on page 32 here https://www.w3.org/Graphics/GIF/spec-gif89a.txt although I'm bad with specs without lots of pictures and pretty diagrams... it says

A special Clear code is defined which resets all compression/decompression parameters and tables to a start-up state. The value of this code is 2**\<code size>.

so for a code size of 2, the cc index would be 4 (0-1 would be colors, 2-3 ??).

gifencoder currently has 2 as the current minimum for the LZW minimum code size value, and I believe this is correct (i.e. the issue isn't that the minimum code size is being set too high). For reference, the definition says a bit above that that for 2 color images the minimum code size is 2:

Because of some algorithmic constraints however, black & white images which have one color bit must be indicated as having a code size of 2. It's not indicated how "just black" or "just white" images are handled but I guess it's reasonable to assume that anything less than 5 colors has a code size of 2...

I couldn't find information on padding. My understanding is that since the code table is used by matching color table indexes, reusing color index 0 for the padding elements is fine since a lookup for color 0 will never get that far.

In any case I tried this and I could read the output file with a couple programs (not entirely sure they aren't all using the same library).

rendaw commented 5 years ago

Build failure unrelated?

dlubarov commented 5 years ago

Thank you for looking into this! I need to reacquaint myself with the spec and the code a bit, but will try to reply soon, maybe later today.

dlubarov commented 5 years ago

Your explanation makes perfect sense; I must have missed that part of the spec.

I'm not sure if this would matter in practice, but it seems like to be 100% conformant, we should pad up to 2**code_size (code_size being the "minimum" size, before we add 1 for the special codes), rather than stopping at 4?

rendaw commented 5 years ago

Right now numColors is the padded size from the color table, so it's guaranteed to be a power of two but its minimum value is 2 rather than 4.

The 4 is kind of a magical number, maybe adding a doc comment to the LzwEncoder constructor specifying that numColors must be a power of two? Or I could make it a constant with a comment on its derivation?

rendaw commented 5 years ago

I have an alternate fix in #15

dlubarov commented 5 years ago

Oh right -- forgot that numColors was already padded to a power of 2.

I think I prefer your other fix for now, and afterward I can refactor a bit to explicitly derive numColors from the code size (to make sure it's clear where the minimum comes from), and maybe name it something different as a hint that numColors isn't necessarily the same as the color table size.