image-rs / jpeg-decoder

JPEG decoder written in Rust
Apache License 2.0

FF 00 found where marker was expected #228

Open SludgePhD opened 2 years ago

SludgePhD commented 2 years ago

This image fails to decode with "FF 00 found where marker was expected", but succeeds in libjpeg-turbo (tested via sxiv), Dolphin, Firefox and VS Code (not sure what library those use):

[image: broken]

quilan1 commented 2 years ago

This is a malformed JPEG file -- some analysis follows. In the file, starting at hex offset 0x390 (decimal 912), you'll find the following bytes, near the start of the entropy-coded stream:

[23 3A FF FF 00 22 3F ...]

Those FF bytes there are the problem. In the spec:

B.1.1.5, Entropy-coded data segments [...] In order to ensure that a marker does not occur within an entropy-coded segment, any X’FF’ byte generated by either a Huffman or arithmetic encoder, or an X’FF’ byte that was generated by the padding of 1-bits described in NOTE 1 above, is followed by a “stuffed” zero byte (see D.1.6 and F.1.2.3).
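
To make the rule concrete, here is a minimal sketch of encoder-side byte stuffing in Rust. The function names `stuff` and `has_ff_pair` are made up for illustration and are not part of jpeg-decoder.

```rust
/// Encoder-side byte stuffing (hypothetical helper): every 0xFF data byte
/// is followed by a stuffed 0x00 so it cannot be mistaken for a marker.
fn stuff(raw: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(raw.len());
    for &byte in raw {
        out.push(byte);
        if byte == 0xFF {
            out.push(0x00);
        }
    }
    out
}

/// Returns true if `stream` contains two adjacent 0xFF bytes.
fn has_ff_pair(stream: &[u8]) -> bool {
    stream.windows(2).any(|w| w[0] == 0xFF && w[1] == 0xFF)
}
```

Since every FF that comes out of `stuff` is immediately followed by 00, a correctly stuffed stream never contains two adjacent FF bytes, whereas the bytes quoted from the file do.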

Plainly stated, the FF FF 00 in this stream is invalid: a data FF must be followed by a stuffed 00, and FF 00 is not a marker. So, the question becomes -- what should we do when we encounter this pattern? I imagine three choices:

  1. Throw an error, because the stream is corrupt (jpeg-decoder's chosen method)
  2. Ignore the second FF, and continue with the stream
  3. Realize that we're missing a 00 after the first FF, silently pretend like we saw one and continue
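
The three choices above can be sketched as policies on a small un-stuffing routine. This is only an illustration under simplifying assumptions: the `Policy` enum and `unstuff` function are hypothetical names, not jpeg-decoder's actual API or implementation. It also treats every FF FF pair as the problem case, whereas a real decoder has to allow FF fill bytes in front of a genuine marker (B.1.1.2).

```rust
/// Hypothetical policies for an FF FF pair inside entropy-coded data.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Policy {
    /// Choice 1: report the stream as corrupt.
    Error,
    /// Choice 2: ignore the extra FF and carry on.
    DropSecondFf,
    /// Choice 3: pretend the first FF was followed by a stuffed 00.
    AssumeStuffed,
}

/// Removes byte stuffing from `data` and returns the raw entropy-coded bytes.
/// Stops at the first FF that is followed by anything other than 00 or FF
/// (a marker), or at a trailing FF.
fn unstuff(data: &[u8], policy: Policy) -> Result<Vec<u8>, String> {
    let mut out = Vec::with_capacity(data.len());
    let mut i = 0;
    while i < data.len() {
        let byte = data[i];
        if byte != 0xFF {
            out.push(byte);
            i += 1;
            continue;
        }
        match data.get(i + 1).copied() {
            // Properly stuffed FF: emit one data byte, consume both bytes.
            Some(0x00) => {
                out.push(0xFF);
                i += 2;
            }
            // FF FF: the case under discussion.
            Some(0xFF) => match policy {
                Policy::Error => {
                    return Err(format!("FF FF found at offset {}", i));
                }
                // Skip one FF; the next iteration handles the remaining one.
                Policy::DropSecondFf => i += 1,
                // Emit this FF as data, as if a 00 had followed it.
                Policy::AssumeStuffed => {
                    out.push(0xFF);
                    i += 1;
                }
            },
            // Marker or truncated input: end of entropy-coded data.
            _ => break,
        }
    }
    Ok(out)
}
```

On the bytes quoted above, `Policy::Error` returns an error, `Policy::DropSecondFf` yields `23 3A FF 22 3F`, and `Policy::AssumeStuffed` yields `23 3A FF FF 22 3F`. The two lenient policies both succeed but hand different data to the Huffman decoder.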

I believe jpeg-decoder is doing the right thing here by returning an error, but for completeness, let's consider a more lenient approach. In this scenario, only choice 2 decodes the stream correctly -- this can be shown by removing the second FF byte and seeing that the file loads without error. The problem is that, from the library's standpoint, there is no way to tell whether deleting the second FF or virtually inserting an invisible 00 after the first one is the correct fix. Both make the file decode, but which one matches the encoder's intent is ambiguous -- in this case, choice 3 yields visual artifacts in the decoded image.
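
For anyone who wants to repeat the byte-removal experiment, here is a rough sketch of a patch helper. `drop_second_ff` is a hypothetical name, and the offset to pass in is an assumption you would have to confirm in a hex editor (if the listing above really starts at 0x390, the second FF would sit at 0x393).

```rust
/// Returns a copy of `data` with the byte at `offset` removed, but only if
/// that byte is the second FF of an FF FF 00 run. Returns None otherwise,
/// so a wrong offset cannot silently damage the file.
fn drop_second_ff(data: &[u8], offset: usize) -> Option<Vec<u8>> {
    const PATTERN: [u8; 3] = [0xFF, 0xFF, 0x00];
    if offset == 0 {
        return None;
    }
    // Bytes at offset - 1, offset and offset + 1; None if out of range.
    let window = data.get(offset - 1..offset + 2)?;
    if window != &PATTERN[..] {
        return None;
    }
    let mut patched = data.to_vec();
    patched.remove(offset);
    Some(patched)
}
```

The patched bytes can then be written back to disk and opened with any decoder to compare results.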