phoboslab / qoi

The “Quite OK Image Format” for fast, lossless image compression
MIT License
6.85k stars 327 forks

Ambiguous specifications for encoding #230

Closed vFetet closed 1 year ago

vFetet commented 2 years ago

I have been tinkering with writing my own QOI encoder/decoder and I found the specifications to be ambiguous on some points when writing my encoder. For image decoding, I feel like the specs are fine as they are.

The output of image encoding is not unique: as long as it can be decoded by a conforming QOI decoder, it is valid output. However, the spec should offer some guidance on what priority should be given to the different OPs when encoding data. This would allow encoders to be tested unambiguously by comparing their output to the reference images directly.

As it stands, an encoder could encode images using only RGBA OPs and be fully conformant, yet its output binaries can't be compared to the reference images. Worse, the reference encoder is sub-optimal for some pixel sequences, making it harder than it should be to test an optimized encoder.

I think the following priority order works best:

  1. QOI_OP_RUN
  2. QOI_OP_INDEX
  3. QOI_OP_DIFF
  4. QOI_OP_LUMA
  5. QOI_OP_RGB
  6. QOI_OP_RGBA

The reasoning is to prioritize small file size, and in case of ties (e.g. a run of one pixel can be encoded as a 1-long QOI_OP_RUN, a QOI_OP_INDEX lookup of the previous pixel, or a QOI_OP_DIFF with dr = dg = db = 0, each in a single byte) to favor the computationally cheaper option.
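To make the one-byte tie concrete, here is my own sketch (not code from the repo) of the three chunk layouts involved, following the tag bits and biases from the QOI specification:

```c
#include <assert.h>
#include <stdint.h>

/* QOI_OP_RUN: tag 0b11 in the top two bits, run length stored
   with a bias of -1, so a run of 1 encodes as 0xc0. */
static uint8_t op_run(int run_length) {
    return (uint8_t)(0xc0 | (run_length - 1));
}

/* QOI_OP_INDEX: tag 0b00, 6-bit index from the spec's hash
   (r*3 + g*5 + b*7 + a*11) % 64, so the whole byte is the index. */
static uint8_t op_index(uint8_t r, uint8_t g, uint8_t b, uint8_t a) {
    return (uint8_t)((r * 3 + g * 5 + b * 7 + a * 11) % 64);
}

/* QOI_OP_DIFF: tag 0b01, each channel diff in -2..1 stored with a
   bias of 2; dr = dg = db = 0 encodes as 0x6a. */
static uint8_t op_diff(int dr, int dg, int db) {
    return (uint8_t)(0x40 | ((dr + 2) << 4) | ((dg + 2) << 2) | (db + 2));
}
```

All three produce a single byte for a repeated pixel, which is why a tie-breaking rule based on computational cost is needed at all.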

However, the current reference implementation would not follow this OP priority in some cases, due to the way the checks for QOI_OP_DIFF and QOI_OP_LUMA are done. Consider the following pixel sequence: #ffffff, #000000. The second pixel is encoded by the reference encoder as a QOI_OP_RGB chunk (4 bytes: 0xfe 0x00 0x00 0x00), although the specification allows it to be encoded as a QOI_OP_DIFF chunk with wraparound (1 byte: 0x7f), assuming the pixel isn't in the previously-seen pixel array.

So enforcing or encouraging this priority of operations would require changing the reference encoder's encoding loop to account for wraparound in QOI_OP_DIFF and QOI_OP_LUMA.
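A wraparound-aware QOI_OP_DIFF check can be sketched as follows (my own code, not the reference encoder): doing the channel subtraction on uint8_t and reinterpreting the result as a signed byte makes the difference wrap modulo 256, so 0xff to 0x00 becomes +1:

```c
#include <assert.h>
#include <stdint.h>

/* Channel difference taken modulo 256: unsigned subtraction wraps,
   and the int8_t cast reinterprets the result as a signed value. */
static int wrap_diff(uint8_t cur, uint8_t prev) {
    return (int8_t)(uint8_t)(cur - prev);
}

/* Returns the QOI_OP_DIFF byte if all three channel diffs fit in
   the spec's -2..1 range (bias 2), or -1 if the pixel doesn't qualify. */
static int try_op_diff(uint8_t pr, uint8_t pg, uint8_t pb,
                       uint8_t r, uint8_t g, uint8_t b) {
    int dr = wrap_diff(r, pr), dg = wrap_diff(g, pg), db = wrap_diff(b, pb);
    if (dr < -2 || dr > 1 || dg < -2 || dg > 1 || db < -2 || db > 1)
        return -1;
    return 0x40 | ((dr + 2) << 4) | ((dg + 2) << 2) | (db + 2);
}
```

With this check, the #ffffff to #000000 transition above yields dr = dg = db = +1 and the single byte 0x7f, matching the wraparound encoding the spec permits.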

phoboslab commented 2 years ago

an encoder could encode images using only RGBA OPs and be fully conformant

And the problem with that is? :)

Imagine you need QOI conformance but only care about raw throughput - using QOI_OP_RGBA then would be a sensible choice. Why should we forbid it?
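Such a throughput-oriented encoder is trivial; a minimal sketch (illustrative only, not code from the repo) emits one 5-byte QOI_OP_RGBA chunk per pixel and needs no index, run, or diff state at all:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Degenerate but conformant QOI encoding loop: every input pixel
   (4 bytes RGBA) becomes a QOI_OP_RGBA chunk (tag 0xff + 4 channel
   bytes). The caller provides an output buffer of at least 5 * npx. */
static size_t encode_rgba_only(const uint8_t *px, size_t npx, uint8_t *out) {
    size_t o = 0;
    for (size_t i = 0; i < npx; i++) {
        out[o++] = 0xff;           /* QOI_OP_RGBA tag */
        out[o++] = px[4 * i + 0];  /* r */
        out[o++] = px[4 * i + 1];  /* g */
        out[o++] = px[4 * i + 2];  /* b */
        out[o++] = px[4 * i + 3];  /* a */
    }
    return o;  /* 5 bytes per pixel: branch-free, but 25% larger than raw */
}
```

The trade-off is exactly the one described: maximum encode speed at the cost of output that is larger than the raw pixel data.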

the reference encoder is sub-optimal for some pixel sequences

And that's another reason the spec doesn't dictate any of this.

Imagine if Fraunhofer had specified the MP3 encoder instead of the bitstream format; we would be stuck with sub-par MP3 encoders to this day. That's why no file format (to my knowledge) specifies how the encoder should work, but rather specifies the data format or the decoder.

vFetet commented 2 years ago

I get your point: what constitutes an optimally-encoded image is up to interpretation, and is bound to vary with use cases and time. I don't see an issue with an image encoded solely as RGBA chunks being conformant. I also understand that the reference implementation is not the standard.

What I see as an issue is that you provide test images that developers will use to test their implementations of the QOI format, without documenting the logic you applied to produce them; and that logic differs from my own reading of the standard. While both are valid, this makes results harder to reproduce and implementations harder to test.

The various standards I have encountered usually include both what constitutes a conformant implementation and guidance towards what is considered best practice, using keywords such as "must", "must not", "should", "should not", "required", "recommended", "optional", etc. In fact, the use of these words is standardized in RFC 2119.

The suggestion I'm making here would fall into the "recommended" or perhaps "optional" category, clearing up some ambiguity in the design philosophy while still requiring interoperability with implementations that do not follow these principles. Encoding an image solely as RGBA chunks is valid and decoders must be able to handle this case, but since it bloats file size by 20 to 40% while being less efficient than keeping the data as raw RGB[A], it seems reasonable to mention that you should not do that (unless there "exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful", as per RFC 2119).