nigeltao / qoi2-bikeshed

"Quite OK Image" version 2 discussions

RIFF container #27

Open nigeltao opened 2 years ago

nigeltao commented 2 years ago

Here is a concrete suggestion for an extensible header / container format. There are reserved-zero bits that are effectively a version number (#2). It can hold metadata such as color profiles (#7). It allows parallelized decoding (#9). Etc.

It builds on RIFF and vanilla (headerless) QOI, heavily inspired by how WebP Extended builds on RIFF and VP8/VP8L. We could call this format "qoiriffic", file extension "qoir".

RIFF

A quick overview: RIFF is a 12 byte header then a sequence of chunks.

The header is:

Each chunk is:
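As a rough illustration of the RIFF layout described above, here is a minimal Python sketch, not a reference implementation. It assumes the standard RIFF conventions (little-endian u32 sizes, payloads padded to an even length). The `b"QOIR"` form type and the helper names `make_chunk`, `make_riff` and `parse_riff` are hypothetical, picked only for this example.

```python
import struct


def make_chunk(fourcc: bytes, payload: bytes) -> bytes:
    """Serialize one chunk: FourCC, u32le size, payload, pad byte if odd."""
    pad = b"\x00" if len(payload) & 1 else b""
    return fourcc + struct.pack("<I", len(payload)) + payload + pad


def make_riff(form_type: bytes, chunks) -> bytes:
    """Serialize the 12-byte header followed by the chunks."""
    body = form_type + b"".join(make_chunk(f, p) for f, p in chunks)
    return b"RIFF" + struct.pack("<I", len(body)) + body


def parse_riff(data: bytes):
    """Return (form_type, [(fourcc, payload), ...])."""
    if len(data) < 12 or data[0:4] != b"RIFF":
        raise ValueError("not a RIFF file")
    (riff_size,) = struct.unpack_from("<I", data, 4)
    form_type = data[8:12]
    end = 8 + riff_size  # the size field counts everything after itself
    if end > len(data):
        raise ValueError("truncated RIFF file")
    chunks = []
    pos = 12
    while pos < end:
        if pos + 8 > end:
            raise ValueError("truncated chunk header")
        fourcc = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        start = pos + 8
        if start + size > end:
            raise ValueError("truncated chunk payload")
        chunks.append((fourcc, data[start:start + size]))
        pos = start + size + (size & 1)  # skip the pad byte after odd sizes
    return form_type, chunks


# Hypothetical form type and placeholder payloads, for illustration only.
demo = make_riff(b"QOIR", [(b"QOIX", b"\x00" * 10), (b"QOIT", b"abc")])
form_type, chunks = parse_riff(demo)
```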

QOIX

The first chunk is a "QOIX" chunk, which is pretty much exactly like WebP's "VP8X" chunk, except that the Alpha (L) and Animation (A) bits must be zero. Alpha is already part of vanilla QOI (unlike VP8). Animation doesn't seem necessary but we could add it (copying WebP/VP8X) if we wanted to.

This "QOIX" chunk has reserved-zero bits (the high 24 bits of the u32le after "QOIX") that can act as a version number. The low 2 bits could indicate the presence of "COMP" and "TILE" chunks.
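A hedged sketch of how a decoder might validate such a chunk. It assumes the payload mirrors VP8X exactly: a 10-byte payload with one flags byte, 24 reserved bits, then 24-bit "width minus one" and "height minus one" fields. The flag masks are the values libwebp uses for VP8X. Treating the first four payload bytes as the u32le mentioned above is my reading of the proposal. The COMP and TILE bits are left out because their positions are not pinned down here. `parse_qoix` is a hypothetical name.

```python
import struct

# Flag masks as libwebp defines them for VP8X; assumed to carry over to QOIX.
ICCP_FLAG = 0x20
ALPHA_FLAG = 0x10
EXIF_FLAG = 0x08
XMP_FLAG = 0x04
ANIMATION_FLAG = 0x02


def parse_qoix(payload: bytes) -> dict:
    """Validate a VP8X-shaped QOIX payload and return its fields."""
    if len(payload) != 10:
        raise ValueError("QOIX payload must be 10 bytes")
    (word,) = struct.unpack_from("<I", payload, 0)
    if word >> 8:
        # Reserved-zero bits double as a version number: nonzero means
        # the file needs a newer decoder.
        raise ValueError("reserved bits set: unsupported version")
    flags = word & 0xFF
    if flags & (ALPHA_FLAG | ANIMATION_FLAG):
        raise ValueError("Alpha and Animation bits must be zero")
    return {
        "has_iccp": bool(flags & ICCP_FLAG),
        "has_exif": bool(flags & EXIF_FLAG),
        "has_xmp": bool(flags & XMP_FLAG),
        "width": int.from_bytes(payload[4:7], "little") + 1,
        "height": int.from_bytes(payload[7:10], "little") + 1,
    }


example = struct.pack("<I", ICCP_FLAG)
example += (640 - 1).to_bytes(3, "little") + (480 - 1).to_bytes(3, "little")
info = parse_qoix(example)
```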

TBD: After that is an optional "COMP" chunk, for configuring custom compression schemes (e.g. xz, zstd).

TBD: After that is an optional "TILE" chunk, for cutting a larger image into uniform-sized tiles (and maybe specifying a background color for missing tiles).

After that is an optional "ICCP" chunk, again just like WebP/VP8X.

After that are optional "pre-QOIT" chunks.

After that comes one or more "QOIT" chunks: QOI tiles.

After that are optional "post-QOIT" chunks.

After that is an optional "EXIF" chunk, again just like WebP/VP8X.

After that is an optional "XMP " chunk, again just like WebP/VP8X.

QOIT

Each QOI tile chunk is:

That Payload starts with a u32le Flags field. From LSB to MSB:

After the 4 byte Flags:

After that, vanilla QOI (i.e. bytecode) without the 14-byte header (but with the 8-byte padding trailer).
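To make the "headerless" part concrete, here is a small sketch that turns a vanilla QOI file into the bytecode a QOIT chunk would carry. It relies on the published QOI layout: a 14-byte header ("qoif", u32be width, u32be height, u8 channels, u8 colorspace) and an 8-byte end marker (seven 0x00 bytes then 0x01). The function name `strip_qoi_header` is made up for this example, and the QOIT Flags and other fields that precede the bytecode are not modeled.

```python
import struct

QOI_MAGIC = b"qoif"
QOI_HEADER_SIZE = 14
QOI_END_MARKER = b"\x00" * 7 + b"\x01"


def strip_qoi_header(data: bytes):
    """Return ((width, height, channels, colorspace), headerless bytecode)."""
    if len(data) < QOI_HEADER_SIZE + len(QOI_END_MARKER):
        raise ValueError("too short to be a QOI file")
    if data[:4] != QOI_MAGIC:
        raise ValueError("missing 'qoif' magic")
    # Vanilla QOI header fields are big-endian, unlike RIFF's sizes.
    header = struct.unpack_from(">IIBB", data, 4)
    bytecode = data[QOI_HEADER_SIZE:]
    if not bytecode.endswith(QOI_END_MARKER):
        raise ValueError("missing 8-byte end marker")
    return header, bytecode  # the trailer stays attached to the bytecode


# A 1x1 RGBA image encoded as a single QOI_OP_RGB (0xFE) opcode.
vanilla = (QOI_MAGIC + struct.pack(">IIBB", 1, 1, 4, 0)
           + bytes([0xFE, 10, 20, 30]) + QOI_END_MARKER)
header, bytecode = strip_qoi_header(vanilla)
```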

Pre-QOIT and Post-QOIT Chunks

These are extensions - arbitrary chunks that not all decoders are required to support. But if you control all of the producer and consumer implementations (e.g. a video game's first party assets), feel free to put your custom extensions here.

Being before or after the QOIT chunks corresponds to being a critical or ancillary chunk in the PNG spec. An unsupported pre-QOIT chunk (e.g. some sort of Hilbert curve pixel traversal configuration #6, subtract-green or a palette transform) means that the overall decoding fails, but unsupported post-QOIT chunks (e.g. some sort of thumbnail or modification-time representation) can be ignored.

This document does not define any extensions.
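The critical-versus-ancillary rule can be sketched as a single pass over the chunk FourCCs. This is only an illustration: `b"HILB"` and `b"THUM"` are invented extension names, `check_extensions` is a hypothetical helper, and chunk ordering is not validated beyond "before or after the first QOIT".

```python
STANDARD = frozenset(
    [b"QOIX", b"COMP", b"TILE", b"ICCP", b"QOIT", b"EXIF", b"XMP "]
)


def check_extensions(fourccs, supported=frozenset()):
    """Return the unsupported post-QOIT FourCCs that a decoder may skip.

    Raises ValueError for an unsupported pre-QOIT chunk, since decoding
    the pixels correctly may depend on it.
    """
    seen_qoit = False
    ignored = []
    for fourcc in fourccs:
        if fourcc == b"QOIT":
            seen_qoit = True
        elif fourcc in STANDARD or fourcc in supported:
            continue
        elif seen_qoit:
            ignored.append(fourcc)  # ancillary: safe to ignore
        else:
            raise ValueError(f"unsupported critical chunk {fourcc!r}")
    if not seen_qoit:
        raise ValueError("no QOIT chunk")
    return ignored


skipped = check_extensions([b"QOIX", b"QOIT", b"QOIT", b"THUM", b"EXIF"])
```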

chocolate42 commented 2 years ago

I've always disliked RIFF thanks to avi files (and, rarely, wav files) hitting filesize limits back in the day. If only they had been a little more forward thinking and used 64 bit, even if only in the main header. Still, stuffing QOI into a RIFF format is appealing, for automatic metadata support at least.

To get around limitations of RIFF the QOIX chunk could have a u32 field specifying how many following tile chunks the data is comprised of, allowing the stream data to exceed 4GiB assuming tools don't break when given a >4GiB file (put all metadata at the beginning if possible). Limiting a tile chunk to 4GiB (possibly 2GiB) is then enough to conform? The QOIX chunk could potentially also store the actual filesize as u64.

It wouldn't take too much thought from there to allow the format to represent any of the following:

nigeltao commented 2 years ago

Well, RIFF / IFF is just a sequence of chunks and a chunk is (FourCC, Size32, Payload). It'd be easy (if non-standard) to define a "64 bit IFF" where Size was 8 bytes instead of 4. Something like:

That's it.

We could possibly drop RIFF's "payloads are padded to an even number of bytes", while we're there.

The first byte of "IFF6" might also change to something like 0xEE (Latin-1 "î") to avoid being ASCII or UTF-8, and also to avoid colliding with e.g. TIFF images, which can start with 0x49 "I".

We could also possibly define "a Size64 of 0xFFFF_FFFF_FFFF_FFFF" means indeterminate (i.e. read to EOF), for the streaming case. Or maybe not. Just thinking out loud.
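For illustration only, a sketch of what reading and writing such 64-bit chunks could look like. It assumes a little-endian u64 size (matching RIFF's byte order), no padding to even lengths, and the all-ones size as a "read to EOF" sentinel. None of that is settled, the file-level magic is left out, and the names `write_chunk64` and `read_chunks64` are invented.

```python
import io
import struct

INDETERMINATE = 0xFFFF_FFFF_FFFF_FFFF


def write_chunk64(out, fourcc: bytes, payload: bytes, indeterminate=False):
    """Write (FourCC, Size64, Payload) with no padding byte."""
    size = INDETERMINATE if indeterminate else len(payload)
    out.write(fourcc + struct.pack("<Q", size) + payload)


def read_chunks64(stream):
    """Yield (fourcc, payload) pairs until EOF."""
    while True:
        header = stream.read(12)
        if not header:
            return  # clean EOF between chunks
        if len(header) != 12:
            raise ValueError("truncated chunk header")
        fourcc = header[:4]
        (size,) = struct.unpack("<Q", header[4:])
        if size == INDETERMINATE:
            yield fourcc, stream.read()  # streaming case: rest of the file
            return
        payload = stream.read(size)
        if len(payload) != size:
            raise ValueError("truncated chunk payload")
        yield fourcc, payload


buf = io.BytesIO()
write_chunk64(buf, b"QOIX", b"\x00" * 10)
write_chunk64(buf, b"QOIT", b"abc")  # odd length, no pad byte
write_chunk64(buf, b"QOIT", b"tail", indeterminate=True)
buf.seek(0)
decoded = list(read_chunks64(buf))
```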

chocolate42 commented 2 years ago

If we break the RIFF spec like that, does it make sense to use RIFF? Using 8 bytes in a RIFF chunk header will presumably break all the metadata tools etc., which seem like the main reason to use RIFF. Even >4GiB may be a stretch; using RIFF might mean biting the 32-bit bullet by checking width*height*5 + overhead for overflow.
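A quick sketch of that overflow check. The factor of 5 is the QOI worst case of one tag byte plus four channel bytes per RGBA pixel. The default `overhead=8` only accounts for the QOI end marker and is a placeholder, since the real per-chunk overhead depends on details not fixed here. Python integers don't wrap, so the comparison is exact. In a fixed-width language the multiplication itself would need guarding.

```python
U32_MAX = 0xFFFF_FFFF


def worst_case_qoi_size(width: int, height: int, overhead: int = 8) -> int:
    """Upper bound on encoded size: 5 bytes per pixel plus fixed overhead."""
    return width * height * 5 + overhead


def fits_in_riff_chunk(width: int, height: int, overhead: int = 8) -> bool:
    """True if the worst case still fits a u32 chunk size field."""
    return worst_case_qoi_size(width, height, overhead) <= U32_MAX


hd_ok = fits_in_riff_chunk(1920, 1080)
huge_ok = fits_in_riff_chunk(30000, 30000)
```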

nigeltao commented 2 years ago

You're probably right though that, barring big animations or pathological (noisy) images, 4GiB isn't going to bite in practice.

Still, even if it wouldn't be RIFF, it'd be a very straightforward RIFF-like type-length-value format.

Even if we stuck with official RIFF (with its 4GiB limitation), RIFF+QOI still wouldn't be an AVI, WAV or WEBP file. Are there common RIFF tools that just work on the generic format (as opposed to being e.g. specifically an AVI player or WAV editor)?

BenBE commented 2 years ago

What about just avoiding the nesting inside RIFF's main chunk? This still limits per-chunk data to ~2/4GiB, but makes any implementation expect to read until EOF. Thus arbitrary file sizes can be supported, provided implementations can read larger files and each individual part of the file fits in 4GiB (which is a good requirement to have even with current amounts of memory available).

This also plays well with the tiling support mentioned above as multiple tiles are automatic as they are just multiple chunks of data.