Open nigeltao opened 2 years ago
I've always disliked RIFF thanks to avi files (and rarely wav files) hitting filesize limits back in the day (if only they had been a little more forward thinking and used 64 bit even if only in the main header), but stuffing QOI into a RIFF format is still appealing for automatic metadata support at least.
To get around limitations of RIFF the QOIX chunk could have a u32 field specifying how many following tile chunks the data is comprised of, allowing the stream data to exceed 4GiB assuming tools don't break when given a >4GiB file (put all metadata at the beginning if possible). Limiting a tile chunk to 4GiB (possibly 2GiB) is then enough to conform? The QOIX chunk could potentially also store the actual filesize as u64.
It wouldn't take too much thought from there to allow the format to represent any of the following:
Well, RIFF / IFF is just a sequence of chunks and a chunk is (FourCC, Size32, Payload). It'd be easy (if non-standard) to define a "64 bit IFF" where Size was 8 bytes instead of 4. Something like:
That's it.
We could possibly drop RIFF's "payloads are padded to an even number of bytes", while we're there.
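To make the (FourCC, Size64, Payload) idea concrete, here is a minimal Python sketch of a reader. It assumes little-endian sizes (as in RIFF) and no payload padding; the name `read_chunks64` is made up for illustration and nothing here is a defined format.

```python
import io
import struct


def read_chunks64(stream):
    """Yield (fourcc, payload) pairs from a stream of (FourCC, Size64, Payload)."""
    while True:
        header = stream.read(12)  # 4-byte FourCC + 8-byte size
        if not header:
            return  # clean end of stream
        if len(header) != 12:
            raise ValueError("truncated chunk header")
        # Assumption: the 8-byte size is little-endian, mirroring RIFF's u32le.
        fourcc, size = struct.unpack("<4sQ", header)
        payload = stream.read(size)
        if len(payload) != size:
            raise ValueError("truncated chunk payload")
        # Assumption: no padding byte after odd-sized payloads.
        yield fourcc, payload


# Two back-to-back chunks with hypothetical FourCCs.
data = struct.pack("<4sQ", b"AAAA", 3) + b"abc" + struct.pack("<4sQ", b"BBBB", 2) + b"hi"
chunks = list(read_chunks64(io.BytesIO(data)))
```

The only difference from a plain RIFF chunk reader is the 12-byte header and the missing pad byte, which is why existing RIFF tools would not understand it.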
The first byte of "IFF6" might also change to something like 0xEE (Latin-1 "î") to avoid being ASCII or UTF-8, and also to avoid colliding with other formats: e.g. TIFF images can start with 0x49 "I".
We could also possibly define that "a Size64 of 0xFFFF_FFFF_FFFF_FFFF" means indeterminate (i.e. read to EOF), for the streaming case. Or maybe not. Just thinking out loud.
If we break the RIFF spec like that, does it make sense to use RIFF? Using 8 bytes in a RIFF chunk header will presumably break all the metadata tools etc. that seem like the main reason to use RIFF. Even >4GiB may be a stretch; using RIFF might mean biting the 32 bit bullet by checking width*height*5 + overhead for overflow.
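A rough sketch of that check in Python, where integers don't wrap, so the comparison against the 32-bit limit is explicit. The factor of 5 is the worst case of one tag byte plus four channel bytes per pixel; `overhead` is left as a parameter because its exact value depends on the container, and the function name is hypothetical.

```python
U32_MAX = 0xFFFF_FFFF  # largest value a 32-bit RIFF size field can hold


def fits_in_riff32(width, height, overhead=0):
    """Return True if the worst-case encoded size fits a 32-bit size field."""
    worst_case = width * height * 5 + overhead
    return worst_case <= U32_MAX


small_ok = fits_in_riff32(1024, 1024)
huge_ok = fits_in_riff32(30000, 30000)
```

In C the same check would have to be done in 64-bit arithmetic (or with division) so that the multiplication itself cannot overflow.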
You're probably right though that, barring big animations or pathological (noisy) images, 4GiB isn't going to bite in practice.
Still, even if it wouldn't be RIFF, it'd be a very straightforward RIFF-like type-length-value format.
Even if we stuck with official RIFF (with its 4GiB limitation), RIFF+QOI still wouldn't be an AVI, WAV or WEBP file. Are there common RIFF tools that just work on the generic format (as opposed to being e.g. specifically an AVI player or WAV editor)?
What about just avoiding the nesting of the main chunk of RIFF? This still limits per-chunk data to ~2/4GiB, but causes any implementation to expect reading to EOF. Thus, as long as implementations can read larger files, arbitrary file sizes can be supported provided that individual parts of the file fit in 4GiB (which is a good requirement to have even with current amounts of memory available).
This also plays well with the tiling support mentioned above as multiple tiles are automatic as they are just multiple chunks of data.
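As a sketch of what that could look like on the writing side, assuming standard RIFF chunk framing (FourCC, u32le size, payload padded to an even length) but no outer "RIFF" wrapper. The helper names and the `b"tile"` FourCC are placeholders, not part of any proposal.

```python
import io
import struct

U32_MAX = 0xFFFF_FFFF


def write_chunk(stream, fourcc, payload):
    """Write one top-level chunk: FourCC, u32le size, payload, optional pad byte."""
    if len(fourcc) != 4:
        raise ValueError("FourCC must be exactly 4 bytes")
    if len(payload) > U32_MAX:
        raise ValueError("payload does not fit a 32-bit chunk size")
    stream.write(struct.pack("<4sI", fourcc, len(payload)))
    stream.write(payload)
    if len(payload) % 2 == 1:
        stream.write(b"\x00")  # RIFF pads odd-sized payloads to an even length


def write_tiles(stream, fourcc, tiles):
    """Write each tile as its own top-level chunk; the file simply ends at EOF."""
    for tile in tiles:
        write_chunk(stream, fourcc, tile)


out = io.BytesIO()
write_tiles(out, b"tile", [b"abc", b"hi"])
encoded = out.getvalue()
```

Each chunk is individually bounded by the 32-bit size field, while the total file length is not recorded anywhere and so is not limited by it.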
Here is a concrete suggestion for an extensible header / container format. There are reserved-zero bits that are effectively a version number (#2). It can hold metadata such as color profiles (#7). It allows parallelized decoding (#9). Etc.
It builds on RIFF and vanilla (headerless) QOI, heavily inspired by how WebP Extended builds on RIFF and VP8/VP8L. We could call this format "qoiriffic", file extension "qoir".
RIFF
A quick overview: RIFF is a 12 byte header then a sequence of chunks.
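The header is:
4 byte "RIFF" magic,
4 byte u32le File Size (yeah, for streaming encoders, use vanilla QOI with a different header / container) and
4 byte form-type FourCC.
Each chunk is:
4 byte u32le Chunk FourCC (e.g. "EXIF"),
4 byte u32le Chunk Size and
the variable-sized Payload, padded to an even number of bytes.
QOIX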
The first chunk is a "QOIX" chunk, which is pretty much exactly like WebP's "VP8X" chunk, except that the Alpha (L) and Animation (A) bits must be zero. Alpha is already part of vanilla QOI (unlike VP8). Animation doesn't seem necessary but we could add it (copying WebP/VP8X) if we wanted to.
This "QOIX" chunk has reserved-zero bits (the high 24 bits of the u32le after "QOIX") that can act as a version number. The low 2 bits could indicate the presence of "COMP" and "TILE" chunks.
TBD: After that is an optional "COMP" chunk, for configuring custom compression schemes (e.g. xz, zstd).
TBD: After that is an optional "TILE" chunk, for cutting a larger image into uniform-sized tiles (and maybe specifying a background color for missing tiles).
After that is an optional "ICCP" chunk, again just like WebP/VP8X.
After that are optional "pre-QOIT" chunks.
After that comes one or more "QOIT" chunks: QOI tiles.
After that are optional "post-QOIT" chunks.
After that is an optional "EXIF" chunk, again just like WebP/VP8X.
After that is an optional "XMP " chunk, again just like WebP/VP8X.
QOIT
Each QOI tile chunk is:
4 byte u32le Chunk FourCC ("QOIT"),
4 byte u32le Chunk Size and
the variable-sized Payload.
That Payload starts with a u32le Flags field. From LSB to MSB:
... uint8 pixel value p is replaced by (((p&0x3F) << 2) | ((p&0x3F) >> 4)).
After the 4 byte Flags:
Four u32le values x0, y0, x1, y1 that define the top-left (inclusive) and bottom-right (exclusive) of this tile. TBD: some of this (e.g. if tiles have uniform width and height) could be factored out into a global "TILE" chunk.
u32le Compression Configuration Size (CCS) value and then CCS bytes to define the compression codec. For example CCS=4 may be followed by the 4 bytes "zstd" for Zstandard with no further configuration. Qoiriffic decoders must support LZ4 block compression (given in the header bit) but aren't required to support any custom compression codecs (the decoding fails as if it was an unsupported pre-QOIT chunk). TBD: again, some of this could be factored out into a global "COMP" chunk.
After that, vanilla QOI (i.e. bytecode) without the 14-byte header (but with the 8-byte padding trailer).
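A small Python sketch of reading the fixed part of a QOIT payload as described above: the u32le Flags followed by the four u32le tile bounds. It deliberately does not interpret individual flag bits, since their layout isn't spelled out here, and everything after the bounds is returned untouched. The second helper just evaluates the quoted pixel expression, which widens a 6-bit value to 8 bits by repeating its top two bits. Function names are made up for illustration.

```python
import struct


def parse_qoit_prefix(payload):
    """Split a QOIT payload into (flags, (x0, y0, x1, y1), rest)."""
    if len(payload) < 20:
        raise ValueError("QOIT payload too short for Flags and tile bounds")
    flags, x0, y0, x1, y1 = struct.unpack_from("<5I", payload, 0)
    # Assumption: bounds with x1 < x0 or y1 < y0 are treated as malformed.
    if x1 < x0 or y1 < y0:
        raise ValueError("tile bounds are inverted")
    return flags, (x0, y0, x1, y1), payload[20:]


def expand_6_to_8(p):
    """Evaluate (((p&0x3F) << 2) | ((p&0x3F) >> 4)) for one uint8 value."""
    return ((p & 0x3F) << 2) | ((p & 0x3F) >> 4)


sample = struct.pack("<5I", 0, 10, 20, 30, 40) + b"xyz"
flags, bounds, rest = parse_qoit_prefix(sample)
```

With the exclusive bottom-right corner, the tile in the sample is 20 pixels wide (30 - 10) and 20 pixels tall (40 - 20).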
Pre-QOIT and Post-QOIT Chunks
These are extensions - arbitrary chunks that not all decoders are required to support. But if you control all of the producer and consumer implementations (e.g. a video game's first party assets), feel free to put your custom extensions here.
Being before or after the QOIT chunks corresponds to being a critical or ancillary chunk in the PNG spec. Unsupported pre-QOIT chunks (e.g. some sort of Hilbert curve pixel traversal configuration #6, subtract-green or palette transform) mean that the overall decoding fails, but unsupported post-QOIT chunks (e.g. some sort of thumbnail or modification-time representation) can be ignored.
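One way a decoder might apply that rule, sketched in Python under the assumption that it has already split the file into a list of chunk FourCCs. The function name, the `supported` set and the example FourCCs "HILB" and "THMB" are all hypothetical.

```python
def check_chunk_order(fourccs, supported):
    """Return the post-QOIT FourCCs that were ignored.

    Raises ValueError for an unsupported chunk before the first QOIT
    (critical), or if there is no QOIT chunk at all.
    """
    seen_qoit = False
    ignored = []
    for fourcc in fourccs:
        if fourcc == b"QOIT":
            seen_qoit = True
        elif fourcc in supported:
            continue
        elif seen_qoit:
            ignored.append(fourcc)  # ancillary: safe to skip
        else:
            raise ValueError(f"unsupported pre-QOIT chunk {fourcc!r}")
    if not seen_qoit:
        raise ValueError("no QOIT chunk found")
    return ignored


skipped = check_chunk_order([b"QOIX", b"QOIT", b"QOIT", b"THMB"], {b"QOIX"})
```

Here `skipped` ends up holding the unknown trailing chunk, whereas the same unknown FourCC placed before the first "QOIT" would make the call raise.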
This document does not define any extensions.