phoboslab / qoa

The “Quite OK Audio Format” for fast, lossy audio compression
MIT License

Specification Draft #18

Closed phoboslab closed 1 year ago

phoboslab commented 1 year ago

While there are still some details to discuss (specifically fields in the file & frame headers), I have started working on the file format specification. The current draft can be found here:

https://qoaformat.org/qoa-specification-draft-01.pdf

I'm sure I forgot to mention some details and/or need to clarify things. Please let me know!

chocolate42 commented 1 year ago

Is there a valid reason to allow zero channels? I suppose it could be used to encode digital silence. If you don't want that, probably best to have num_channels represent 1 to 256 or at least 1 to 255. Very low bitrates trip up streams IIRC, which I think is why libFLAC implemented a min bitrate option.

Other than that it looks good to me (haven't had a chance to fully understand the algorithm so can't validate that), you've even used flac's default channel order which is nice.

phoboslab commented 1 year ago

The spec forbids 0 channels:

> A valid QOA file must have at least one frame, containing at least one channel and one sample with a samplerate between 1..16777215

We could surely define the num_channels field to represent a range of 1 .. 256, but I think it's neat that the value in the file is the value, without any transformations (same for samplerate); and it's not like having just 255 channels instead of 256 is a big limitation :)

chocolate42 commented 1 year ago

> A valid QOA file must have at least one frame, containing at least one channel and one sample with a samplerate between 1..16777215

That forbids 0 channels for static files when paired with this

> static files (those with samples set to a non-zero value), each frame must have the same number of channels and same samplerate.

but not a streaming context. Explicitly stating num_channels is 1..255 would solve that.
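The two limits discussed above can be written as a small per-frame check. This is only a sketch in Python, not code from the reference implementation: the function and constant names are hypothetical, and the 1..255 channel range is the wording proposed in this thread rather than a quote from the spec.

```python
# Hypothetical names; the limits are the ones discussed in this thread.
QOA_MIN_CHANNELS = 1
QOA_MAX_CHANNELS = 255           # proposed explicit range: 1..255
QOA_MIN_SAMPLERATE = 1
QOA_MAX_SAMPLERATE = 16777215    # 2**24 - 1, the upper bound quoted from the spec


def is_valid_frame_header(num_channels, samplerate):
    """Return True if both values fall inside the ranges discussed above."""
    channels_ok = QOA_MIN_CHANNELS <= num_channels <= QOA_MAX_CHANNELS
    samplerate_ok = QOA_MIN_SAMPLERATE <= samplerate <= QOA_MAX_SAMPLERATE
    return channels_ok and samplerate_ok


if __name__ == "__main__":
    print(is_valid_frame_header(2, 44100))   # stereo at 44.1 kHz
    print(is_valid_frame_header(0, 44100))   # zero channels
```

Checking every frame this way, rather than only the file header, would also cover the streaming case, where the rule about static files does not apply.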

Paulie68000 commented 1 year ago

Just trundling through the spec (love it), and ever the stickler for understanding how tables are derived: is there a specific reason for the scalefactor generation to use a power value of 2.75 that I've missed?

phoboslab commented 1 year ago

Through careful deliberation (trial & error) I came to the conclusion that the prediction is usually accurate enough for the scalefactor to top out at 2048. It's also advantageous to have more precision on the lower end. pow(s, 2.75) satisfies both and is easy to document :)

Some more details here: https://github.com/phoboslab/qoa/blob/master/qoa.h#L155-L176
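To see how an exponent of 2.75 satisfies both requirements, the table can be regenerated in a few lines. This is a sketch under assumptions that the comment above does not state: that there are 16 scalefactor indices (0..15) and that the exponent is applied to `s + 1`, so index 0 maps to 1 and index 15 maps to 16^2.75 = 2^11 = 2048. The function name is made up for illustration; the linked `qoa.h` is the authoritative source for the real table.

```python
def scalefactor_table(entries=16, exponent=2.75):
    """Build a scalefactor table as round((s + 1) ** exponent).

    Assumes 16 entries and an s + 1 offset; both are assumptions of
    this sketch, not statements from the thread.
    """
    return [round((s + 1) ** exponent) for s in range(entries)]


if __name__ == "__main__":
    table = scalefactor_table()
    print(table)
    # Steps between neighbouring entries: small at the low end, large at the top.
    print([b - a for a, b in zip(table, table[1:])])
```

The printed differences grow steadily, which is the "more precision on the lower end" property: small residuals get finely spaced scalefactors, while the top of the range is reached in large steps.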

Paulie68000 commented 1 year ago

Ahhh cool, I wondered if it was something obvious I'd missed or whether it was an exponent value that satisfied your requirements. Looking (well, sounding!) good. You realise you're going to have to end this series with a QOV too ;o) I had great fun writing simplified video codecs for use in games a few years back - they were quite OK! Keep up the great work.

kleinesfilmroellchen commented 1 year ago

Regarding the channel layout, I would like the standard non-film layout L, R, C, LFE, BL, BR, SL, SR (which the spec uses) to be "mandatory in general-purpose files" for channel counts 1 to 8. As for how these are usually laid out: 1, 2, 3, and 8 are intuitive, and https://datatracker.ietf.org/doc/html/draft-ietf-cellar-flac-07#name-channels-bits is in my opinion the best use of space in the other cases:

This adds the most important channels first when increasing the channel count, makes centre and LFE have a constant position, and also ensures that for channel counts 4 and up, the last two channels are always some kind of surround channel.

This will prevent incompatible divergent implementations: Downmixing extra channels to stereo is the most important consideration.

Of course, if a file is application-specific (e.g. in games), it can deviate from this standard layout, but any general-use file should have to follow this layout.

kleinesfilmroellchen commented 1 year ago

For the detailed decoder explanation, some editorial suggestions:

kleinesfilmroellchen commented 1 year ago

Finally, regarding frame sizes:

phoboslab commented 1 year ago

Great suggestions, thanks!

Here's an updated draft: https://qoaformat.org/qoa-specification-draft-02.pdf

Changes:

Question: what's the correct wording here?

kleinesfilmroellchen commented 1 year ago

Thanks for the corrections and the specification of standard channel layouts! There's a small typo "expcet" after the slice illustration.

> Question: what's the correct wording here?

  • Channels are interleaved per slice.
  • Slices are interleaved per channel.

I think the latter is more accurate; the FLAC spec uses the term "channel-interleaved" regularly.
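Either wording describes the same storage order, which a short sketch may make easier to picture. It assumes that "interleaved" here means slice 0 of every channel is stored first, then slice 1 of every channel, and so on; the function names are hypothetical and not taken from the spec or from `qoa.h`.

```python
def slice_order(num_channels, slices_per_channel):
    """Return (slice_index, channel) pairs in assumed storage order."""
    return [
        (slice_index, channel)
        for slice_index in range(slices_per_channel)
        for channel in range(num_channels)
    ]


def slice_position(slice_index, channel, num_channels):
    """Position of one slice within the frame's slice data, counted in slices."""
    return slice_index * num_channels + channel


if __name__ == "__main__":
    # Stereo, three slices per channel:
    # (0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)
    print(slice_order(2, 3))
```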

I'm still missing the short mention of "usually 5120 samples per frame", did you forget or is there a particular reason?

Other than that, I'm very happy with the state of the spec!

phoboslab commented 1 year ago

> I'm still missing the short mention of "usually 5120 samples per frame", did you forget or is there a particular reason?

I don't see the point. Specifying that a frame has 256 slices per channel and one slice has 20 samples should be sufficient!?
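For reference, the arithmetic behind the 5120 figure is a single multiplication. The sketch below uses only the two numbers given above (256 slices per channel, 20 samples per slice); the constant and function names are hypothetical.

```python
SLICES_PER_FRAME = 256   # slices per channel in one frame
SAMPLES_PER_SLICE = 20   # samples in one slice
SAMPLES_PER_FRAME = SLICES_PER_FRAME * SAMPLES_PER_SLICE   # 5120 per channel


def frames_needed(samples_per_channel):
    """Number of frames needed to hold the given sample count (ceiling division)."""
    return -(-samples_per_channel // SAMPLES_PER_FRAME)


if __name__ == "__main__":
    print(SAMPLES_PER_FRAME)        # 5120
    print(frames_needed(44100))     # frames for one second of audio at 44.1 kHz
```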

Unrelated: Over in the Hydrogen Audio Forums there's a point being made to allow setting the channel allocation separately from the number of channels. I have to say that it seems like overkill for this otherwise very simplistic format. Is anything but 1, 2, 6 or 8 channels in common use these days? Also, FLAC enjoys widespread use despite not being able to allocate channels freely...

chocolate42 commented 1 year ago

I believe flac can allocate channels freely, not in the format itself but in a well-supported extension that's added as a vorbis comment WAVEFORMATEXTENSIBLE, a scheme inherited from modern wave formats. A supporting player can read the comment to know the correct order of the channels.

I don't think it's necessary for qoa. It makes sense if the goal was maximum compatibility in a generic user-facing context, but the point of qoa is a simple way to store lossy audio that a programmer can shape to their whim if they have some specific needs like looping etc. Multiple channels and samplerate is the minimum and only absolute requirement to be a container for a chunk of something considered ready-to-go audio. Anything else can be bolted on if necessary IMO.

phoboslab commented 1 year ago

I agree. The channel allocation issue - while interesting - is drifting into bikeshedding territory. QOA is not (and does not want to be) a general purpose audio format in the same vein as MP3 or FLAC. I believe the current solution is certainly good enough for this format.

If nothing else pops up, I will declare the current spec draft 0.3 as final early next week.

Thanks everyone!

kleinesfilmroellchen commented 1 year ago

Repeating a typo note: "excpet" in the paragraph below the slice diagram. (That's also my last comment; everything else seems good)

phoboslab commented 1 year ago

I have declared the spec as final. It can be found on https://qoaformat.org

Closing this issue as completed!