w3c / webcodecs

WebCodecs is a flexible web API for encoding and decoding audio and video.
https://w3c.github.io/webcodecs/
Other
971 stars 136 forks source link

Should AudioSampleFormat enums values be lowercase? #300

Closed tguilbert-google closed 3 years ago

tguilbert-google commented 3 years ago

Follow up to #256, brought up by @sandersdan.

It's standard for video pixel formats to be uppercase, but audio sample formats don't have such a strong convention. JS enums are more commonly seen as lower case. @padenot, was there a specific intention with going to uppercase?

Any thoughts on using u8p or u8P instead of U8P?

chcunningham commented 3 years ago

I think this may have started with my suggestion here https://github.com/w3c/webcodecs/issues/179#issuecomment-828856379

I was inspired by https://source.chromium.org/chromium/chromium/src/+/master:third_party/ffmpeg/libavutil/samplefmt.h;drc=221e331b49dfefadbc6fa40b0c68e6f97606d0b3;l=58

I'm not dug in though if @padenot prefers a different casing. Lets wrap this one up quickly though. This is definitely repainting the bikeshed territory... wouldn't be wise after shipping.

tguilbert-google commented 3 years ago

Lets wrap this one up quickly though.

Agreed. Any outcome is fine with me.

sandersdan commented 3 years ago

Relatedly, any particular reason that S24 is defined to be aligned at the LSB? For video it's more common to align to MSB and then most 10-vs-12-bit code is the same. I can't find any "standard" definitions of S24, and FFmpeg doesn't even have it as an AVSampleFormat (ie. it can be in container metadata but internally FFmpeg uses S32).

Edit: PCM S24 is three bytes per sample. I'm not sure that S24-in-int32 is an actual format; should it actually just be defined to be three-byte samples (or removed entirely)?

Edit: I also don't see any mention of endianess. This can be tricky because ArrayBuffers are machine-endian, and we have not had to specify it for video yet because the currently defined video sample formats are byte-oriented. In general PCM comes in BE and LE variants, but I'm assuming we're expecting all audio buffers to be machine-endian in WebCodecs?

padenot commented 3 years ago

I think this may have started with my suggestion here

Indeed.

padenot commented 3 years ago

Edit: PCM S24 is three bytes per sample. I'm not sure that S24-in-int32 is an actual format; should it actually just be defined to be three-byte samples (or removed entirely)?

What I wanted to say is that if you decode a WAV file that is mono 24-bits and contains, say, 1000 samples of a full-scale sine wave, and you read it out to AudioData you get 4000 bytes (4 bytes * 1000 samples) out, as an Int32Array where the minimum and maximum values are -222 and 222. Anything above this clips.

If you then decode a WAV file that's mono 32-bits and also contains 1000 samples of a sine wave that's full-scale, you read it out to an Int32Array buffer, you also get 4000 bytes, but the values are between -230 and 230. But it's still full-scale.

This is going to be useful when copyTo will convert and to check things in this PR. Do you see things differently ?

Native endianess is used on the Web, and in practice that means little-endian because nobody ever tests on anything else than little-endian. Web Codecs shouldn't be different, so yes, machine-endian.

sandersdan commented 3 years ago

Do you see things differently ?

Yes. I really don't want to add anything non-standard to WebCodecs unless there is proven demand, and this version of 24-bit samples doesn't seem to be any standard.

The next obvious choice is to convert S24 data to S32 (by left shifting 8 bits). This is what FFmpeg actually does to decode 24-bit PCM, so I don't think I'm inventing anything new.

If there is no other purpose for the current S24 definition, I would prefer to remove it (and S24P as well).

Machine-endian SGTM.

Returning to the title issue of casing, WebIDL strongly recommends lowercase, hyphen-separated enumeration values. I think the FFmpeg header is suitable as a source of common names but not necessarily for the style of those names. Similar enums exist in WebGPU and there they follow this recommendation (eg. https://www.w3.org/TR/webgpu/#enumdef-gputextureformat).

I'd even consider the same for video pixel formats, the same reasoning applies. In that case though uppercase is much more established due to fourccs, even if WebCodecs isn't actually restricted to fourccs.

tguilbert-google commented 3 years ago

A proposal:

enum AudioSampleFormat {
  "u8",
  "s16",
  "s32",
  "f32",
  "u8-planar",
  "s16-planar",
  "s32-planar",
  "f32-planar",
};

With the recommendation that developers shift their data by 8 to convert between "s24" and "s32".

A piece by piece breakdown of preferences:

The one benefit I would see for a "s24" format would be to eventually use native SIMD instructions to perform that shift on behalf of developers. The format could be added later, when there is a better chance that those optimizations are actually implemented.

padenot commented 3 years ago

This sacrifices any head-room on decode and is lossy in the common case where the compressed audio data is clipping, but OK if that's more standard (I also did not invent what I wrote, fwiw, it's present in a few audio software and libraries).

I guess people that care will use floats anyway and notice the signal is outside [-1,1]. That shouldn't be clipped by WebCodecs and output as-is though, maybe it's worth a note. The rendering system might clip at the output. This is consistent with what Web Audio expects.

I agree to the renaming proposal by @tguilbert-google: