w3c / webcodecs

WebCodecs is a flexible web API for encoding and decoding audio and video.
https://w3c.github.io/webcodecs/

AudioFrame needs another name #168

Closed · padenot closed this issue 3 years ago

padenot commented 3 years ago

An audio frame (also sample-frame, or often just frame when talking about audio) is a term already in use with a specific meaning: in the field generally, on the web platform, and even elsewhere in WebCodecs itself. It needs renaming.

It means "the ensemble of single audio samples that occur at a specific time, across multiple audio channels", which is unfortunately the opposite of its meaning in the WebCodecs spec, where it is a group of contiguous audio frames that has a duration.
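To make the terminology concrete, here is a rough TypeScript sketch (sample values made up, 48 kHz rate assumed purely for illustration): with interleaved stereo PCM, each pair of values is one frame in the conventional sense.

```ts
// Conventional meaning of "audio frame": one sample per channel,
// all taken at the same instant. Values and rate are illustrative.
const sampleRate = 48_000;
const channels = 2;
const interleaved = new Float32Array([
  /* frame 0 */ 0.10, -0.10, // left, right at t = 0
  /* frame 1 */ 0.12, -0.08, // left, right at t = 1 / sampleRate
  /* frame 2 */ 0.15, -0.05, // left, right at t = 2 / sampleRate
]);
const frameCount = interleaved.length / channels; // 3 frames ≈ 62.5 µs of audio
console.log(frameCount);
```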

There's an explanation here, but this is a pretty generic term.

fwiw, Gecko uses the term "audio segment".

sandersdan commented 3 years ago

I agree that "audio frame" is a term in the field meaning one sample for each channel at a point in time, and this is different from a WebCodecs AudioFrame, which has many samples for each channel over a duration of time.

Early in development we tried out a number of different names, but ultimately AudioFrame won out through symmetry with VideoFrame. It was confusing to use different terminology, and the other names were even less meaningful.

That said, the original discussions were more focused on video, so there may well be a bias, and some core aspects of WebCodecs have also changed in the meantime.

I'd be interested to hear if anyone has been confused by this name in the context of WebCodecs.

chcunningham commented 3 years ago

I agree with all points made above (from both authors), but I expect it is not actually confusing anyone in practice (happy to learn otherwise). In our back and forth on this internally, it was pointed out that ffmpeg has forever used the name AVFrame for both audio and video, so this seemed like a pretty safe bet.

The points about symmetry resonate for me. I quite like that both our encoded types are "chunks" and both our raw types are "frames". I think it makes the API more intuitive.
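For reference, a minimal sketch of that symmetry using the WebCodecs decoder shape (codec strings and encoded bytes are placeholders; note that, per the resolution of this issue, the raw audio type later shipped as AudioData rather than AudioFrame):

```ts
// Encoded inputs are "chunks"; raw outputs are "frames" for video and,
// as finally specified, AudioData for audio. The placeholder bytes will
// not actually decode; the point is the shape of the API.
const videoDecoder = new VideoDecoder({
  output: (frame: VideoFrame) => frame.close(), // raw type
  error: (e) => console.error(e),
});
videoDecoder.configure({ codec: "vp8" });
videoDecoder.decode(new EncodedVideoChunk({     // encoded type
  type: "key",
  timestamp: 0,
  data: new Uint8Array([/* encoded bytes */]),
}));

const audioDecoder = new AudioDecoder({
  output: (data: AudioData) => data.close(),    // raw type
  error: (e) => console.error(e),
});
audioDecoder.configure({ codec: "opus", sampleRate: 48000, numberOfChannels: 2 });
audioDecoder.decode(new EncodedAudioChunk({     // encoded type
  type: "key",
  timestamp: 0,
  data: new Uint8Array([/* encoded bytes */]),
}));
```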

bradisbell commented 3 years ago

> meaning in the WebCodecs spec, where it is a group of contiguous audio frames that has a duration

Agreed completely with @padenot. While I haven't used this API yet, I've been skimming the updates, and had no idea that AudioFrame in this context meant something other than a single sample per-channel.

I expect AudioFrame to mean exactly what @padenot says: "the ensemble of single audio samples that occur at a specific time, across multiple audio channels." If I'm referring to encoded audio, I would call it something like MPEGFrame, where that would refer to the encoded audio data representing something like 576 samples/PCM frames.

I would not expect AudioFrame to represent anything but a singular PCM frame.
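A quick back-of-the-envelope sketch of that distinction (the 576 figure comes from the example above; the 44.1 kHz rate is an assumed example):

```ts
// One encoded "MPEG frame" spans many PCM frames.
const pcmFramesPerEncodedFrame = 576; // from the example above
const sampleRate = 44_100;            // assumed rate for illustration
const durationMs = (pcmFramesPerEncodedFrame / sampleRate) * 1000;
console.log(durationMs.toFixed(2));   // "13.06" ms of audio per encoded frame
```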

chcunningham commented 3 years ago

Discussed a bit on the editors call, so far we don't have an alternative name we like better. Waiting a week for inspiration.

padenot commented 3 years ago

I don't have a better idea than AudioData, which has the benefit of being unambiguous. I think it's better than having symmetry.

sandersdan commented 3 years ago

FYI I have mild apprehension about this because ImageData is a thing, and it's very unlike VideoFrame. It may be similar to what we want AudioFrame to be like though.

chcunningham commented 3 years ago

I understood Paul's last point ("I think it's better than having symmetry") to mean he considered AudioData but preferred AudioFrame. I'm happy with that. Let's close. Please re-open if I've misunderstood.

padenot commented 3 years ago

I meant the opposite. Symmetry is a nice-to-have, when possible. Introducing a new object whose name already means something else to anybody who has ever touched PCM audio in computers really is problematic, and we can't do that.

It's even more misleading because a video frame corresponds precisely to an audio frame (the real industry term): each represents a single unit of information sampled at a point in time, with a duration. That duration happens to be tens of milliseconds for video, compared to tens of microseconds for audio, because of the way human perception works.
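To put rough numbers on that comparison (assuming common rates of 48 kHz audio and 30 fps video, purely for illustration):

```ts
// Duration of one frame, in the sampled-instant sense, for each medium.
const audioFrameSeconds = 1 / 48_000; // one sample-frame
const videoFrameSeconds = 1 / 30;     // one video frame
console.log((audioFrameSeconds * 1e6).toFixed(1)); // "20.8" microseconds
console.log((videoFrameSeconds * 1e3).toFixed(1)); // "33.3" milliseconds
```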

A quick search in your favourite search engine shows that ffmpeg is alone in calling an AVFrame what is really an audio buffer with some metadata, and in calling a sample what everybody else calls a frame.

AudioPCMBuffer, PCMAudioBuffer, PCMBuffer, PCMData, PCMAudioData are all names that precisely convey what this structure is.

dalecurtis commented 3 years ago

FWIW ffmpeg uses frame (specifically AVFrame) for both audio and video, so it's not like AudioFrame has no precedent. [Edit: I see this mentioned above already]

chcunningham commented 3 years ago

@padenot sorry, I misread. AudioData works for me. I'll include this in the upcoming PR.