w3c / webcodecs

WebCodecs is a flexible web API for encoding and decoding audio and video.
https://w3c.github.io/webcodecs/

Prehistoric codecs/H.263+ entry #416

Closed Yahweasel closed 6 months ago

Yahweasel commented 2 years ago

Short version: None of the video codecs in the registry are very approachable with a WebAssembly-compiled software-only encoder. Can we get something a bit more prehistoric, such as H.263+?

Long version:

Let me start with a not-hypothetical: If you're strange (bold?) enough to want to build a telephony app where the WebCodecs implementation might be a WebAssembly-compiled polyfill, what codecs should you use? For audio, it's simple: Opus. Opus is so blazingly fast there's essentially no reason to use anything but a polyfill. For video, the answer is a bit murky.

libvpx, bless its heart, can encode in real time, even when compiled to WebAssembly. But, its real-time mode is essentially "do as much as you can in the time budget, then give up". So, if the time budget is very limiting, the encoding looks awful. Real RealMedia vibes. H.264-era and later codecs are just too sophisticated to expect good quality with real-time encoding in WebAssembly, at least for the time being.

Would it be reasonable to add an earlier video codec to the registry? My proposal/suggestion is H.263v2/H.263+. Admittedly, the selection is purely self-serving—I can easily encode and decode H.263+ in real time in WebAssembly—but I think it's also a reasonable choice.

Some considerations:

dalecurtis commented 2 years ago

There's no limitation on what codecs can be put in the registry since all codecs are optional to implementers. Please feel free to contribute a registry entry for H.263.

Yahweasel commented 2 years ago

Basically, I'm a bit confused/concerned by CONTRIBUTING.md. As I'm not a member of the working group, I must "make a non-member patent licensing commitment", which seems a bit onerous given that I have no patents and don't intend to contribute anything that's under patent...

dalecurtis commented 2 years ago

Hmm, I'm not sure about all that. @tidoust

Yahweasel commented 2 years ago

> There's no limitation on what codecs can be put in the registry since all codecs are optional to implementers. Please feel free to contribute a registry entry for H.263.

Also, to be incredibly pedantic, the spec does place a (tiny) implementation burden even for unimplemented codecs: the isConfigSupported method is supposed to throw an exception if the codec string is invalid (whether malformed or simply not in the registry), but simply return an object with supported: false if the codec is merely unsupported. To be honest, I find this a bizarre design decision, but regardless, it does have the consequence that adding a codec to the codec registry isn't irrelevant to other implementations.

sandersdan commented 2 years ago

Hmm, there is room to improve the spec text in this regard. The intent is that an implementation is free to throw for codecs they do not support at all (I'd expect a TypeError for an unknown enum value), and this follows from the registry being non-normative.

Implementations are expected to return supported: false if they can parse the enum value and could otherwise support the codec except that some particular configuration setting prevents that. FWIW, Chrome's implementation doesn't yet perfectly distinguish these cases and in a few situations can throw for valid codec strings.
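
The two outcomes described above can be told apart in application code. Here is a minimal TypeScript sketch, assuming the behavior described in this thread: a TypeError for a codec string the implementation does not recognize, and supported: false for a recognized codec whose configuration can't be handled. The names classifyCodec, Probe, SupportResult, and Verdict are hypothetical helpers, not part of WebCodecs. The probe is passed in as a parameter so it can be VideoEncoder.isConfigSupported, a polyfill's equivalent, or a stub.

```typescript
// Only the field this sketch needs; the real support object carries more.
type SupportResult = { supported?: boolean };

// Any function shaped like VideoEncoder.isConfigSupported.
type Probe<C> = (config: C) => Promise<SupportResult>;

type Verdict = "supported" | "unsupported-config" | "unknown-codec";

async function classifyCodec<C>(probe: Probe<C>, config: C): Promise<Verdict> {
  try {
    // Awaiting inside try catches both a synchronous throw and a rejection.
    const result = await probe(config);
    return result.supported ? "supported" : "unsupported-config";
  } catch (err) {
    // Assumption from the thread: an unrecognized codec string surfaces as TypeError.
    if (err instanceof TypeError) return "unknown-codec";
    throw err; // anything else is a genuine failure, so let it propagate
  }
}
```

In a browser this might be called as classifyCodec((c) => VideoEncoder.isConfigSupported(c), { codec: "vp8", width: 640, height: 480 }). Since implementations don't yet distinguish the cases perfectly, an "unknown-codec" verdict is probably best treated as a hint rather than a guarantee.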

Yahweasel commented 2 years ago

> FWIW, Chrome's implementation doesn't yet perfectly distinguish these cases and in a few situations can throw for valid codec strings.

I know, I submitted a bug about throwing on ulaw and alaw from AudioEncoder ;)

Yahweasel commented 2 years ago

> Hmm, there is room to improve the spec text in this regard. The intent is that an implementation is free to throw for codecs they do not support at all (I'd expect a TypeError for an unknown enum value), and this follows from the registry being non-normative.

Ahhh, OK, that does follow logically, but it fooled me. Since “A compliant implementation MAY support any combination of codec registrations or none at all”, a given codec string may not be valid for me even though it's valid for you, or vice versa. Subtle.

tidoust commented 2 years ago

> Basically, I'm a bit confused/concerned by CONTRIBUTING.md. As I'm not a member of the working group, I must "make a non-member patent licensing commitment", which seems a bit onerous given that I have no patents and don't intend to contribute anything that's under patent...

In practice, it all depends on which document you target with your contributions and what your contributions are going to be.

The CONTRIBUTING.md file is a bit too strong in that it was written for a repo that only has one normative specification (in W3C parlance, a spec on the Recommendation track), which is how this repo started. Typically, if you make a normative contribution to the WebCodecs spec, we will ask you to make a non-member patent licensing commitment. That is a protective measure to get confidence that the final spec won't include IP encumbered features that are not covered by licensing commitments.

This repo now also contains the registry and the registrations, which are non-normative documents (in W3C parlance, they are not on the Recommendation track). Contributions to these documents do not require you to sign anything. Similarly, contributions to the WebCodecs spec that are purely editorial (non-normative) are also possible without signing anything.

Yahweasel commented 2 years ago

Thanks for the clarification! I'll make a PR then.

chrisn commented 2 years ago

Sorry for the delay, @Yahweasel. We're currently seeking input to help review the PR.

One question came up in our discussion: RFC 4629 describes the features added in H.263+ and H.263++ over H.263. The WebCodecs API does not support some of these enhanced capabilities, such as reference picture selection and SNR scalability. Would your implementation support configuration of features such as slice structured mode or independent segment decoding (ISD)? We'd like to understand which H.263 version you're targeting (in https://github.com/w3c/webcodecs/issues/416#issue-1068585976 you mention only H.263+), and which parts of the feature set you would be looking to implement.

Yahweasel commented 2 years ago

My polyfill is libavjs-webcodecs-polyfill, and it (sensibly for the name) uses libav.js, which in turn is a port of the libav* libraries from FFmpeg to WebAssembly. So, what I intend to implement is exactly what libav.js implements :). FFmpeg does not support H.263++/H.263v3 at all, only closely related codecs such as MPEG-4 Part 2 and Sorenson Spark. As far as I can tell, there's no way to munge it into perfect H.263++ compatibility.

My purpose for any of this is live chat, and FFmpeg's MPEG-4 Part 2 implementation really isn't geared towards that, while its H.263* codecs are, hence H.263 at all. Its h263p encoder is fast enough for real time even in software, in WebAssembly (though ironically I'm currently having the problem that capturing a frame in, e.g., Firefox takes some 20ms, blowing the budget on capture instead of encoding 🤪)
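
To make the budget problem concrete, here is a small TypeScript sketch of the arithmetic. The frame rates are assumptions for illustration (no target frame rate is stated in this thread); the 20 ms capture cost is the figure mentioned above. Both function names are hypothetical.

```typescript
// Milliseconds available to handle one frame at a given frame rate.
function frameBudgetMs(fps: number): number {
  return 1000 / fps;
}

// Time left for encoding once capture has taken its share (clamped at zero).
function encodeBudgetMs(fps: number, captureMs: number): number {
  return Math.max(0, frameBudgetMs(fps) - captureMs);
}

// Assumed 30 fps: 1000 / 30 is about 33.3 ms per frame,
// so a 20 ms capture leaves roughly 13.3 ms for the encoder.
const leftAt30 = encodeBudgetMs(30, 20);

// Assumed 60 fps: the per-frame budget (about 16.7 ms) is already
// smaller than the capture time, so nothing is left for encoding.
const leftAt60 = encodeBudgetMs(60, 20);
```

Under these assumptions, capture alone eats more than half of the per-frame budget at 30 fps, regardless of how fast the encoder is.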

FFmpeg has two H.263 codecs: h263 and h263p. Their documentation isn't great, because (a) it's hardly the most popular codec in the suite, and (b) the implementations of H.263, H.263v2, MPEG-4 Part 2, Sorenson Spark, and Microsoft MPEG-4 variants are all in the same file with conditions. h263 implements H.263v1, and h263p implements H.263v2. I believe the following statement is true: h263 implements the entirety of H.263v1, while h263p does not use every annex when encoding but can decode every annex that does not require external metadata. Therefore, standardizing the condition that a compliant decoder must support anything you can throw at it, but a compliant encoder may use any subset of annexes, fits FFmpeg's behavior.

Yahweasel commented 6 months ago

(Closing as this doesn't seem necessary and is clearly going nowhere)