I work on the MediaCapabilities (MC) spec (explainer). I've had a few requests from WebRTC apps to expand this API to describe WebRTC encode/decode performance. The use cases make sense but I have some ergonomics concerns and I'd like to collab w/ RTC experts here to explore the options.
MediaCapabilities today
The primary interface is decodingInfo(). This was designed to replace <video>.canPlayType(...). It describes "file" (foo.mp4) and "media-source" (YouTube, Netflix, ...) decoding types. It does not include WebRTC. This is implemented in Chrome, Firefox, and Safari.
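For concreteness, here's what a shipped decodingInfo() query looks like today (the dictionary values below are illustrative placeholders, not anything normative):

```js
// Querying "media-source" decode support with the shipped API.
const info = await navigator.mediaCapabilities.decodingInfo({
  type: 'media-source',
  video: {
    contentType: 'video/mp4; codecs="avc1.4d001e"',
    width: 1920,
    height: 1080,
    bitrate: 3000000,   // bits per second
    framerate: 30,
  },
});
console.log(info.supported, info.smooth, info.powerEfficient);
```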
The spec also defines encodingInfo(), which covers "recording" (MediaRecorder) and "transmission" (WebRTC) encoding types. This part of the spec is less mature and not shipped by any browser. The "recording" part is conceptually a simple reversal of file decoding. The "transmission" (WebRTC) type seemed like a natural next step (MediaRecorder, after all, is ~part of the WebRTC family). This interface is very roughly spec'ed; it needs some love (or maybe removal).
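As a rough sketch of the drafted shape (again, no browser ships this, so treat it as pseudocode for the idea rather than working code):

```js
// Sketch only: encodingInfo() with the drafted "transmission" type.
// Not implemented in any browser; dictionary values are illustrative.
const result = await navigator.mediaCapabilities.encodingInfo({
  type: 'transmission',
  video: {
    contentType: 'video/vp8',
    width: 1280,
    height: 720,
    bitrate: 1500000,
    framerate: 30,
  },
});
// Would answer: result.supported, result.smooth, result.powerEfficient
```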
WebRTC use cases are very similar to those from the media playback world. Apps would like to know beforehand what limits to set for resolution/framerate/bitrate such that the machine can maintain a buttery smooth (timely) encoding/decoding experience for the user (ignoring the small matter of network issues). WebRTC's reference implementation helps out by automatically adapting encode resolution downward when the CPU is over-used, but this means the user first has a bad experience before the adaptation kicks in (and meanwhile, the camera may be left open at HD resolution, potentially wasting resources on an already starving machine).
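Concretely, the pre-flight these apps want might look like the sketch below, assuming a shipped encodingInfo(); the resolution ladder and bitrate here are made up:

```js
// Hypothetical pre-flight: find the largest resolution the machine can
// encode smoothly *before* opening the camera, rather than opening at HD
// and waiting for CPU-overuse adaptation to kick in.
const ladder = [
  { width: 1920, height: 1080 },
  { width: 1280, height: 720 },
  { width: 640,  height: 480 },
];
let chosen = ladder[ladder.length - 1];
for (const res of ladder) {
  const r = await navigator.mediaCapabilities.encodingInfo({
    type: 'transmission',
    video: { contentType: 'video/vp8', ...res, bitrate: 2500000, framerate: 30 },
  });
  if (r.supported && r.smooth) { chosen = res; break; }
}
const stream = await navigator.mediaDevices.getUserMedia({
  video: { width: chosen.width, height: chosen.height },
});
```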
Apps may also ask what codecs can be hardware accelerated to minimize battery drain.
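The powerEfficient bit in the same result dictionary would answer that, e.g. (same caveats as above; the codec list is illustrative):

```js
// Which codecs report hardware acceleration (powerEfficient) for 720p30?
for (const contentType of ['video/vp8', 'video/vp9', 'video/h264']) {
  const r = await navigator.mediaCapabilities.encodingInfo({
    type: 'transmission',
    video: { contentType, width: 1280, height: 720, bitrate: 1500000, framerate: 30 },
  });
  console.log(contentType, 'powerEfficient:', r.powerEfficient);
}
```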
Ergonomics: What shape should the API take? Where should it live?
We took a closer look at implementing MC.encodingInfo() for "transmission" (WebRTC) encoding in Chrome, and a few things gave me pause.
Most notably, WebRTC has existing capability APIs that seem like a natural place to describe codec performance.
The RTCRtpReceiver and RTCRtpSender interfaces both define a static getCapabilities() method that returns an RTCRtpCapabilities object containing a sequence of RTCRtpCodecCapability entries. Right away we see that RTC users can ask about capabilities on both ends of the wire, not just the local machine. This is something MediaCapabilities can't do.
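For reference, that existing (shipped) surface looks like this; the exact codec list varies by browser and platform:

```js
// Shipped API: enumerate local send-side video codec capabilities.
const caps = RTCRtpSender.getCapabilities('video');
for (const codec of caps.codecs) {
  // e.g. 'video/H264', 90000, 'profile-level-id=42e01f;packetization-mode=1'
  console.log(codec.mimeType, codec.clockRate, codec.sdpFmtpLine);
}
```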
RTCRtpCodecCapability does not include performance info (e.g. max framerate, max resolution). But the ORTC spec proposes new "codec capability parameters" for this object, including codec-specific fields like "max-fs" (frame size) and "max-fr" (frame rate). Should this serve as a model for defining something similar in WebRTC?
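To make that concrete, an extended entry might look something like this; the two performance fields are purely hypothetical here, borrowed from ORTC's naming, and neither exists in webrtc-pc today:

```js
// Hypothetical RTCRtpCodecCapability entry with ORTC-style performance
// parameters bolted on. maxFs/maxFr do NOT exist in webrtc-pc.
const codecCapability = {
  mimeType: 'video/H264',
  clockRate: 90000,
  sdpFmtpLine: 'profile-level-id=4d001e;packetization-mode=1',
  // ORTC-proposed additions:
  maxFs: 3600,  // "max-fs": max frame size in macroblocks (3600 ≈ 1280x720)
  maxFr: 30,    // "max-fr": max frame rate in frames/second
};
```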
Another issue we face is that WebRTC doesn't use the same mime-type codec strings as the <video> playback world (e.g. 'video/mp4; codecs="avc1.4d001e"' for <video> vs 'video/h264' in RTC). It's a little ugly to consider mixing these into MediaCapabilities, and it smells like a hint that we're merging two worlds that might better be kept separate.
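Side by side, the same H.264 Main-profile configuration in the two worlds (illustrative):

```js
// <video>/MSE style: container plus codec string, profile/level inline.
const playbackString = 'video/mp4; codecs="avc1.4d001e"';
// RTC style: bare mimeType; the profile/level moves into the fmtp line.
const rtcMimeType = 'video/H264';
const rtcFmtpLine = 'profile-level-id=4d001e;packetization-mode=1';
```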
Interested to get your thoughts! Thanks for reading.