How to handle varying pixel formats

youennf commented 2 years ago

A transform exposes video frames that can be of various pixel formats (https://w3c.github.io/webcodecs/#enumdef-videopixelformat). Depending on the OS and/or camera, camera frames might currently be I420 or NV12. This is probably the same for tracks exported from video elements, and RGBA might be used for canvas capture tracks.
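To make the difference between these formats concrete, here is a minimal sketch of how the planes are laid out for I420, NV12 and RGBA. The helper names `planeLayout` and `totalBytes` are hypothetical, and the sketch assumes tightly packed planes with no row padding, which real frames do not guarantee (in practice the layout reported by `VideoFrame.copyTo()` is the one to trust).

```javascript
// Hypothetical helper: plane layout for a few VideoPixelFormat values,
// assuming tightly packed planes (no row padding).
function planeLayout(format, width, height) {
  // Chroma planes are subsampled by 2 in both directions for I420 and NV12.
  const chromaW = Math.ceil(width / 2);
  const chromaH = Math.ceil(height / 2);
  switch (format) {
    case "I420": // three planes: Y, then U, then V
      return [
        { name: "Y", bytesPerRow: width, rows: height },
        { name: "U", bytesPerRow: chromaW, rows: chromaH },
        { name: "V", bytesPerRow: chromaW, rows: chromaH },
      ];
    case "NV12": // two planes: Y, then interleaved U and V samples
      return [
        { name: "Y", bytesPerRow: width, rows: height },
        { name: "UV", bytesPerRow: 2 * chromaW, rows: chromaH },
      ];
    case "RGBA": // one plane, four bytes per pixel
      return [{ name: "RGBA", bytesPerRow: 4 * width, rows: height }];
    default:
      // This is the failure mode described in the issue: an unexpected format.
      throw new Error(`Unhandled pixel format: ${format}`);
  }
}

// Hypothetical helper: total byte size of a tightly packed frame.
function totalBytes(layout) {
  return layout.reduce((sum, p) => sum + p.bytesPerRow * p.rows, 0);
}
```

An application that indexes into the second plane as if it were a U plane will read garbage as soon as the frame is NV12, even though both formats take the same number of bytes.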

It seems this can lead to interop issues, especially for camera tracks: applications will expect a given format and will break whenever that assumption is wrong. I see a few options:

Let the web app deal with it: it can implement its own conversion in JS (computationally expensive, though).
Let the web app easily convert video frames to another format.
Let the web app specify the pixel format it wants, through the API, when creating the transform.
Let UAs select pixel formats consistently (specs recommend or require a particular pixel format, perhaps per source type).
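For a sense of what the first option involves, here is a minimal sketch of an I420 to RGBA conversion in plain JS. It assumes a tightly packed I420 buffer and BT.601 limited-range coefficients; neither assumption is guaranteed for real frames, and the function name `i420ToRgba` is hypothetical.

```javascript
// Hypothetical helper: converts a tightly packed I420 buffer to RGBA.
// Assumes BT.601 limited range; other matrices or ranges need other constants.
function i420ToRgba(i420, width, height) {
  const chromaW = Math.ceil(width / 2);
  const chromaH = Math.ceil(height / 2);
  const uOffset = width * height;               // U plane follows the Y plane
  const vOffset = uOffset + chromaW * chromaH;  // V plane follows the U plane
  // Uint8ClampedArray rounds and clamps to [0, 255] on assignment.
  const rgba = new Uint8ClampedArray(width * height * 4);
  for (let row = 0; row < height; row++) {
    for (let col = 0; col < width; col++) {
      const y = i420[row * width + col];
      // Each chroma sample is shared by a 2x2 block of pixels.
      const chromaIndex = (row >> 1) * chromaW + (col >> 1);
      const u = i420[uOffset + chromaIndex] - 128;
      const v = i420[vOffset + chromaIndex] - 128;
      const c = 1.164 * (y - 16);
      const out = (row * width + col) * 4;
      rgba[out] = c + 1.596 * v;                   // R
      rgba[out + 1] = c - 0.392 * u - 0.813 * v;   // G
      rgba[out + 2] = c + 2.017 * u;               // B
      rgba[out + 3] = 255;                         // A (opaque)
    }
  }
  return rgba;
}
```

The loop touches every pixel on the CPU for every frame, which is where the cost comes from, and an NV12 frame would need different chroma indexing.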

youennf commented 2 years ago

Ditto for other characteristics such as color space (e.g., fullRange or not).
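As an illustration of why the range matters, here is a small sketch showing that the same 8-bit luma byte maps to a different brightness depending on the `fullRange` flag. The function name `lumaToUnit` is hypothetical; in a real transform the flag would presumably be read from `frame.colorSpace.fullRange`.

```javascript
// Hypothetical helper: maps an 8-bit luma sample to a [0, 1] brightness value.
// Limited ("video") range uses 16..235; full range uses 0..255.
function lumaToUnit(y, fullRange) {
  if (fullRange) {
    return y / 255;
  }
  // Clamp so out-of-range samples do not produce values outside [0, 1].
  return Math.min(1, Math.max(0, (y - 16) / 219));
}
```

A byte value of 16 is pure black in limited range but a visibly dark gray in full range, so an app that assumes one or the other gets washed-out or crushed output.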

dontcallmedom commented 2 years ago

this was also discussed a bit under https://github.com/webmachinelearning/webnn/issues/226#issuecomment-1031518141

tidoust commented 1 year ago

Regarding conversion by the web app, a relatively easy and efficient way of converting to RGBA is through WebGPU (well, "relatively easy" provided you're familiar with a few WebGPU concepts, and "efficient" when the underlying data of the VideoFrame is on the GPU). The external texture sampler returns pixels in RGBA (or BGRA) with the specified color space, regardless of the pixel format of the external texture.

Here is a code example of a transformer function that converts a VideoFrame to RGBA with WebGPU. Most lines are "boilerplate" code to use WebGPU. The actual conversion is done by the fragment shader and does not require specific knowledge of pixel formats and conversion formulas:

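// Binding indices below are illustrative; myTexture comes from importExternalTexture().
@group(0) @binding(0) var mySampler : sampler;
@group(0) @binding(1) var myTexture : texture_external;
@fragment fn frag_main(@location(0) uv : vec2<f32>) -> @location(0) vec4<f32> {
  return textureSampleBaseClampToEdge(myTexture, mySampler, uv);
}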
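On the JS side, the part of the boilerplate that ties a VideoFrame to this shader could look roughly like the sketch below. It is not the original code from this comment; the function name `bindFrame` and the binding indices (0 for the sampler, 1 for the external texture) are assumptions, and the device, pipeline and sampler are expected to have been created elsewhere.

```javascript
// Hypothetical helper: builds the bind group that exposes a VideoFrame to the
// fragment shader as `myTexture`, together with `mySampler`.
// Assumes binding 0 is the sampler and binding 1 is the external texture.
function bindFrame(device, pipeline, sampler, frame) {
  // The external texture is imported per frame; the pixel format of the
  // frame does not need to be known here.
  const externalTexture = device.importExternalTexture({ source: frame });
  return device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [
      { binding: 0, resource: sampler },
      { binding: 1, resource: externalTexture },
    ],
  });
}
```

The returned bind group would then be set on a render pass that draws a full-frame quad into an RGBA render target.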

w3c / mediacapture-transform

How to handle varying pixel formats #83