w3c / mediacapture-transform

MediaStreamTrack Insertable Media Processing using Streams
https://w3c.github.io/mediacapture-transform/

How to handle varying pixel formats #83

Open youennf opened 2 years ago

youennf commented 2 years ago

A transform exposes video frames that can use various pixel formats (https://w3c.github.io/webcodecs/#enumdef-videopixelformat). Depending on the OS and/or camera, camera tracks currently produce I420 or NV12. Tracks exported from a video element probably behave the same way, while canvas capture tracks may use RGBA.
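To make the difference concrete, here is a minimal sketch (not taken from the spec) of how reading a chroma sample differs between I420 and NV12. It assumes 8-bit samples in a tightly packed buffer with no row padding, such as one filled by `VideoFrame.copyTo()` with the default layout. The `readChroma` helper, its parameters, and the sample buffers are hypothetical and only there for illustration.

```javascript
// Hypothetical helper: returns the chroma (U, V) samples covering pixel (x, y).
// Assumption: 8-bit samples, tightly packed planes (no row padding).
function readChroma(data, format, width, height, x, y) {
  const chromaWidth = Math.ceil(width / 2);
  const chromaHeight = Math.ceil(height / 2);
  const lumaSize = width * height;
  const cx = x >> 1; // chroma is subsampled by 2 horizontally...
  const cy = y >> 1; // ...and vertically (4:2:0)
  switch (format) {
    case 'I420': {
      // Three planes: Y, then all U samples, then all V samples.
      const uStart = lumaSize;
      const vStart = lumaSize + chromaWidth * chromaHeight;
      const offset = cy * chromaWidth + cx;
      return { u: data[uStart + offset], v: data[vStart + offset] };
    }
    case 'NV12': {
      // Two planes: Y, then interleaved U/V pairs.
      const offset = lumaSize + cy * chromaWidth * 2 + cx * 2;
      return { u: data[offset], v: data[offset + 1] };
    }
    default:
      throw new TypeError(`Unhandled pixel format: ${format}`);
  }
}

// The same made-up 4x2 picture stored in both layouts.
const luma = [0, 0, 0, 0, 0, 0, 0, 0];
const i420 = new Uint8Array([...luma, 50, 51, 60, 61]); // U plane, then V plane
const nv12 = new Uint8Array([...luma, 50, 60, 51, 61]); // interleaved UV plane
```

With these buffers, `readChroma(i420, 'I420', 4, 2, 2, 0)` and `readChroma(nv12, 'NV12', 4, 2, 2, 0)` return the same samples, whereas an application that hard-codes the I420 branch and is handed the NV12 buffer silently reads the wrong U value.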

This seems likely to cause interop issues, especially for camera tracks: applications will expect a given format and break whenever that assumption is wrong. I see a few options:

youennf commented 2 years ago

Ditto for other characteristics such as color space (e.g. fullRange or not).
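As a rough illustration of why the range flag matters, here is a sketch of an 8-bit YUV to RGB conversion that takes `fullRange` and the matrix into account. The `yuvToRgb` function and the `MATRICES` table are hypothetical names; the keys reuse two of the WebCodecs matrix coefficient values, and the constants are the usual BT.709 and BT.601 luma coefficients. Transfer functions and primaries are ignored here.

```javascript
// Luma coefficients (Kr, Kb) for two common matrices.
const MATRICES = {
  bt709: { kr: 0.2126, kb: 0.0722 },
  smpte170m: { kr: 0.299, kb: 0.114 }, // BT.601
};

// Hypothetical helper: converts one 8-bit Y'CbCr sample to 8-bit R'G'B'.
function yuvToRgb(y, u, v, { fullRange = false, matrix = 'bt709' } = {}) {
  const { kr, kb } = MATRICES[matrix];
  const kg = 1 - kr - kb;
  // Limited range: luma in 16..235, chroma in 16..240. Full range: 0..255.
  const yn = fullRange ? y / 255 : (y - 16) / 219;
  const cb = (u - 128) / (fullRange ? 255 : 224);
  const cr = (v - 128) / (fullRange ? 255 : 224);
  const r = yn + 2 * (1 - kr) * cr;
  const b = yn + 2 * (1 - kb) * cb;
  const g = (yn - kr * r - kb * b) / kg;
  const toByte = (c) => Math.round(Math.min(1, Math.max(0, c)) * 255);
  return [toByte(r), toByte(g), toByte(b)];
}
```

The same sample `(16, 128, 128)` is pure black when interpreted as limited range but a dark gray when interpreted as full range, so an application that guesses the range wrong gets washed-out or crushed output.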

dontcallmedom commented 2 years ago

This was also discussed a bit under https://github.com/webmachinelearning/webnn/issues/226#issuecomment-1031518141

tidoust commented 1 year ago

Regarding conversion by the web app, a relatively easy and efficient way of converting to RGBA is through WebGPU (well, "relatively easy" provided you're familiar with a few WebGPU concepts, and "efficient" when the underlying data of the VideoFrame is on the GPU). The external texture sampler returns pixels in RGBA (or BGRA) with the specified color space, regardless of the pixel format of the external texture.

Here is a code example of a transformer function that converts a VideoFrame to RGBA with WebGPU. Most lines are "boilerplate" code to use WebGPU. The actual conversion is done by the fragment shader and does not require specific knowledge of pixel formats and conversion formulas:

fn frag_main(@location(0) uv : vec2<f32>) -> @location(0) vec4<f32> {
  return textureSampleBaseClampToEdge(myTexture, mySampler, uv);
}
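
For reference, the surrounding setup for a fragment shader like the one above could look roughly like the following. This is a sketch under assumptions, not a definitive implementation: the `createRgbaTransformer` factory, the `vert_main` vertex shader, and the binding indices are made up for illustration. It needs a browser that supports both WebGPU and WebCodecs, and the pixel format of the resulting frame depends on the preferred canvas format.

```javascript
// WGSL: a full-screen quad whose fragments sample the external texture.
// Assumption: the sampler sits at binding 0 and the texture at binding 1.
const SHADER = `
struct VertexOutput {
  @builtin(position) position : vec4<f32>,
  @location(0) uv : vec2<f32>,
}

@vertex
fn vert_main(@builtin(vertex_index) i : u32) -> VertexOutput {
  var pos = array<vec2<f32>, 6>(
    vec2<f32>(-1.0, -1.0), vec2<f32>(1.0, -1.0), vec2<f32>(-1.0, 1.0),
    vec2<f32>(-1.0, 1.0), vec2<f32>(1.0, -1.0), vec2<f32>(1.0, 1.0));
  var out : VertexOutput;
  out.position = vec4<f32>(pos[i], 0.0, 1.0);
  // Clip space has y pointing up, texture coordinates have v pointing down.
  out.uv = vec2<f32>(pos[i].x * 0.5 + 0.5, 0.5 - pos[i].y * 0.5);
  return out;
}

@group(0) @binding(0) var mySampler : sampler;
@group(0) @binding(1) var myTexture : texture_external;

@fragment
fn frag_main(@location(0) uv : vec2<f32>) -> @location(0) vec4<f32> {
  return textureSampleBaseClampToEdge(myTexture, mySampler, uv);
}
`;

// Hypothetical factory: resolves to a function usable as the transform()
// callback of a TransformStream of VideoFrame objects.
async function createRgbaTransformer() {
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) throw new Error('WebGPU is not available');
  const device = await adapter.requestDevice();
  const format = navigator.gpu.getPreferredCanvasFormat();

  const canvas = new OffscreenCanvas(1, 1);
  const context = canvas.getContext('webgpu');
  context.configure({ device, format, alphaMode: 'opaque' });

  const module = device.createShaderModule({ code: SHADER });
  const pipeline = device.createRenderPipeline({
    layout: 'auto',
    vertex: { module, entryPoint: 'vert_main' },
    fragment: { module, entryPoint: 'frag_main', targets: [{ format }] },
    primitive: { topology: 'triangle-list' },
  });
  const sampler = device.createSampler({ magFilter: 'linear', minFilter: 'linear' });

  return function transform(frame, controller) {
    if (canvas.width !== frame.displayWidth || canvas.height !== frame.displayHeight) {
      canvas.width = frame.displayWidth;
      canvas.height = frame.displayHeight;
    }
    // An external texture is tied to one frame, so import it on every call.
    const bindGroup = device.createBindGroup({
      layout: pipeline.getBindGroupLayout(0),
      entries: [
        { binding: 0, resource: sampler },
        { binding: 1, resource: device.importExternalTexture({ source: frame }) },
      ],
    });
    const encoder = device.createCommandEncoder();
    const pass = encoder.beginRenderPass({
      colorAttachments: [{
        view: context.getCurrentTexture().createView(),
        clearValue: [0, 0, 0, 1],
        loadOp: 'clear',
        storeOp: 'store',
      }],
    });
    pass.setPipeline(pipeline);
    pass.setBindGroup(0, bindGroup);
    pass.draw(6);
    pass.end();
    device.queue.submit([encoder.finish()]);

    // Wrap the rendered canvas in a new frame and release the original.
    const converted = new VideoFrame(canvas, { timestamp: frame.timestamp });
    frame.close();
    controller.enqueue(converted);
  };
}
```

The returned function could then be passed as `transform` to a `TransformStream` placed between a `MediaStreamTrackProcessor` and a `MediaStreamTrackGenerator`.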