w3c / webcodecs

WebCodecs is a flexible web API for encoding and decoding audio and video.
https://w3c.github.io/webcodecs/
Other
975 stars 137 forks source link

https://w3c.github.io/webcodecs/#videoframe-compute-layout-and-allocation-size is not taking into account RGB formats #573

Open youennf opened 1 year ago

youennf commented 1 year ago

https://w3c.github.io/webcodecs/#videoframe-compute-layout-and-allocation-size algorithm is computing the allocation size based on computedLayout’s sourceWidthBytes.

Computation of sourceWidthBytes is based on sample size and rect width. In the case of RGB formats, the sample size is 1, but each pixel has 4 samples. I would assume sourceWidthBytes to be multiple by the per-pixel sample count. I do not see where this is accounted for in the spec currently.

dalecurtis commented 1 year ago

I defer to @sandersdan, but I think that's opposite of our interpretation. I.e., In a single plane RGB format, each sample is exactly one packed RGBx. We could certainly clarify this in the text though.

youennf commented 1 year ago

https://w3c.github.io/webcodecs/#dom-videopixelformat-rgba says:

Each sample in this format is 8 bits, and each pixel is therefore 32 bits. There are codedWidth codedHeight 4 samples (and therefore bytes) in the single plane

dalecurtis commented 1 year ago

Ahh, I thought Dan wrote that, but it looks like @padenot added it. https://github.com/w3c/webcodecs/pull/288

youennf commented 1 year ago

In https://w3c.github.io/webcodecs/#videoframe-compute-layout-and-allocation-size, we compute sampleWidthBytes as the multiplication of sampleBytes and sampleWidth (step 5). With #579, this would be equal to 4 for RGBA.

We then compute sourceLeftBytes and sourceWidthBytes (steps 9 and 10) as the division of rect.x by sampleWidthBytes. Ditto for sourceWidthBytes with rect.width. This works well for NV12 where this allows to handle the subsampling factor for the UV plane. This does not seem correct to divide by 4 for RGBA though.

padenot commented 1 year ago

It kind of works well for NV12 by chance, because for the U and V planes, it's a subsampling of 2 and a per-pixel-sample-count of 2 (if there an industry-standard word to refer to this concept, it elludes me right now).

If we make this concept of a per-pixel-sample-count explicit and adjust the prose a bit, it should work (in addition to fixing the RGBA prose, that currently states that an RGBA sample is 8 bits, which is incorrect, i.e., #579).

I think I'll also make all the RGBA permutations reference a single one (say, RGBA), and simply state that either a component is unused, or that the components are shuffled compared to the reference ordering.