w3c / webcodecs

WebCodecs is a flexible web API for encoding and decoding audio and video.
https://w3c.github.io/webcodecs/

Add definitions / diagrams for subtle video frame concepts #166

Open chcunningham opened 3 years ago

chcunningham commented 3 years ago

Including coded size, crop size, display size, stride...

chcunningham commented 3 years ago

Triage note: marking 'editorial', as there is broad consensus on what these terms / concepts mean; this issue merely tracks the need to document that better.

padenot commented 3 years ago

I agree, from a high-level POV, but this needs to be precisely specified, because we all know small details matter when shipping APIs on the Web.

That said, I assume it's not hard to spec, provided we use the commonly accepted definitions. An attempt below:

The coded width is the size of a horizontal pixel line, that is bigger or equal to the stride. The coded height is the size of a vertical pixel line, that is bigger or equal to the height of the picture (but is that useful?). The coded height times the coded width tells us how much memory an image occupies in practice. This memory is then interpreted using the metadata found on VideoFrame: a crop rectangle is applied (trimming off what's outside the cropped region), and then scaling is applied via displayWidth and displayHeight. Scaling is defined as the familiar linear transform.

It may well be that the type of scaling needs to be specified, maybe by using https://html.spec.whatwg.org/#resizequality ?

chcunningham commented 3 years ago

Thanks @padenot. I think those definitions match my expectation. @sandersdan to double check.

One small nit

The coded width is the size of a horizontal pixel line, that is bigger or equal to the stride.

I think this should be <= the stride.

Re: display scaling, I defer to @sandersdan.

sandersdan commented 3 years ago

The coded width is the size of a horizontal pixel line, that is bigger or equal to the stride.

There is no inherent relationship because codedWidth is measured in samples and stride is measured in bytes. codedWidth is also a measurement of the whole frame while stride is per-plane.

What is always true is that planeCodedWidth * planeSampleBytes <= planeStride.
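To make that inequality concrete, here is a minimal TypeScript sketch. The `PlaneInfo` shape and both helper functions are hypothetical names made up for illustration, not WebCodecs API. The I420 widths assume 8-bit samples with chroma planes at half the luma width, rounded up.

```typescript
// Hypothetical description of one plane. These field names are illustrative
// only and are not attributes defined by WebCodecs.
interface PlaneInfo {
  codedWidth: number;  // plane width, in samples
  sampleBytes: number; // bytes per sample in this plane
  stride: number;      // bytes from the start of one row to the next
}

// True when one row of samples fits within the plane's stride, i.e.
// planeCodedWidth * planeSampleBytes <= planeStride.
function strideIsValid(plane: PlaneInfo): boolean {
  return plane.codedWidth * plane.sampleBytes <= plane.stride;
}

// Per-plane coded widths for an I420 frame (assumption: full-width Y plane,
// U and V planes at half width, rounded up for odd frame widths).
function i420PlaneWidths(frameCodedWidth: number): number[] {
  const chromaWidth = Math.ceil(frameCodedWidth / 2);
  return [frameCodedWidth, chromaWidth, chromaWidth];
}
```

For example, a 1920-sample-wide 8-bit Y plane with a 2048-byte stride passes the check, while the same plane with 2-byte samples would need a stride of at least 3840 bytes.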

There is still some discussion happening re: display size in #94. The current proposal is to treat the display size as a display aspect ratio, and then apply the usual 'scale up in exactly one dimension to match' approach that <video> uses. To be clear, I expect the attributes on VideoFrame to be the actual display dimensions, but they may be scaled compared to the VideoDecoderConfig/VideoFrameInit.

(Perhaps in the future we would add a pixelWidth/pixelHeight for cases that prefer using a pixel aspect ratio.)
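A rough sketch of the 'scale up in exactly one dimension' idea, assuming the configured display size is used only as an aspect ratio. The `Size` type and `displaySizeFromAspect` function are hypothetical, and rounding to the nearest integer is an assumption. The actual behavior is still under discussion in #94.

```typescript
// Hypothetical helper type, not part of WebCodecs.
interface Size {
  width: number;
  height: number;
}

// Given the visible (cropped) size and a display aspect ratio, enlarge exactly
// one dimension so that the result matches the aspect ratio.
function displaySizeFromAspect(visible: Size, aspect: Size): Size {
  // Cross-multiply to compare visible.width / visible.height against
  // aspect.width / aspect.height without floating-point division.
  if (visible.width * aspect.height < visible.height * aspect.width) {
    // The visible region is too narrow for the aspect ratio: widen it.
    return {
      width: Math.round((visible.height * aspect.width) / aspect.height),
      height: visible.height,
    };
  }
  // Otherwise the region is too wide (or already matches): grow the height.
  return {
    width: visible.width,
    height: Math.round((visible.width * aspect.height) / aspect.width),
  };
}
```

Under these assumptions, a 1440x1080 visible region with a 16:9 aspect ratio gets a 1920x1080 display size, and a 1920x800 region with the same ratio gets 1920x1080 by growing the height instead.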

padenot commented 3 years ago

We can only work on this after https://github.com/w3c/webcodecs/issues/165 then. Thanks for the comment outlining my mistake. I'll work on a PR to add this precisely with all the comments taken into account.