MediaCodec requires us to supply HDR metadata up front, and provides output Textures.
Reading here, it sounds as if YUV is still available for Texture (Surface) output. Perhaps this would allow us to forgo providing the HDR metadata?
Summary of discussion today: HDR on Android is an unsolved problem; only tunneled mode is specified, and we can't use tunneled mode for WebCodecs. We do not know what capabilities may be exposed in the future.
As a user of a decoder, I imagine I would be frustrated if I could not get unmodified output. We could make the behavior opt-in; then apps that need unmodified output would fail to create a codec.
Having access to the raw image and the associated color space is much more powerful than any alternative. Can we have more info about why Android does it this way? It seems weird at first glance.
Sorry if I missed discussions on this, I've had to skip a couple meetings.
I believe Android did this as a shortcut and simplification; the entire decode-tonemap-render path is merged into one unit that can be tuned for a specific device, and existing media paths don't need to understand HDR.
I don't know whether the tonemapping part is actually implemented in an inseparable way on real devices, but I imagine there will come a time when the decoding part is always a separable component. Hopefully the API to use such a decoder will be clearly defined by then.
Triage notes: marking this 'extension'; the expected outcome is to simply add members to config dicts and attributes to VideoFrame describing color space.
This is fairly fundamental though, it doesn't seem possible to write a compatible implementation if how color spaces work is not specified.
> This is fairly fundamental though, it doesn't seem possible to write a compatible implementation if how color spaces work is not specified.
The CanvasImageSource integrations do not expose the frame's raw color data, so anything using those paths is compatible.
The proposed WEBGL_webcodecs_video_frame extension does expose the raw color data, but includes generated shader code to do the conversion to the context's color space. It doesn't look like this sort of integration will be ready for WebCodecs v1 in any case.
The readInto() API does expose raw color data. This still mostly works, since software frames are usually in the color space specified by the container, but that fails if the decoder is doing conversions (which eg. Android MediaCodec does when binding to a texture). You basically end up with 'either what the container says or sRGB, hopefully they are similar'.
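For concreteness, a sketch of that readback ambiguity. The readback method was named readInto() at the time of this thread; allocationSize()/copyTo() below are the names from later drafts, so this is illustrative rather than normative:

```ts
// Sketch only: readback naming was still in flux at the time.
async function readFrame(frame: VideoFrame): Promise<Uint8Array> {
  const buffer = new Uint8Array(frame.allocationSize());
  await frame.copyTo(buffer);
  // `buffer` now holds raw planes (e.g. I420), but nothing on the frame
  // says whether YUV -> RGB should use BT.601 or BT.709 coefficients;
  // the caller is left with "what the container says, or assume sRGB".
  return buffer;
}
```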
So far the demand for color space metadata from developers has been nonexistent, so while I'm uncomfortable with the gap it does seem like it can wait until v2 for many important use cases. (Probably because in SDR cases, assuming sRGB is 'good enough' for now.)
I don't actually think there is a huge amount of work to do here; my unpublished proposal is three enum properties on VideoFrame and the codec configs: color primaries, transfer function, and YUV matrix. The main reason I didn't complete my proposal was that naming was a challenge (similar to EXIF rotation on images, where there are plausible existing integer assignments but it's unclear if those are the best way to expose the metadata).
@ccameron-chromium since whatever color spaces we expose should probably be compatible with what's going to be exposed by canvas color spaces.
> The CanvasImageSource integrations do not expose the frame's raw color data, so anything using those paths is compatible.
It's a bit "magical" in the sense that it isn't clear that the frames themselves are tagged with a color space (colorspace information can be per frame in HDR iirc).
> The proposed WEBGL_webcodecs_video_frame extension does expose the raw color data, but includes generated shader code to do the conversion to the context's color space. It doesn't look like this sort of integration will be ready for WebCodecs v1 in any case.
Do you have the link handy for this? I haven't found anything.
> So far the demand for color space metadata from developers has been nonexistent, so while I'm uncomfortable with the gap it does seem like it can wait until v2 for many important use cases. (Probably because in SDR cases, assuming sRGB is 'good enough' for now.)
Shipping with only sRGB drawing or conversion is probably fine for v1, as long as it's defined somewhere what one needs to do and how it works.
> I don't actually think there is a huge amount of work to do here; my unpublished proposal is three enum properties on VideoFrame and the codec configs: color primaries, transfer function, and YUV matrix. The main reason I didn't complete my proposal was that naming was a challenge (similar to EXIF rotation on images, where there are plausible existing integer assignments but it's unclear if those are the best way to expose the metadata).
Let me know if I can help move this forward.
WEBGL_webcodecs_video_frame. Ongoing discussion is in https://github.com/gpuweb/gpuweb/issues/1380.
> Let me know if I can help move this forward.
I believe @chcunningham is working on a document, I'll let him take over.
Here's a summary of our internal discussion. The structure of each section is to (1) list default behavior (we don't think color space should be a required part of any config dictionary) to unblock us in the short term, and (2) sketch a path for the future.
Default VideoFrame behaviors (no color knobs):
In cases of decoding, capture, and canvas sources, we generally know the color space internally. We can convert to canvas color space (either 'srgb' or 'display-p3') upon rendering to canvas. When we later add color attributes to video frame, we can simply surface the internal knowledge.
For user-constructed VideoFrames, our vote is to default the color space to rec709, as this is most common among modern HD content.
Later on, we can add knobs that allow user-constructed VideoFrames to have a specified color space (settable only at construction), which would be considered the source space for conversion upon rendering to an srgb/p3 canvas. We propose that the spec require support for all conversions. That is, if you can create a VideoFrame with color space X, your UA must know how to render that to canvas. WebCodecs should not be less powerful than <video> -> canvas.
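A minimal sketch of what that default means in practice, using the buffer-based VideoFrame constructor shape (member names follow later drafts; nothing here is normative):

```ts
// A frame built from raw I420 bytes with no color metadata: under the
// proposal it is assumed to be rec709, and the UA converts to the canvas
// color space at draw time.
const i420 = new Uint8Array((1280 * 720 * 3) / 2);
const frame = new VideoFrame(i420, {
  format: 'I420',
  codedWidth: 1280,
  codedHeight: 720,
  timestamp: 0,
  // no color space knob yet; rec709 is the assumed source space
});
const ctx = document.createElement('canvas')
  .getContext('2d', { colorSpace: 'srgb' })!;
ctx.drawImage(frame, 0, 0); // rec709 -> srgb conversion happens here
frame.close();
```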
As mentioned above, decoders often know the color space of decoded frames from the bitstream. But if the bitstream doesn't say, or the decoder impl doesn't tell us, our vote is again to default to rec709.
We could also say "default to rec709 always, and ignore what the stream may tell you". This has the benefit of being 100% interoperable... not sure yet which is the lesser evil.
Later, we can add the same color knobs from VideoFrame to VideoDecoderConfig to allow users the ability to customize this behavior.
As mentioned above, encoded bitstreams may embed color space data. So what should our encoders do? We again vote to default to rec709.
Like above, we can later add the same color knobs to the encoder config and allow users to customize this behavior.
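For illustration, such a knob on the decoder config might look like the sketch below. This roughly matches the shape that eventually shipped as VideoDecoderConfig.colorSpace, but at the time of this comment it was still an open proposal:

```ts
// Sketch of the "later" knob: overriding the rec709 default via config.
const decoder = new VideoDecoder({
  output: (frame) => frame.close(),
  error: (e) => console.error(e),
});
decoder.configure({
  codec: 'vp09.00.10.08',
  colorSpace: { primaries: 'bt709', transfer: 'bt709', matrix: 'bt709', fullRange: false },
});
```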
If this is agreeable, I would start by adding a definition of rec709, maybe to https://drafts.csswg.org/css-color/#predefined. Then I would just add a few lines of text to the WC spec stating that it is the assumed space.
rec709 makes sense as a default when things aren't known in today's world, I think.
But there is still a need to expose the color space on VideoFrame, so that people can handle frames correctly when doing so manually. For example, when reading back the planes, it is necessary to know what the numbers mean in order to do something with them. Not all WebCodecs usage is going to be about painting with a Web API, where things that are not exposed can be passed through. Without this, implementations can't be made compatible.
Should gl see the values in sRGB (when the canvas is in sRGB) when reading the pixel values in the fragment shader, and then we can add per-plane access? Painting to a canvas doesn't have this problem, since conversion happens when painting, so reading back has the correct value (depending on the canvas color space).
Straw man proposal:
partial interface VideoFrame {
  readonly attribute ColorSpace colorSpace;
  readonly attribute ColorRange colorRange;
  // will need more for HDR
};

enum ColorSpace {
  "rec2020",
  "rec709",
  "sRGB", // or we can separate gamma?
  // extendable
};

enum ColorRange {
  "video-range",
  "full-range"
};
I don't think we need to have all formats day 1, but this means that there will be implicit conversion when reading back, because frames will have to be in a format that is usable by developers.
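A usage sketch against the straw man above. The colorSpace/colorRange attributes are the ones proposed there, not a shipped API, hence the loose cast:

```ts
const decoder = new VideoDecoder({
  output(frame) {
    const f = frame as any; // straw-man attributes, per the proposal above
    if (f.colorSpace === 'rec709' && f.colorRange === 'video-range') {
      // pick limited-range BT.709 coefficients when interpreting planes
    }
    frame.close();
  },
  error(e) { console.error(e); },
});
```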
Editors call:
> We could also say "default to rec709 always, and ignore what the stream may tell you". This has the benefit of being 100% interoperable... not sure yet which is the lesser evil.
@padenot still thinking, but leans toward "trust the decoder"
> Straw man proposal:
This looks about right to me. The full set of properties that make logical sense is Color Primaries, Transfer Function, YUV Matrix, and Range, but most combinations are unlikely and some are nonsensical (eg. there are cases where Range is implied by the particular YUV Matrix). Limiting to a single list makes it easier for an implementation to support all of the options and to provide capabilities detection.
In my experience, unlikely combinations in media metadata are errors, and trusting them results in incorrect rendering, so I suspect that demand for complete control is low. The main question to answer here is whether WebCodecs should be prescribing which combinations are important enough.
ITU-T H.273 / ISO 23001-8 / ISO 23091-2 are substantially complete listings of values for these properties.
I put together a proposal: WebCodecs: Color Spaces, covering VideoDecoderConfig. There are two proposed ways to represent the values: enum strings, or the integer code points from H.273.
I have a preference for strings, but H.273 is ubiquitous in media and therefore often convenient. We could use strings and provide a helper to convert. Strings also allow us to add color spaces not in H.273; I don't know if that will ever be important, but I also don't know if/when JPEG XL's XYB will show up there.
My proposed enum strings include distinct choices with the same meaning, for compatibility with H.273.
I've also highlighted the choices that are important in WebCodecs v1 (sRGB, BT.601 PAL, BT.601 NTSC, BT.709); this subset does not include any duplicates.
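A sketch of the "helper to convert" idea mentioned above. The integers are from H.273's tables; the string spellings are illustrative (the proposal doc has the authoritative list), and only the v1-relevant subset is shown:

```ts
const h273ColourPrimaries: Record<number, string> = {
  1: 'bt709',
  5: 'bt470bg',   // BT.601 625-line (PAL)
  6: 'smpte170m', // BT.601 525-line (NTSC)
};
const h273TransferCharacteristics: Record<number, string> = {
  1: 'bt709',
  13: 'iec61966-2-1', // sRGB
};
const h273MatrixCoefficients: Record<number, string> = {
  0: 'rgb', // identity
  1: 'bt709',
  5: 'bt470bg',
  6: 'smpte170m',
};
```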
@svgeesus FYI about color proposal above.
@sandersdan @padenot Thanks for the pings, looking at it.
I have comments. Would you like them inline (I requested edit access) or here in this issue?
> I have a preference for strings, but H.273 is ubiquitous in media and therefore often convenient.
H.273 is a great choice provided it covers all the options you need (which I think it does, here). Other media are also moving to this approach, see for example a proposal to Add H.273 metadata to PNG.
An example of relevant standards not supporting required color spaces: content mastered in display-p3 or DCI P3, which is not supported by H.273 or by HDMI, so is transported in a Rec BT.2020 container. Which then needs additional "mastering volume" metadata to prevent a dumb gamut conversion from the entire 2020 volume.
> rec709 makes sense as a default when things aren't known in today's world, I think.
Related: the untagged video section of CSS Color 4: 4.5. Color Spaces of Untagged Colors which has different defaults depending on resolution (comments welcome)
> I have comments. Would you like them inline (I requested edit access) or here in this issue?
Probably best to keep things here where there is a record of them, but I've granted you permission to add comments in the doc also.
> H.273 is a great choice provided it covers all the options you need (which I think it does, here).
Note that this same metadata is intended to be used for ImageDecoder also. Is the same statement true for image formats?
> Which then needs additional "mastering volume" metadata to prevent a dumb gamut conversion from the entire 2020 volume.
Hmm, I've not seen this before. If content is going to be tagged like this do we need similar controls in WebCodecs?
> Related: the untagged video section of CSS Color 4: 4.5. Color Spaces of Untagged Colors which has different defaults depending on resolution (comments welcome)
<video> is similar in that SD is typically assumed to be BT.601 and HD is assumed to be BT.709. In practice I don't think the BT.601 default is correct very often; actual SD video is usually just downscaled HD content these days.
Alignment is more important than my personal opinion on this matter though. Would you recommend using the CSS approach for WebCodecs?
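For reference, a resolution-keyed default along the lines of CSS Color 4's untagged-video rule would look roughly like this (the 720-line threshold is illustrative, not quoted from the spec):

```ts
// SD assumed BT.601, HD assumed BT.709, per the heuristic discussed above.
function assumedMatrix(codedHeight: number): 'smpte170m' | 'bt709' {
  return codedHeight < 720 ? 'smpte170m' /* BT.601 */ : 'bt709';
}
```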
@svgeesus
> Which then needs additional "mastering volume" metadata to prevent a dumb gamut conversion from the entire 2020 volume.
I understand now that this is equivalent to the mdcv/clli MP4 boxes. That version includes primaries and a whitepoint; is this always a well-known color space that could be simplified to an enum value, or do content authors commonly tailor them?
It does look like we will need an equivalent mechanism for HDR support. (Not blocking for WebCodecs v1 but expected in v2.)
Absent other comments, I propose that we move forward with the string enum version of the proposal, as it is the less risky option.
If that goes through, I will also propose an H.273 conversion utility in v2.
> <video> is similar in that SD is typically assumed to be BT.601 and HD is assumed to be BT.709. In practice I don't think the BT.601 default is correct very often; actual SD video is usually just downscaled HD content these days.
I agree, it is rare nowadays to shoot at anything less than full HD so the 709 default makes more sense there. Sounds like I should update that section of CSS Color 4 on untagged video defaults.
> Absent other comments, I propose that we move forward with the string enum version of the proposal, as it is the less risky option.
Agreed.
Observations from implementing this proposal in Chrome:

Like VideoRect, it's not valid to have an attribute with a dictionary type. The solution is the same, so copying from DOMRectReadOnly, I ended up with dictionary VideoColorSpaceInit and interface VideoColorSpace. They are designed such that a VideoColorSpace is valid anywhere a VideoColorSpaceInit is required.

… VideoColorSpace interface.

That last option (c) isn't quite as straightforward as might be expected. The RGBA and YUV to RGBA paths are well-trodden, but subsampled formats are not so easily available in an efficient way.
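A TypeScript rendering of the dictionary/interface pairing described above (the real definitions are WebIDL; member names follow the earlier straw man plus the fullRange flag):

```ts
// Dictionary shape: usable as a plain object literal in configs.
interface VideoColorSpaceInit {
  primaries?: string;
  transfer?: string;
  matrix?: string;
  fullRange?: boolean;
}

// The interface mirrors the dictionary member-for-member, so an instance
// is structurally valid anywhere a VideoColorSpaceInit is required.
class VideoColorSpace implements VideoColorSpaceInit {
  readonly primaries?: string;
  readonly transfer?: string;
  readonly matrix?: string;
  readonly fullRange?: boolean;
  constructor(init: VideoColorSpaceInit = {}) {
    this.primaries = init.primaries;
    this.transfer = init.transfer;
    this.matrix = init.matrix;
    this.fullRange = init.fullRange;
  }
}
```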
Edit: I've updated the proposal to remove the requirement for VideoEncoder to do any conversions. It should ignore the color space of input frames, and use the configured color space for its output VideoDecoderConfig. We can work on explicit conversion APIs as a v2 feature.
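From the app's side, that rule looks roughly like the sketch below (metadata shape per later drafts' EncodedVideoChunkMetadata; illustrative only):

```ts
// The encoder ignores whatever color space the input frames claim; the
// decoder config it emits carries the configured color space instead.
const encoder = new VideoEncoder({
  output(chunk, metadata) {
    console.log(metadata?.decoderConfig?.colorSpace);
  },
  error(e) { console.error(e); },
});
```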
> Edit: I've updated the proposal to remove the requirement for VideoEncoder to do any conversions. It should ignore the color space of input frames, and use the configured color space for its output VideoDecoderConfig. We can work on explicit conversion APIs as a v2 feature.
That sounds right to me. In general the current state of the proposal makes sense.
Awesome. I've started working on the PR for this now.
Editors call:
readonly attribute boolean? fullRange;
Seems sane, but re: possibility of enum: do we expect that future revisions to codec specs / h.273 will extend this (full, limited, less-limited,...)? If not, bool is fine.
In terms of enums definitions, referencing h.273 table entries sounds good.
Ideally decoded content would be in the same colorspace as the encoded content, and colorspace negotiation is "just" a metadata management problem. Android MediaCodec works differently:
Open questions: