MediaCodec requires us to supply HDR metadata up front, and provides output Textures.
Reading here, it sounds as if YUV is still available for Texture (Surface) output. Perhaps this would allow us to forgo providing the HDR metadata?
Summary of discussion today: HDR on Android is an unsolved problem; only tunneled mode is specified, and we can't use tunneled mode for WebCodecs. We do not know what capabilities may be exposed in the future.
As a user of a decoder, I imagine I would be frustrated if I could not get unmodified output. We could make the behavior opt-in; then apps that need unmodified output would fail to create a codec.
Having access to the raw image and the associated color space is much more powerful than any alternative. Can we have more info about why Android does it this way? It seems weird at first glance.
Sorry if I missed discussions on this, I've had to skip a couple meetings.
I believe Android did this as a shortcut and simplification; the entire decode-tonemap-render path is merged into one unit that can be tuned for a specific device, and existing media paths don't need to understand HDR.
I don't know whether the tonemapping part is actually implemented in an inseparable way on real devices, but I imagine there will come a time when the decoding part is always a separable component. Hopefully the API to use such a decoder will be clearly defined by then.
Triage notes: marking this 'extension'; the expected outcome is to simply add members to config dicts and attributes to VideoFrame describing color space.
This is fairly fundamental though, it doesn't seem possible to write a compatible implementation if how color spaces work is not specified.
> This is fairly fundamental though, it doesn't seem possible to write a compatible implementation if how color spaces work is not specified.
The CanvasImageSource integrations do not expose the frame's raw color data, so anything using those paths is compatible.
The proposed WEBGL_webcodecs_video_frame extension does expose the raw color data, but includes generated shader code to do the conversion to the context's color space. It doesn't look like this sort of integration will be ready for WebCodecs v1 in any case.
The readInto() API does expose raw color data. This still mostly works, since software frames are usually in the color space specified by the container, but that fails if the decoder is doing conversions (which eg. Android MediaCodec does when binding to a texture). You basically end up with 'either what the container says or sRGB, hopefully they are similar'.
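For concreteness, a sketch of that readback ambiguity. The readback method was named readInto() at the time of this thread; allocationSize()/copyTo() below are the names from later drafts, so this is illustrative rather than normative:

```ts
// Sketch only: readback naming was still in flux at the time.
async function readFrame(frame: VideoFrame): Promise<Uint8Array> {
  const buffer = new Uint8Array(frame.allocationSize());
  await frame.copyTo(buffer);
  // `buffer` now holds raw planes (e.g. I420), but nothing on the frame
  // says whether YUV -> RGB should use BT.601 or BT.709 coefficients;
  // the caller is left with "what the container says, or assume sRGB".
  return buffer;
}
```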
So far the demand for color space metadata from developers has been nonexistent, so while I'm uncomfortable with the gap it does seem like it can wait until v2 for many important use cases. (Probably because in SDR cases, assuming sRGB is 'good enough' for now.)
I don't actually think there is a huge amount of work to do here; my unpublished proposal is three enum properties on VideoFrame and the codec configs: color primaries, transfer function, and YUV matrix. The main reason I didn't complete my proposal was that naming was a challenge (similar to EXIF rotation on images, where there are plausible existing integer assignments but it's unclear if those are the best way to expose the metadata).
@ccameron-chromium since whatever color spaces we expose should probably be compatible with what's going to be exposed by canvas color spaces.
> The CanvasImageSource integrations do not expose the frame's raw color data, so anything using those paths is compatible.
It's a bit "magical" in the sense that it isn't clear that the frames themselves are tagged with a color space (colorspace information can be per frame in HDR iirc).
> The proposed WEBGL_webcodecs_video_frame extension does expose the raw color data, but includes generated shader code to do the conversion to the context's color space. It doesn't look like this sort of integration will be ready for WebCodecs v1 in any case.
Do you have the link handy for this? I haven't found anything.
> So far the demand for color space metadata from developers has been nonexistent, so while I'm uncomfortable with the gap it does seem like it can wait until v2 for many important use cases. (Probably because in SDR cases, assuming sRGB is 'good enough' for now.)
Shipping with only sRGB drawing or conversion is probably fine for v1, as long as it's defined somewhere what one needs to do and how it works.
> I don't actually think there is a huge amount of work to do here; my unpublished proposal is three enum properties on VideoFrame and the codec configs: color primaries, transfer function, and YUV matrix. The main reason I didn't complete my proposal was that naming was a challenge (similar to EXIF rotation on images, where there are plausible existing integer assignments but it's unclear if those are the best way to expose the metadata).
Let me know if I can help move this forward.
WEBGL_webcodecs_video_frame. Ongoing discussion is in https://github.com/gpuweb/gpuweb/issues/1380.
> Let me know if I can help move this forward.
I believe @chcunningham is working on a document, I'll let him take over.
Here's a summary of our internal discussion. The structure of each section is to (1) list default behavior (we don't think color space should be a required part of any config dictionary) to unblock us in the short term, and (2) sketch a path for the future.
Default VideoFrame behaviors (no color knobs):
In cases of decoding, capture, and canvas sources, we generally know the color space internally. We can convert to canvas color space (either 'srgb' or 'display-p3') upon rendering to canvas. When we later add color attributes to video frame, we can simply surface the internal knowledge.
For user-constructed VideoFrames, our vote is to default the color space to rec709, as this is most common among modern HD content.
Later on, we can add knobs that allow user-constructed VideoFrames to have a specified color space (settable only at construction), which would be considered the source space for conversion upon rendering to an srgb/p3 canvas. We propose that the spec require support for all conversions. That is, if you can create a VideoFrame with color space X, your UA must know how to render that to canvas. WebCodecs should not be less powerful than <video> -> canvas.
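A minimal sketch of what that default means in practice, using the buffer-based VideoFrame constructor shape (member names follow later drafts; nothing here is normative):

```ts
// A frame built from raw I420 bytes with no color metadata: under the
// proposal it is assumed to be rec709, and the UA converts to the canvas
// color space at draw time.
const i420 = new Uint8Array((1280 * 720 * 3) / 2);
const frame = new VideoFrame(i420, {
  format: 'I420',
  codedWidth: 1280,
  codedHeight: 720,
  timestamp: 0,
  // no color space knob yet; rec709 is the assumed source space
});
const ctx = document.createElement('canvas')
  .getContext('2d', { colorSpace: 'srgb' })!;
ctx.drawImage(frame, 0, 0); // rec709 -> srgb conversion happens here
frame.close();
```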
As mentioned above, decoders often know the color space of decoded frames from the bitstream. But if the bitstream doesn't say, or the decoder impl doesn't tell us, our vote is again to default to rec709.
We could also say "default to rec709 always, and ignore what the stream may tell you". This has the benefit of being 100% interoperable... not sure yet which is the lesser evil.
Later, we can add the same color knobs from VideoFrame to VideoDecoderConfig to allow users the ability to customize this behavior.
As mentioned above, encoded bitstreams may embed color space data. So what should our encoders do? We again vote to default to rec709.
Like above, we can later add the same color knobs to the encoder config and allow users to customize this behavior.
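For illustration, such a knob on the decoder config might look like the sketch below. This roughly matches the shape that eventually shipped as VideoDecoderConfig.colorSpace, but at the time of this comment it was still an open proposal:

```ts
// Sketch of the "later" knob: overriding the rec709 default via config.
const decoder = new VideoDecoder({
  output: (frame) => frame.close(),
  error: (e) => console.error(e),
});
decoder.configure({
  codec: 'vp09.00.10.08',
  colorSpace: { primaries: 'bt709', transfer: 'bt709', matrix: 'bt709', fullRange: false },
});
```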
If this is agreeable, I would start by adding a definition of rec709, maybe to https://drafts.csswg.org/css-color/#predefined. Then I would just add a few lines of text to the WC spec stating that it is the assumed space.
rec709 makes sense as a default when things aren't known in today's world, I think.
But there is still a need to expose the color space on VideoFrame, so that people can handle frames correctly when doing so manually. For example, when reading back the planes, it is necessary to know what the numbers mean in order to do something with them. Not all WebCodecs usage is going to be about painting with a Web API, where things that are not exposed can be passed through. Without this, implementations can't be made compatible.
Should gl see the values in sRGB (when the canvas is in sRGB) when reading the pixel values in the fragment shader, and then we can add per-plane access? Painting to a canvas doesn't have this problem, since conversion happens when painting, so reading back has the correct value (depending on the canvas color space).
Straw man proposal:
partial interface VideoFrame {
  readonly attribute ColorSpace colorSpace;
  readonly attribute ColorRange colorRange;
  // will need more for HDR
};

enum ColorSpace {
  "rec2020",
  "rec709",
  "sRGB", // or we can separate gamma?
  // extendable
};

enum ColorRange {
  "video-range",
  "full-range"
};
I don't think we need to have all formats day 1, but this means that there will be implicit conversion when reading back, because frames will have to be in a format that is usable by developers.
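A usage sketch against the straw man above. The colorSpace/colorRange attributes are the ones proposed there, not a shipped API, hence the loose cast:

```ts
const decoder = new VideoDecoder({
  output(frame) {
    const f = frame as any; // straw-man attributes, per the proposal above
    if (f.colorSpace === 'rec709' && f.colorRange === 'video-range') {
      // pick limited-range BT.709 coefficients when interpreting planes
    }
    frame.close();
  },
  error(e) { console.error(e); },
});
```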
Editors call:
> We could also say "default to rec709 always, and ignore what the stream may tell you". This has the benefit of being 100% interoperable... not sure yet which is the lesser evil.
@padenot still thinking, but leans toward "trust the decoder"
> Straw man proposal:
This looks about right to me. The full set of properties that make logical sense is Color Primaries, Transfer Function, YUV Matrix, and Range, but most combinations are unlikely and some are nonsensical (eg. there are cases where Range is implied by the particular YUV Matrix). Limiting to a single list makes it easier for an implementation to support all of the options and to provide capabilities detection.
In my experience, unlikely combinations in media metadata are errors, and trusting them results in incorrect rendering, so I suspect that demand for complete control is low. The main question to answer here is whether WebCodecs should be prescribing which combinations are important enough.
ITU-T H.273 / ISO 23001-8 / ISO 23091-2 are substantially complete listings of values for these properties.
I put together a proposal: WebCodecs: Color Spaces, covering VideoDecoderConfig. There are two proposed ways to represent the values: enum strings, or the integer code points from H.273.
I have a preference for strings, but H.273 is ubiquitous in media and therefore often convenient. We could use strings and provide a helper to convert. Strings also allow us to add color spaces not in H.273; I don't know if that will ever be important, but I also don't know if/when JPEG XL's XYB will show up there.
My proposed enum strings include distinct choices with the same meaning, for compatibility with H.273.
I've also highlighted the choices that are important in WebCodecs v1 (sRGB, BT.601 PAL, BT.601 NTSC, BT.709); this subset does not include any duplicates.
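A sketch of the "helper to convert" idea mentioned above. The integers are from H.273's tables; the string spellings are illustrative (the proposal doc has the authoritative list), and only the v1-relevant subset is shown:

```ts
const h273ColourPrimaries: Record<number, string> = {
  1: 'bt709',
  5: 'bt470bg',   // BT.601 625-line (PAL)
  6: 'smpte170m', // BT.601 525-line (NTSC)
};
const h273TransferCharacteristics: Record<number, string> = {
  1: 'bt709',
  13: 'iec61966-2-1', // sRGB
};
const h273MatrixCoefficients: Record<number, string> = {
  0: 'rgb', // identity
  1: 'bt709',
  5: 'bt470bg',
  6: 'smpte170m',
};
```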
@svgeesus FYI about color proposal above.
@sandersdan @padenot Thanks for the pings, looking at it.
I have comments. Would you like them inline (I requested edit access) or here in this issue?
> I have a preference for strings, but H.273 is ubiquitous in media and therefore often convenient.
H.273 is a great choice provided it covers all the options you need (which I think it does, here). Other media are also moving to this approach, see for example a proposal to Add H.273 metadata to PNG.
An example of relevant standards not supporting required color spaces: content mastered in display-p3 or DCI P3, which is not supported by H.273 or by HDMI, so is transported in a Rec BT.2020 container. Which then needs additional "mastering volume" metadata to prevent a dumb gamut conversion from the entire 2020 volume.
> rec709 makes sense as a default when things aren't known in today's world, I think.
Related: the untagged video section of CSS Color 4: 4.5. Color Spaces of Untagged Colors which has different defaults depending on resolution (comments welcome)
> I have comments. Would you like them inline (I requested edit access) or here in this issue?
Probably best to keep things here where there is a record of them, but I've granted you permission to add comments in the doc also.
> H.273 is a great choice provided it covers all the options you need (which I think it does, here).
Note that this same metadata is intended to be used for ImageDecoder also. Is the same statement true for image formats?
> Which then needs additional "mastering volume" metadata to prevent a dumb gamut conversion from the entire 2020 volume.
Hmm, I've not seen this before. If content is going to be tagged like this do we need similar controls in WebCodecs?
> Related: the untagged video section of CSS Color 4: 4.5. Color Spaces of Untagged Colors which has different defaults depending on resolution (comments welcome)
<video> is similar in that SD is typically assumed to be BT.601 and HD is assumed to be BT.709. In practice I don't think the BT.601 default is correct very often; actual SD video is usually just downscaled HD content these days.
Alignment is more important than my personal opinion on this matter though. Would you recommend using the CSS approach for WebCodecs?
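For reference, a resolution-keyed default along the lines of CSS Color 4's untagged-video rule would look roughly like this (the 720-line threshold is illustrative, not quoted from the spec):

```ts
// SD assumed BT.601, HD assumed BT.709, per the heuristic discussed above.
function assumedMatrix(codedHeight: number): 'smpte170m' | 'bt709' {
  return codedHeight < 720 ? 'smpte170m' /* BT.601 */ : 'bt709';
}
```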
@svgeesus
> Which then needs additional "mastering volume" metadata to prevent a dumb gamut conversion from the entire 2020 volume.
I understand now that this is equivalent to the mdcv/clli MP4 boxes. That version includes primaries and a whitepoint; is this always a well-known color space that could be simplified to an enum value, or do content authors commonly tailor them?
It does look like we will need an equivalent mechanism for HDR support. (Not blocking for WebCodecs v1 but expected in v2.)
Absent other comments, I propose that we move forward with the string enum version of the proposal, as it is the less risky option.
If that goes through, I will also propose an H.273 conversion utility in v2.
> <video> is similar in that SD is typically assumed to be BT.601 and HD is assumed to be BT.709. In practice I don't think the BT.601 default is correct very often; actual SD video is usually just downscaled HD content these days.
I agree, it is rare nowadays to shoot at anything less than full HD so the 709 default makes more sense there. Sounds like I should update that section of CSS Color 4 on untagged video defaults.
> Absent other comments, I propose that we move forward with the string enum version of the proposal, as it is the less risky option.
Agreed.
Observations from implementing this proposal in Chrome:

Like VideoRect, it's not valid to have an attribute with a dictionary type. The solution is the same, so copying from DOMRectReadOnly, I ended up with dictionary VideoColorSpaceInit and interface VideoColorSpace. They are designed such that a VideoColorSpace is valid anywhere a VideoColorSpaceInit is required.

… VideoColorSpace interface.

That last option (c) isn't quite as straightforward as might be expected. The RGBA and YUV to RGBA paths are well-trodden, but subsampled formats are not so easily available in an efficient way.
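A TypeScript rendering of the dictionary/interface pairing described above (the real definitions are WebIDL; member names follow the earlier straw man plus the fullRange flag):

```ts
// Dictionary shape: usable as a plain object literal in configs.
interface VideoColorSpaceInit {
  primaries?: string;
  transfer?: string;
  matrix?: string;
  fullRange?: boolean;
}

// The interface mirrors the dictionary member-for-member, so an instance
// is structurally valid anywhere a VideoColorSpaceInit is required.
class VideoColorSpace implements VideoColorSpaceInit {
  readonly primaries?: string;
  readonly transfer?: string;
  readonly matrix?: string;
  readonly fullRange?: boolean;
  constructor(init: VideoColorSpaceInit = {}) {
    this.primaries = init.primaries;
    this.transfer = init.transfer;
    this.matrix = init.matrix;
    this.fullRange = init.fullRange;
  }
}
```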
Edit: I've updated the proposal to remove the requirement for VideoEncoder to do any conversions. It should ignore the color space of input frames, and use the configured color space for its output VideoDecoderConfig. We can work on explicit conversion APIs as a v2 feature.
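From the app's side, that rule looks roughly like the sketch below (metadata shape per later drafts' EncodedVideoChunkMetadata; illustrative only):

```ts
// The encoder ignores whatever color space the input frames claim; the
// decoder config it emits carries the configured color space instead.
const encoder = new VideoEncoder({
  output(chunk, metadata) {
    console.log(metadata?.decoderConfig?.colorSpace);
  },
  error(e) { console.error(e); },
});
```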
> Edit: I've updated the proposal to remove the requirement for VideoEncoder to do any conversions. It should ignore the color space of input frames, and use the configured color space for its output VideoDecoderConfig. We can work on explicit conversion APIs as a v2 feature.
That sounds right to me. In general the current state of the proposal makes sense.
Awesome. I've started working on the PR for this now.
Editors call:
readonly attribute boolean? fullRange;
Seems sane, but re: possibility of enum: do we expect that future revisions to codec specs / h.273 will extend this (full, limited, less-limited,...)? If not, bool is fine.
In terms of enums definitions, referencing h.273 table entries sounds good.
Ideally decoded content would be in the same colorspace as the encoded content, and colorspace negotiation is "just" a metadata management problem. Android MediaCodec works differently:
Open questions: