whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/

Standardize rendering of PQ and HLG HDR image content #9112

Open ccameron-chromium opened 1 year ago

ccameron-chromium commented 1 year ago

Images can specify that they are HDR by indicating their use of the Hybrid Log-Gamma (HLG) or Perceptual Quantizer (PQ) transfer function.

There are other schemes for HDR images (e.g., using gainmaps), but those do not suffer from the issues described here.

Background on HLG and PQ display specifications

The image specifies that it uses the HLG or PQ transfer function via the coding-independent code points (CICP) information described in ITU-T H.273. This can be done in an ICC profile, or via other mechanisms.

The HLG and PQ transfer functions, and instructions for rendering them on reference displays in reference environments, may be found in ITU-R BT.2100.

For HLG, this specifies a mapping from pixel values to display luminance as a function of the maximum brightness of the display.

For PQ, this specifies a mapping from pixel values to display luminance, with a pixel value of 1.0 specifying a luminance of 10,000 nits.
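For concreteness, here is a minimal sketch of the PQ EOTF, using the standard ST 2084 constants (illustrative only):

  // PQ (SMPTE ST 2084 / BT.2100) EOTF: maps a non-linear signal value in
  // [0, 1] to an absolute luminance in nits (cd/m^2).
  function pqEotf(signal) {
    const m1 = 2610 / 16384;       // 0.1593...
    const m2 = 2523 / 4096 * 128;  // 78.84375
    const c1 = 3424 / 4096;        // 0.8359375
    const c2 = 2413 / 4096 * 32;   // 18.8515625
    const c3 = 2392 / 4096 * 32;   // 18.6875
    const p = Math.pow(signal, 1 / m2);
    return 10000 * Math.pow(Math.max(p - c1, 0) / (c2 - c3 * p), 1 / m1);
  }
  // pqEotf(1.0) === 10000; pqEotf(0.58) is roughly 203 nits.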

In the event that the display cannot produce the display luminance specified by the pixels, tone mapping may be needed. There exist several tone mapping algorithms. ITU-R BT.2408 is one such option. SMPTE ST 2094 is another. A simple rational function is currently used in Chromium. See this notebook that compares the three.

For PQ, when performing this tone mapping, additional metadata parameters may be used to guide the mapping. SMPTE ST 2086 specifies such parameters. The Maximum Display Mastering Luminance (MDML) from this specification is often used as the maximum input luminance for tone mapping. Much PQ content has an MDML of 1,000 nits.

Some indications of how to relate HDR and SDR content are present in ITU-R BT.2408-5, Section 5: Inclusion of SDR content. This recommends that, when transcoding SDR content to PQ, the maximum SDR luminance should be mapped to 203 nits. This corresponds to a pixel value of 0.75 in HLG, when HLG is displayed on a reference display with a maximum brightness of 1,000 nits, in a reference environment.
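As a quick sanity check of that 0.75 figure, a small sketch using the BT.2100 HLG constants (for a white pixel, R = G = B):

  // HLG inverse OETF: non-linear signal -> scene-linear light in [0, 1].
  function hlgInverseOetf(e) {
    const a = 0.17883277, b = 1 - 4 * a, c = 0.5 - a * Math.log(4 * a);
    return e <= 0.5 ? (e * e) / 3 : (Math.exp((e - c) / a) + b) / 12;
  }
  const sceneWhite = hlgInverseOetf(0.75);               // scene-linear white
  const displayNits = 1000 * Math.pow(sceneWhite, 1.2);  // OOTF on a 1,000 nit display
  // displayNits is approximately 203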

Background on HDR pixel representations on operating systems

Displays on phone/tablet/laptop/desktop devices (hereafter referred to as just desktop) have a concept of the SDR brightness. This is the brightness of the color #FFFFFF. Some (but not all) operating systems allow querying this brightness (in nits).

HDR desktop displays are capable of displaying a brightness brighter than this. They are usually "holding back" potential brightness because it is not needed in current ambient viewing conditions. (As a rule of thumb, the display's SDR brightness is often approximately the brightness of a white sheet of paper).

All operating systems that support HDR displays allow querying the HDR headroom, which is the ratio of the brightest color that can be displayed to the SDR brightness. Note that the HDR headroom changes over time (e.g., it is larger when a display is in a dark environment and the SDR brightness is lower, and smaller when in sunlight).

All operating systems that support HDR displays allow representing content in an extended-SDR format. The easiest representation to think of is a buffer in a color space like srgb-linear where a pixel value of 2,2,2 is 2x as bright as #FFFFFF. This buffer can display pixel values (in the display's linearized native primaries) all the way up to the HDR headroom, whereupon pixel values start to clamp. Not all operating systems expose this exact representation (for reasons of power efficiency), but all operating systems have an equivalent capability.
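As an illustration only (the exact API varies by operating system), the extended-SDR idea boils down to:

  // Extended-SDR pixel values are linear and relative to SDR white: 1.0 is
  // #FFFFFF and 2.0 is twice as bright. The display honors values up to its
  // current HDR headroom, beyond which they clamp.
  function clampToHeadroom(value, hdrHeadroom) {
    // hdrHeadroom = (brightest displayable luminance) / (SDR white luminance);
    // it changes over time with ambient conditions.
    return Math.min(value, hdrHeadroom);
  }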

Critically, it should be noted that displays are not natively HLG or PQ. The only time that it is necessary to use HLG or PQ to display content is when the content must be sent over a medium that requires it (e.g., over an HDMI cable). This situation applies only to "external" displays (and only some at that) -- it does not apply to phones, tablets, laptops, or TV screens (when being accessed via the native operating system, not via an HDMI dongle).

The problem

There is not a clear way to relate the above "HLG and PQ display specifications" to the above "HDR pixel representations on operating systems".

Most important is that almost all of the specifications indicate a reference viewing environment. This is a highly controlled and very dark environment (<5 nits in BT.2100). HDR content on the web will be displayed in extremely uncontrolled environments, often with very high ambient light. Consequently, a literal interpretation of almost any of the specifications will produce an unacceptable user experience (e.g., with content being far too dark).

Also important is that SDR brightness is not treated as an independent variable for rendering in these specifications. Similarly, HDR headroom is not a concept they define.

Assumption: All content is rendered independently

Before giving any proposed solution, it's important to establish the following constraint on the solution: All content is rendered independently. Put another way: The rendering of content is not affected by the presence of other content on the screen.

For example, if I have a page with an SDR image, and then I replace the SDR image with an HDR image, the only thing that should change on the screen is the pixels of that image itself. If I were to cover the image with my hand, I should not see any difference between when the HDR image is present or absent. This is critical to avoid regressing the experience of SDR content on the web.

Be aware that the presence of extremely bright HDR images can create the illusion that the SDR content on the screen has darkened. When in doubt, use the "occlude with your hand" test.

As another example, if a page has four images: An SDR image, an HLG image, a PQ image with MDML of 1,000 nits, and a PQ image with MDML of 10,000 nits, they are all tone mapped and displayed independently of each other. They are not transformed into any common space except for the space of the output device.

Proposal

The core component of the proposed solution is to parameterize all rendering of HLG and PQ content by the HDR headroom and no other display parameters. Note that this matches how gainmap-based HDR images are displayed (they depend only on the HDR headroom and no other display parameters). A natural next step would also be to support specifying an HDR headroom for 2D canvases.

For PQ content, the proposal for rendering a pixel on a display is as follows:

  1. apply the PQ EOTF to convert from pixel values to nit values
  2. divide by 203 nits to get colors in a linear space with Rec2020 primaries where 1,1,1 is SDR white
  3. convert the color from the Rec2020 primaries to the output device's primaries
  4. apply a tonemap curve (see the sketch after this list)
     4.1. let M be the maximum brightness of the image in nits (10,000 if no metadata is specified, the MDML if that metadata is present)
     4.2. let D be the HDR headroom of the output device
     4.3. the curve is to map M/203 to D
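A non-normative sketch of these four steps for a single pixel (pqEotf() is the EOTF sketched earlier, tonemap() is the function given later in this comment, and rec2020ToDisplayPrimaries() stands in for a hypothetical primaries conversion):

  function renderPqPixel(pqPixel, mdmlNits, hdrHeadroom) {
    // 1. PQ signal -> absolute luminance in nits, per channel.
    // 2. Divide by 203 so that 1,1,1 is SDR white.
    let linear = {
      r: pqEotf(pqPixel.r) / 203,
      g: pqEotf(pqPixel.g) / 203,
      b: pqEotf(pqPixel.b) / 203,
    };
    // 3. Rec2020 primaries -> the output device's primaries.
    linear = rec2020ToDisplayPrimaries(linear);
    // 4. Tone map: the content maximum M (the MDML if present, else 10,000
    //    nits) is mapped to the display's HDR headroom D.
    const M = (mdmlNits !== undefined) ? mdmlNits : 10000;
    return tonemap(linear, M / 203, hdrHeadroom);
  }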

For HLG content, the proposal for rendering a pixel is as follows (see the sketch after the list):

  1. transcode the pixel from HLG to PQ on a 1,000 nit reference display
     1.1. apply the HLG inverse OETF
     1.2. apply the HLG OOTF for maximum luminance 1,000 nits (gamma is 1.2)
     1.3. scale by 1,000 / 203
  2. render using the same algorithm as PQ content, with M=1000
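A sketch of the HLG path, under one reading of the steps above: after the transcode, the pixel is already linear with 1,1,1 at SDR white, so it continues from step 3 of the PQ path. It reuses hlgInverseOetf() from the earlier sketch; hlgOotfLuminanceGain() (computing Ys^(gamma - 1) from the scene luminance) and rec2020ToDisplayPrimaries() are hypothetical helpers.

  function renderHlgPixel(hlgPixel, hdrHeadroom) {
    // 1.1. HLG inverse OETF: signal -> scene-linear light in [0, 1].
    const scene = {
      r: hlgInverseOetf(hlgPixel.r),
      g: hlgInverseOetf(hlgPixel.g),
      b: hlgInverseOetf(hlgPixel.b),
    };
    // 1.2. OOTF for a 1,000 nit reference display (gamma 1.2), then
    // 1.3. scale by 1,000 / 203 so that 1,1,1 is SDR white.
    const k = hlgOotfLuminanceGain(scene, 1.2) * 1000 / 203;
    let linear = { r: scene.r * k, g: scene.g * k, b: scene.b * k };
    // 2. Continue as for PQ content, with M = 1,000.
    linear = rec2020ToDisplayPrimaries(linear);
    return tonemap(linear, 1000 / 203, hdrHeadroom);
  }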

For the tonemap curve, the following rational function is recommended; it maps the domain [0, maxInput] to the range [0, maxOutput].

  function tonemap(color, maxInput, maxOutput) {
    // No compression is needed if the content range already fits the display.
    if (maxInput <= maxOutput)
      return color;
    let a = maxOutput / (maxInput*maxInput);
    let b = 1 / maxOutput;
    // Drive the curve with the maximum channel and scale all channels by the
    // same factor, so hue is preserved. At colorMax == maxInput the scale
    // works out to maxOutput/maxInput, mapping the peak to maxOutput.
    let colorMax = Math.max(color.r, color.g, color.b);
    let scale = (1 + a*colorMax) / (1 + b*colorMax);
    return { r: color.r * scale, g: color.g * scale, b: color.b * scale };
  }
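
As an illustrative usage example (values here are just for exposition):

  // A pixel at the content peak of a 1,000 nit PQ image, expressed relative
  // to SDR white (1000/203 ≈ 4.93), shown on a display with HDR headroom 2.
  let peak = 1000 / 203;
  tonemap({ r: peak, g: peak, b: peak }, peak, 2);  // ≈ { r: 2, g: 2, b: 2 }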

This curve has the benefits that it is extremely simple and avoids hard cutoffs. It also produces an almost-sRGB curve for HLG content when the HDR headroom is 1 (that is, on an SDR display).

emilio commented 1 year ago

cc @kdashg @jrmuizel @mstange

ccameron-chromium commented 1 year ago

I've written up more explicit details in this document, and this notebook contains the code for the figures in that document.

past commented 1 year ago

@ccameron-chromium your document link above points to this issue by mistake, presumably?

ccameron-chromium commented 1 year ago

Updated the link to this document above.

ccameron-chromium commented 1 year ago

Another alternative scheme, which I think also works well, is to define, for every input format, an SDR rendition, a full HDR rendition, and a "maximum fidelity" HDR headroom, and then:

  1. For displaying on SDR, use the SDR rendition.
  2. For displaying on a display with HDR headroom at or above the "maximum fidelity" headroom, just use the full HDR rendition.
  3. For anything in between, use the standard gainmap image math (see Adobe's spec, or this notebook for more details, and the sketch below).
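A very simplified sketch of that gainmap math, ignoring the offset and gamma parameters of the full spec (the names here are illustrative, not taken from Adobe's spec):

  function applyGainmap(sdrLinear, log2Gain, hdrHeadroom, minHeadroom, maxHeadroom) {
    // The weight ramps from 0 (pure SDR) to 1 (full HDR rendition) as the
    // display's HDR headroom moves between the two thresholds, in log space.
    const w = Math.min(1, Math.max(0,
        (Math.log2(hdrHeadroom) - Math.log2(minHeadroom)) /
        (Math.log2(maxHeadroom) - Math.log2(minHeadroom))));
    // Apply the per-pixel (log2) gain, scaled by the weight, to the linear SDR value.
    return sdrLinear * Math.pow(2, w * log2Gain);
  }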

ccameron-chromium commented 1 year ago

The current behavior across browsers can be viewed at https://ccameron-chromium.github.io/webgl-examples/hdr-on-canvas.html

For PQ: (screenshot omitted)

For HLG: (screenshot omitted)

AstralStorm commented 11 months ago

General

One possibility is to just do no tone mapping and no gamut mapping, assuming the screen is BT.2100 PQ, which is what all the OSes support.

When doing any processing, prefer high bit depth framebuffer formats such as 16-bit floating point to limit banding. Prefer specifying function parameters in absolute nits rather than obscure unitless values or percentages.

SDR

Map SDR relative to the black point, using either the midtone level or an explicit SDR peak level as a guide, with BT.1886 (for video) or sRGB (for general web content) as the brightness function, and convert it to the absolute color space with a simple offset and multiplication. For the black point, use the same value as specified for HDR tone mapping (mentioned later). A different brightness mapping or color space can be specified in the video container, in CSS, or in the image's ICC profile.
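A rough sketch of that mapping, assuming sRGB as the brightness function (the parameter names are illustrative):

  // sRGB EOTF: encoded value in [0, 1] -> linear value in [0, 1].
  function srgbEotf(v) {
    return v <= 0.04045 ? v / 12.92 : Math.pow((v + 0.055) / 1.055, 2.4);
  }
  function sdrToAbsoluteNits(signal, sdrPeakNits, blackNits) {
    // A simple offset and multiplication into the absolute (nit) space.
    return blackNits + (sdrPeakNits - blackNits) * srgbEotf(signal);
  }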

Inverse tone mapping (SDR to HDR upconversion), if implemented, should be optional, as it can produce bad results on some content. A decent choice of inverse tone mapping function is provided in BT.2446 Annex A.

Interpreting BT.709 video as sRGB is not the approach media players take and should be avoided.

HLG

Use the SDR peak level as Lw. All other processing applies as per HDR after converting from HLG to the desired absolute space. Alternatively, provide a separate tunable with Lw=203 as the default, based on the BT.2446 recommendation. HLG content is relatively rare in the wild.

Tone mapping

The function specified in BT.2408-6 is a good way to tone map. A function that allows directly specifying the midtone level and black point is recommended for general use. (The older BT.2390, BT.2446, and SMPTE recommendations do not provide for setting the black point.)

I'd recommend following the HGIG recommendation, adding separate peak, midtone (primary range), and black point controls. Hardcoding anything is a great way to annoy users. See the HGIG PDF, but remember that displays or the OS often provide invalid MinTML, MaxTML, MaxFFTML, etc., so these values have to be tunable.

The midtone control is particularly important considering that screens have varying amounts of adaptive backlight limiting. Certain OLED screens will limit brightness for full-screen content even in the 203 nit range. Additionally, high levels would result in SDR content being blinding.

On Windows, the screen values are retrievable using the DXGI IDXGIOutput6::GetDesc1 API, and for testing purposes they can be changed using an MHC2 color profile; the standard ICC lumi tag maps to MaxFFTML, and MaxTML is specified in the MHC2 tag. The default set of values returned from that API (1,499 nit MaxTML and MaxFFTML, 0.005 MinTML, a truncated D65 white, and a particular rare set of primaries) should be detected and potentially ignored.

For implementation recommendations: working in any absolute, light-referred brightness space is fine (this includes PQ and scRGB), but prefer ITP from BT.2100 as the working color space over the BT.2408 recommendation of R'G'B', converting to RGB or YCC as needed in the final step, in order to limit hue shifts in tone-map processing. The small degree of gamut expansion involved is generally not important for typical HDR displays, while the hue shift can be visible in highlights. The expansion is especially irrelevant if gamut mapping is done in the next step.

There should be an option to disable tone mapping in case the screen or OS already does it. In this case, only SDR content is adapted and no clipping or mapping is applied for HDR content.

Gamut mapping

Here, the situation is more complicated as there are few recommendations. A good starting point would be BT.2407-0. An alternative would be to use the same equation in ITP uniform color space, as specified in BT.2100.

The Windows scRGB gamut is converted internally by the OS to BT.2020 YCC or RGB, or to BT.2100 ITP, by simple clipping. A direct BT.2020 gamut space is also provided by DXGI, but is only widely supported at 10-bit depth. No other HDR color space is available on that platform.

Even if the screen primaries are known, a color profile might be applied by the OS, as is the case on Windows. Firefox is known to apply ICC display profiles again even though the OS already does so.

Image color profiles should be applied as if the screen supported the whole BT.2100 gamut, before any tone mapping.

There should be a separate configuration option to disable gamut mapping in case the screen already does it. In this case, only SDR content is adapted into the BT.2100 color space using a matrix.