mpv-player / mpv

🎥 Command line video player
https://mpv.io

Colour in mpv is dimmed compared to QuickTime Player #4248

Closed fumoboy007 closed 1 year ago

fumoboy007 commented 7 years ago

mpv version and platform

macOS 10.12.3, mpv 0.24.0, pre-built by stolendata

Reproduction steps

Open the attached sample video in mpv and QuickTime Player.

Expected behavior

The colour in mpv looks the same as in QuickTime Player.

Actual behavior

The colour in mpv is dimmed compared to QuickTime Player.

QuickTime Player (good)

[screenshot: QuickTime Player]

mpv (dimmed)

[screenshot: mpv]

Log file

http://sprunge.us/EQOa

Sample files

Config

player-operation-mode=pseudo-gui
icc-profile="/Library/ColorSync/Profiles/Displays/Color LCD-F466F621-B5FA-04A0-0800-CFA6C258DECD.icc"
hwdec=auto
log-file=/Users/fumoboy007/Desktop/output.txt

Sample Video

https://www.dropbox.com/s/khnzs60z1wz2fjt/Rec%20709%20Sample.mp4?dl=1

The video is tagged with the colour profile that Finder describes as HD (1-1-1) (Rec 709).

mohd-akram commented 6 years ago

I added that to ~/.config/mpv/mpv.conf, but it didn't seem to make a difference. I'm using an MBP 13" (2016).

mpv: [screenshot]

QuickTime Player: [screenshot]

cheeexq commented 6 years ago

Same problem here. I'm using a MacBook Pro 13" (2016).

Following this thread, after days of tweaking, I'm even more confused than before. I'm currently using mpv 0.27.2 under macOS 10.13.4 Beta (17E160e), and here are some of my findings:

  1. When applying no tweak like --icc or --gamma, the rendering result in mpv appeared darker than in QuickTime Player, with obvious loss of detail (i.e. black crush). [screenshots: mpv above, QuickTime Player below]

  2. Setting --opengl-gamma=1.22386537481 --icc-contrast=100000 in mpv.conf (as suggested above) resulted in a brighter and more grayish image (under-contrasted compared to QuickTime Player). [screenshot]

  3. Setting --vf=format=gamma=gamma2.2 --opengl-gamma=1.1218765935747068 (also suggested above, by @haasn) resulted in an image darker than 2, and slightly brighter than QuickTime Player (well-balanced, maybe?). [screenshot]

  4. In any case above, whether --hwdec or --icc-profile-auto was on or off didn't affect the result. All mpv results had dimmer highlight zones and slightly over-saturated color compared to QuickTime Player. The hue was also slightly different.

So my question is: why did tweaks 2 and 3 produce different results?

haasn commented 6 years ago

So my question is: why did tweaks 2 and 3 produce different results?

Tweaks 2 and 3 should produce the same result, but didn't.

My memory is hazy but iirc I suggested --icc-contrast=100000 to work around the use of an ICC profile that already embeds black point compensation. (Which is atypical for ICC profiles in colorimetric mode. It would only make sense in perceptual mode, which mpv doesn't use by default)

kkanungo17 commented 6 years ago

Can confirm, the vf and gamma multiplier hacks do bring gamma to the same level, but there is still a noticeable hue and saturation difference between mpv and QuickTime.

mpv: [screenshot, 2018-03-07 7:13 pm]

QuickTime: [screenshot, 2018-03-07 7:22 pm]

haasn commented 6 years ago

That might be due to BT.601/BT.709 differences?

kkanungo17 commented 6 years ago

Is there a way to get around this?

haasn commented 6 years ago

Try with both --vf=format:colormatrix=bt.601 and --vf=format:colormatrix=bt.709.

kkanungo17 commented 6 years ago

Yup, setting --vf=format:colormatrix=bt.601 and --saturation=-5 fixed it.

haasn commented 6 years ago

What kind of file is this? JPEG?

kkanungo17 commented 6 years ago

720p 8-bit h.264 yuv420p mp4 video.

EDIT: there may still be a slight difference in the color of the skirt, but my eyes hurt from staring too hard, so I can't tell. Well, it is negligible enough, so I can live with it.

HybridDog commented 6 years ago

I've noticed that gamma correction is ignored when scaling. When viewing the image from http://www.ericbrasseur.org/gamma.html?i=1 with halved size (mpv gamma_dalai_lama_gray.jpg --geometry=129x111), it's a grey rectangle. If I adjust the gamma, e.g. by passing --target-trc=srgb or --opengl-gamma=2, it's a brighter or darker grey rectangle.

Edit: --profile=gpu-hq made it work, thanks.
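
For anyone wondering why the half-size image comes out as flat gray: averaging gamma-encoded samples is not the same as averaging light. A minimal C sketch (my own illustration, using the standard piecewise sRGB transfer functions) of downscaling a black/white pixel checkerboard both ways:

#include <math.h>
#include <stdio.h>

// sRGB EOTF and its inverse, piecewise per the sRGB spec
static double srgb_to_linear(double v) {
  return (v <= 0.04045) ? v / 12.92 : pow((v + 0.055) / 1.055, 2.4);
}

static double linear_to_srgb(double l) {
  return (l <= 0.0031308) ? l * 12.92 : 1.055 * pow(l, 1.0 / 2.4) - 0.055;
}

int main(void) {
  // 2x downscale of alternating black (0) and white (255) pixels
  double naive = (0.0 + 1.0) / 2.0;  // average of gamma-encoded values
  double lin = (srgb_to_linear(0.0) + srgb_to_linear(1.0)) / 2.0;

  printf("gamma-space average:  %.0f\n", naive * 255.0);                // 128
  printf("linear-light average: %.0f\n", linear_to_srgb(lin) * 255.0);  // ~188
  return 0;
}

The gamma-naive result (128) is visibly darker than the linear-light result (~188), which is exactly the gray-rectangle effect the test image is designed to expose.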

haasn commented 6 years ago

720p 8-bit h.264 yuv420p mp4 video.

mpv assumes 720p and higher is BT.709 (HD) by default, rather than BT.601 (SD). Probably QuickTime does the opposite, for whatever reason. There's no real correct behavior here without proper tagging on the files.

I've noticed that gamma correction is ignored when scaling.

Try using --profile=gpu-hq.

kkanungo17 commented 6 years ago

What about the higher than normal saturation?

haasn commented 6 years ago

Maybe TV vs PC range confusion? If this was originally a JPEG and transcoded to H.264 without correct tagging, then the H.264 version will be detected by mpv as BT.709 TV range, whereas the equivalent JPEG would have been BT.601 PC range.

As for which one of those two is the one ultimately corresponding to the correct presentation of the source material, I can't tell you. In either case, correct tagging should fix all of your issues.

kkanungo17 commented 6 years ago

It's just a 720p video downloaded from YouTube. I don't know how YouTube tags its videos.

EDIT: QuickTime apparently does this for all my files. (Though all of them are shitty re-encodes of anime/movies)

haasn commented 5 years ago

1.18 is sort of close to 2.2/1.961. (It would actually be gamma 2.3. Not that far off given sRGB's effective gamma 2.2 / technical gamma 2.4)

For sRGB in particular, a single gamma value should be insufficient to capture this phenomenon. (As you can see in the graph)

mdejong commented 5 years ago

Hi, I actually just noticed that I made a mistake: I was calculating the shift in terms of the Y value after it was passed through the matrix calculation, so I deleted my previous comment. Let me try that again.

I ran an input image containing sRGB converted from the linear RGB values [0, 255] through the AVFoundation H.264 encoding logic to produce an HD video that uses the BT.709 matrix. The goal here was to determine the "boost" that Apple seems to be applying to sRGB data before it is sent to the H.264 encoder. What seems to be the best fit is pow(x, 1.0/1.14); the graph below shows the Y values. Note that this boost would be applied after the sRGB data has been converted to a linear value using the function defined in the sRGB spec. The blue line is X=Y and the quantized yellow values are fit by the red values.

[graph: Apple gamma boost fit]

UliZappe commented 5 years ago

The goal here was to determine the "boost" that Apple seems to be applying to sRGB data before it is sent to the H.264 encoder.

How did you get the impression that Apple applies any kind of “boost”? Apple simply performs an sRGB → PCS → BT.709 ICC color conversion, using the BT.709 ICC profile with a simplified gamma of 1.961. Have you tried simply using the ColorSync Utility color calculator with the sRGB and BT.709 ICC profiles?
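
For concreteness, a minimal sketch of the conversion being described, assuming the piecewise sRGB EOTF and the simplified 1.961 gamma (ColorSync's actual profile curves will differ slightly, especially near black):

#include <math.h>
#include <stdio.h>

int main(void) {
  // sRGB 188 -> linear -> BT.709-encode with the simplified 1.961 gamma
  double v = 188.0 / 255.0;                    // 0.7373
  double lin = pow((v + 0.055) / 1.055, 2.4);  // ~0.503, i.e. 50% linear gray
  double bt709 = pow(lin, 1.0 / 1.961);        // ~0.704
  printf("BT.709 value: %.0f\n", bt709 * 255.0);  // ~180
  return 0;
}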

mdejong commented 5 years ago

Hello Uli, I generated the data above by encoding a linear grayscale ramp defined in the sRGB colorspace to H.264 using the AVFoundation APIs that accept BGRA pixels and output YCbCr wrapped as H.264. I have attached the example input image I am using, and the source code (a work in progress) is located on GitHub at https://github.com/mdejong/MetalBT709Decoder. Could you describe the formula used for this "simplified gamma of 1.961", or a reference to someplace where it is defined, so that I could reproduce the results exactly? Currently, I am converting from sRGB to linear and then applying this pow(x, 1.0/1.14) logic; it seems close to what AVFoundation and vImage do, but it is not exactly the same.

[attachment: HD test image defined as sRGB]

mdejong commented 5 years ago

Here is a quick step-by-step of the calculations that show the need for an sRGB boost. The first conversion shown is incorrect as compared to AVFoundation output. Then comes the "boosted" calculation, which is very close, though not exactly the same (this could simply be due to rounding and range issues).

Gray 50% linear intensity: sRGB (188 188 188) -> Linear RGB (128 128 128) -> REC.709 (179 128 128)

sRGB in             : R G B : 188 188 188
linear Rn Gn Bn     : 0.5029 0.5029 0.5029
R G B in byte range : 128.2361 128.2361 128.2361
lin -> BT.709       : 0.7076 0.7076 0.7076
Ey Eb Er            : 0.7076 0.0000 0.0000
Y Cb Cr             : 171 128 128

Note that this differs from the AVFoundation output, Y = 179.

ColorSync results for "sRGB -> Rec. ITU-R BT.709-5":

0.7373 0.7373 0.7373 -> 0.7077 0.7077 0.7077
188 188 188 -> 180 180 180

This shows that the sRGB -> linear -> BT.709 conversion I am using in C code is exactly the same as the output of ColorSync, and both of these values are too small compared to AVFoundation, since the output Y my code produces for this grayscale value is Y = 171.

Same calculation with sRGB boost applied: pow(x, 1.0f / 1.14f)

sRGB in             : R G B : 188 188 188
linear Rn Gn Bn     : 0.5029 0.5029 0.5029
boosted Rn Gn Bn    : 0.5472 0.5472 0.5472
byte range Rn Gn Bn : 139.5313 139.5313 139.5313
lin -> BT.709       : 0.7388 0.7388 0.7388
Ey Eb Er            : 0.7388 0.0000 0.0000
Y Cb Cr             : 178 128 128

I am still trying to work out the decode logic, but this calculation above shows that AVFoundation seems to apply a boost to input sRGB values which I am attempting to emulate.
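
To make the two pipelines concrete, here is a small C sketch that reproduces both Y values. The BT.709 OETF power segment (1.099·L^0.45 − 0.099, valid for L ≥ 0.018) and the limited-range quantization are standard; the pow(x, 1/1.14) boost is the hypothesis being tested above, not a documented Apple constant:

#include <math.h>
#include <stdio.h>

// BT.709 OETF, power segment (valid for L >= 0.018)
static double bt709_oetf(double L) {
  return 1.099 * pow(L, 0.45) - 0.099;
}

int main(void) {
  // sRGB 188 -> linear, via the piecewise sRGB EOTF
  double lin = pow((188.0 / 255.0 + 0.055) / 1.055, 2.4);  // ~0.503

  // Plain conversion, quantized to limited range [16, 235]
  printf("plain:   Y = %.0f\n", 16.0 + 219.0 * bt709_oetf(lin));  // ~171

  // Same, with the hypothesized pow(x, 1/1.14) boost in linear light
  printf("boosted: Y = %.0f\n",
         16.0 + 219.0 * bt709_oetf(pow(lin, 1.0 / 1.14)));        // ~178
  return 0;
}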

UliZappe commented 5 years ago

I agree that sRGB (188, 188, 188) results in BT.709 (180, 180, 180).

However, BT.709 (180, 180, 180) results in Y'CbCr (180, 128, 128) if you use the full 0…255 bit range, as your graphic suggests you do. I fail to see how you arrive at Y'CbCr (171, 128, 128); this would be the correct result for the limited 16…235 bit range.
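
In code, the difference between the two results is purely the quantization step (a trivial check, using the normalized luma from the example above):

#include <stdio.h>

int main(void) {
  double Ey = 0.7076;  // normalized luma from the worked example
  printf("full range [0, 255]:     Y = %.0f\n", Ey * 255.0);         // ~180
  printf("limited range [16, 235]: Y = %.0f\n", 16.0 + 219.0 * Ey);  // ~171
  return 0;
}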

mdejong commented 5 years ago

Humm, my code is for BT.709 encoding with the limited TV range of Y = [16, 235], but the result you point out is interesting. The graph above displays a normalized value where the Y = [16, 235] range is treated as normalized to [0.0, 1.0] for the purpose of graphing, but this value has been converted back to a grayscale value in the [0, 255] range. My actual code only deals with values as normalized floats. The conversion from sRGB -> linear -> BT.709 treats these normalized floats as non-linear (sRGB), linear, then non-linear (BT.709).

UliZappe commented 5 years ago

I don’t know which of the AVFoundation APIs you are actually using for your calculation, but IIRC, they don’t project 0…1.0 onto 16…235. But AVFoundation probably takes care of potential overflows, i.e. it probably somehow limits Y'CbCr values as soon as they approach 16 or 235. Other than that, they probably take RGB data at face value. I would assume that if you want sRGB [0…255] processed correctly in AVFoundation, you’d have to project [0…255] onto [16…235] yourself. But it’s quite some time since I worked with these APIs, so I might remember this incorrectly.

mdejong commented 5 years ago

So, that is a confusing detail about AVFoundation: if you make use of the AVAssetWriter API and pass an sRGB (RGB input range 0 to 255) CoreVideo buffer, then it actually maps this to Y = [16, 237]. It is kind of insane and I don't know why Apple does this. If you use vImage to generate BT.709, it properly sets the range to [16, 235]. That is not critical, since it results in the generated Y values only being off by a little bit, but I have not really found a flag that indicates which type of range encoding was used, so it is kind of crazy town attempting to write a decoder :)

UliZappe commented 5 years ago

I don’t really understand why you are trying to do this. What doesn’t work correctly from your POV? The current (AVFoundation-based) QuickTime Player is 100% colorimetrically correct, whatever you throw at it.

mdejong commented 5 years ago

I want to properly decode the YCbCr values generated in the colorimetrically correct way that Apple writes values into an H.264 file. The trouble I am having is that when I attempt to write C code to convert the YCbCr values into sRGB, the results I get do not match the values decoded by CoreVideo or vImage, even with the same input values. It is really quite strange, because I implemented the gamma function from BT.709 and validated that it is working properly, but Apple actually does something different for gamma encoding, and I am attempting to reverse engineer exactly what that difference is. My results so far seem to indicate that sRGB values are initially boosted by pow(x, 1.0/1.14), but I am not positive whether this is correct or whether I made a mistake somewhere. I don't know what these numbers mean; I am just attempting to determine what Apple actually did to implement this, since it is really weird. Going back to the conversion example I posted: is Y Cb Cr : 178 128 128 in fact the correct output, and why does my implementation not generate that output without the boost added?

haasn commented 5 years ago

The BT.709 curve is irrelevant for decoding

mdejong commented 5 years ago

Perhaps I am just completely missing something, but I was under the impression that the R', G', B' values delivered into the BT.709 matrix transform are gamma corrected values that then need to be converted back to linear values with a reverse transformation. I was basing this on how sRGB provides a gamma encode and a gamma decode function. Is this not actually how BT.709 and H.264 are implemented? Are the values written into the conversion matrix simply linear inputs?

#include <math.h>

// Constants referenced below but missing from the snippet; these are
// the standard BT.709 limited-range bounds and luma coefficients
// (Kr = 0.2126, Kg = 0.7152, Kb = 0.0722).
#define YMin 16.0f
#define YMax 235.0f
#define UVMin 16.0f
#define UVMax 240.0f

#define Kr 0.2126f
#define Kg 0.7152f
#define Kb 0.0722f

#define Er_minus_Ey_Range (2.0f * (1.0f - Kr))  // 1.5748
#define Eb_minus_Ey_Range (2.0f * (1.0f - Kb))  // 1.8556
#define Kr_over_Kg (Kr / Kg)
#define Kb_over_Kg (Kb / Kg)

// Clamp a normalized value to [0.0, 1.0]
static inline float saturatef(float v) {
  return (v < 0.0f) ? 0.0f : ((v > 1.0f) ? 1.0f : v);
}

// BT.709 inverse OETF: linear segment below 0.081, power curve above
static inline
float BT709_nonLinearNormToLinear(float normV) {

  if (normV < 0.081f) {
    normV *= (1.0f / 4.5f);
  } else {
    const float a = 0.099f;
    const float gamma = 1.0f / 0.45f;
    normV = (normV + a) * (1.0f / (1.0f + a));
    normV = pow(normV, gamma);
  }

  return normV;
}

static inline
int BT709_convertYCbCrToLinearRGB(
                             int Y,
                             int Cb,
                             int Cr,
                             float *RPtr,
                             float *GPtr,
                             float *BPtr,
                             int applyGammaMap)
{
  // https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.709_conversion
  // http://www.niwa.nu/2013/05/understanding-yuv-values/

  // Normalize Y to a byte-normalized [0.0, 1.0] range
  //
  // Note that the matrix multiply will rescale
  // this byte-normalized value to account for
  // the limited range [16, 235]

  float Yn = (Y - 16) * (1.0f / 255.0f);

  // Normalize Cb and Cr with zero at 128, same byte-normalized range.
  // Note that the matrix will rescale for the limited range [16, 240]

  float Cbn = (Cb - 128) * (1.0f / 255.0f);
  float Crn = (Cr - 128) * (1.0f / 255.0f);

  const float YScale = 255.0f / (YMax-YMin);
  const float UVScale = 255.0f / (UVMax-UVMin);

  const
  float BT709Mat[] = {
    YScale,   0.000f,  (UVScale * Er_minus_Ey_Range),
    YScale, (-1.0f * UVScale * Eb_minus_Ey_Range * Kb_over_Kg),  (-1.0f * UVScale * Er_minus_Ey_Range * Kr_over_Kg),
    YScale, (UVScale * Eb_minus_Ey_Range),  0.000f,
  };

  // Matrix multiply operation
  //
  // rgb = BT709Mat * YCbCr

  // Convert input Y, Cb, Cr to normalized float values

  float Rn = (Yn * BT709Mat[0]) + (Cbn * BT709Mat[1]) + (Crn * BT709Mat[2]);
  float Gn = (Yn * BT709Mat[3]) + (Cbn * BT709Mat[4]) + (Crn * BT709Mat[5]);
  float Bn = (Yn * BT709Mat[6]) + (Cbn * BT709Mat[7]) + (Crn * BT709Mat[8]);

  // Clamp normalized non-linear (R' G' B') to range [0.0, 1.0]

  Rn = saturatef(Rn);
  Gn = saturatef(Gn);
  Bn = saturatef(Bn);

  // Convert non-linear R'G'B' to linear RGB after the matrix transform

  if (applyGammaMap) {
    Rn = BT709_nonLinearNormToLinear(Rn);
    Gn = BT709_nonLinearNormToLinear(Gn);
    Bn = BT709_nonLinearNormToLinear(Bn);
  }

  *RPtr = Rn;
  *GPtr = Gn;
  *BPtr = Bn;

  return 0;
}
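
A hypothetical driver (not part of the snippet above) that feeds in the Y'CbCr triple from the earlier 50%-gray example; it should recover roughly 0.503 linear:

#include <stdio.h>

int main(void) {
  float R, G, B;
  // Y'CbCr (171, 128, 128) from the worked example above
  BT709_convertYCbCrToLinearRGB(171, 128, 128, &R, &G, &B, 1);
  printf("linear RGB = %.4f %.4f %.4f\n", R, G, B);  // ~0.5031 each
  return 0;
}
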
mdejong commented 5 years ago

If I am in fact implementing this decoding of non-linear input values correctly, then an input of X = Y (each linear RGB value) should be gamma adjusted, and then when the BT709_nonLinearNormToLinear() function converts the value back to linear RGB, the graph of the value should be adjusted back to meet the X=Y line. What I am seeing when using the AVFoundation and vImage Apple APIs is the following: the blue line here is the X = Y line, the green values are how vImage encodes the input values, and the yellow line shows the result of BT709_nonLinearNormToLinear(). It seems that Apple has added the yellow line to boost input values up higher; I can only assume this has something to do with the "dark room" issue with gray levels and video?

[graph: vImage encoded values vs. BT709_nonLinearNormToLinear() output]

UliZappe commented 5 years ago

The BT.709 curve is irrelevant for decoding

Not in an ICC compliant implementation of color management (which is the only reasonable color management approach on computers). In ICC color management, you must always come back to exactly the same PCS values (XYZ or Lab) that you started with. Therefore, if you used BT.709 to encode video, you must also use BT.709 to decode video and forward the correct XYZ values to the display color space.

I have no idea why this is so hard to understand.

Hrxn commented 5 years ago

I feel like I've already watched the same debate unfold on here before. I wonder why..

UliZappe commented 5 years ago

You do? :smiling_imp:

mdejong commented 5 years ago

I am really not trying to start a flame war or anything :) I just literally cannot find any good information online about how this RGB -> BT.709 YCbCr process is meant to work. I wrote code to encode and then decode, and I can round-trip the values without gamma, but Apple has done something completely different and it is really confusing.

UliZappe commented 5 years ago

Perhaps I am just completely missing something, but I was under the impression that the R', G', B' values delivered into the BT.709 matrix transform are gamma corrected values that then need to be converted back to linear values with a reverse transformation. I was basing this on how sRGB provides a gamma encode and a gamma decode function.

Yep, that’s totally correct. Generally speaking, you’d have to transform sRGB to XYZ (the profile connection space) and then XYZ to BT.709; that’s how ICC color management works. But since sRGB and BT.709 share the primaries, that comes down to converting from sRGB tone response curve to linear and then back to the BT.709 tone response curve. The “simplified gamma” I was talking about simply replaces the complex tone response curve of BT.709 with an approximated gamma of 1.961 (just like the complex sRGB tone response curve is sometimes replaced by a “simplified gamma” approximation of 2.2). So there’s nothing special here.

But this obviously isn’t the issue at all, since all approaches seem to agree upon the same BT.709 RGB values (possibly with some rounding errors) derived from given sRGB values.

The issue seems to be the conversion from BT.709 RGB to Y'CbCr. It’s here where the inconsistencies seem to appear.

UliZappe commented 5 years ago

I am really not trying to start a flame war or anything :)

Never mind. We are perfectly capable of starting another one on our own. :sweat_smile:

haasn commented 5 years ago

@mdejong

I just literally cannot find any good information online about how this RGB -> BT709 YCbCr process is meant to work.

These are the relevant standards (they're also the ones that define H.264, but that's more or less irrelevant since H.264 just references BT.709):

The ITU-R started caring a bit more about this stuff when they were re-exploring gamma curves in the context of HDR/UHDTV, and some of their reports on those contain good summarizations of the gamma issue up to this point, specifically:

I very highly recommend reading those two sections at the very least, if not the standards themselves (which can be a bit dense).

Edit: But to summarize the answer: BT.709 is not meant to "round trip". When we are talking about BT.709 (an ITU-R standard with a clear purpose), we are talking about a set of standards based on legacy analog TV practices, as @UliZappe is usually quick to point out. The concept of "round tripping" doesn't apply here, because there is no round trip. There is only the (thorny) path from scene-referred / camera input to display-referred / end viewer. The ITU-R approach is to view, analyze and optimize this end-to-end transformation as a whole. They never talk about any context in which it makes sense to "undo" a step of processing. (Except perhaps PQ/HLG cross-conversions, which are their own can of worms)

Going backwards in this chain of processing steps by starting with the first step in the chain is nonsensical, which is why mpv only considers the last steps (display output) in all of its processing. If you are trying to design a reversible set of color encodings (which is what ICC is trying to do) based on BT.709, you have no real choice but to base your reverse function on the final output step. So if anything, rather than defining an ICC profile that approximates the BT.709 OETF, you want to be approximating the BT.1886 EOTF for typical displays. (i.e. approximating typical mastering reference displays)
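
For reference, the BT.1886 EOTF being referred to is parameterized by the display's white and black luminance (Annex 1 of the standard); a sketch:

#include <math.h>

// BT.1886 EOTF: L = a * max(V + b, 0)^2.4, where a and b are derived
// from the display's white (Lw) and black (Lb) luminance.
static double bt1886_eotf(double V, double Lw, double Lb) {
  const double g = 2.4;
  double k = pow(Lw, 1.0 / g) - pow(Lb, 1.0 / g);
  double a = pow(k, g);
  double b = pow(Lb, 1.0 / g) / k;
  double t = V + b;
  return a * pow(t > 0.0 ? t : 0.0, g);
}

// By construction, bt1886_eotf(0, Lw, Lb) == Lb and
// bt1886_eotf(1, Lw, Lb) == Lw, so the display's black point is folded
// into the curve rather than handled by a separate BPC step.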

I assume whatever Apple engineer came up with their implementation of these ICC profiles just ended up glancing through the standards without reading into them in greater depth, and ended up coming to the same (wrong) conclusion that BT.709 was like sRGB in that the display was supposed to reverse this curve exactly, and sort of went from there. But those two standards are very different and difficult to compare.

This figure from the BT.2246 report I linked above illustrates how the BT.709+BT.1886 combined OOTF ends up preserving the scene near-linearly from camera to display, which is why it's stuck around for so long (despite the ITU-R admittedly not really understanding this interaction previously):

[Figure 20 from BT.2246]

If you take only the BT.709 function into consideration, this near-linearity ends up destroyed. And recognizing that these are not scientifically accurate sources (e.g. calibrated cameras or scanners) but rather the output of human artists whose intent depends on what they saw on their screens at the time, mpv believes it is the most important to preserve as closely as possible on our screens what they saw on theirs. (Which is why we do BT.1886-based black point adaptation instead of relying on the CMM to do it for us)

Any of these color management discussions ultimately boil down to this question, and so far nobody really has an answer to them: "Which approach better approximates the reality of video mastering as performed in studios worldwide?", as well as other content producers such as video game developers, digital artists, web designers, etc. (Recognizing that many videos on user-driven content sites are most likely screen captures of these)

mdejong commented 5 years ago

Geesh, you are blowing my mind man. I have spent days and days reading video docs and writing code to attempt to deal with this weirdness, and at this point I think it might be better to just give up and do something crazy like encoding with the sRGB encode and decode gamma curves. I would settle for some logic that would just reproduce the same values that Apple spits out when decoding a video that was also encoded with Apple software. What I think is going on with the Apple approach is best shown with this graph: the blue line is X=Y linear, the yellow line is the values encoded with the BT.709 two-part gamma, and the green line is the special "boost" that Apple adds to incoming sRGB values.

[graph: BT.709 two-part gamma vs. Apple boost]

haasn commented 5 years ago

The “simplified gamma” I was talking about simply replaces the complex tone response curve of BT.709 with an approximated gamma of 1.961 (just like the complex sRGB tone response curve is sometimes replaced by a “simplified gamma” approximation of 2.2). So there’s nothing special here.

I don't think it's okay to simply dismiss this. The "simplification" from a complex curve with a linear segment to a pure power curve actually significantly alters the curve's behaviour near the black point. Just to make it clear, curve-matching by numerical analysis in linear light is the wrong approach. Our perception is quasi-logarithmic, so if anything, you want to minimize the sum of log(input) - log(output) (or phrased another way: "the product of relative errors input / output"). When you do this, you notice the linear segment has a significant effect on the black point in log space, as you can see in that figure 20 I posted above. Which, incidentally, is why Apple is actually right in choosing a pure power curve for their "BT.709 simplification", since by decoding the values (which have been encoded with the linear segment) using a gamma function without a linear segment, they are getting the same black point behavior (in XYZ space) as you would have seen on the screen in an ITU-R mastering booth. (Although I wouldn't call it a "BT.709 simplification" at that point. I'm not sure whether they ended up doing the right thing by accident or not, although the fact that this is the only way it all works out is most likely relevant.)

Simple trial by experimentation is enough to arrive at this conclusion, which is what both the ITU-R and Apple seem to have done (seemingly independently). The only major disagreements at this point are:

  1. How do you handle black point adaptation? The ITU-R defines a "perceptually driven" process that tries preserving scene contrast perceptually, which is why they do black point adaptation in the source space (gamma). The alternative is doing black point adaptation in the output space (linear), which is what many CMMs choose to do since it allows doing BPC in the PCS. mpv does the former.

  2. Do you choose to include any gamma boost to compensate for the effects of the viewing environment on our perception, which are designed into the BT.709/BT.1886 system (with an overall system gamma of about 1.2)? mpv supports this option, using --gamma-factor.

UliZappe commented 5 years ago

The “simplified gamma” I was talking about simply replaces the complex tone response curve of BT.709 with an approximated gamma of 1.961 (just like the complex sRGB tone response curve is sometimes replaced by a “simplified gamma” approximation of 2.2). So there’s nothing special here.

I don't think it's okay to simply dismiss this. The "simplification" from complex curve with linear segment to pure power curve actually significantly alters the curve's behaviour near the black point.

I completely agree in general. I only referred to the question why in the given example the Y' of Y'CbCr is ~171 in one case and ~180 in the other. This is nothing that could result from the difference between the complex and the simplified TRC of BT.709.

mdejong commented 5 years ago

So, I believe that I finally figured out what was going on with my export using AVFoundation. What appears to have happened is that I attempted to export sRGB pixels using AVFoundation/vImage, and instead of using the BT.709 gamma curve, the data that looked "boosted" was actually just the original sRGB gamma curve encoded pixels being written into the BT.709 matrix transform without first being converted to the BT.709 gamma space. This looked like a boost, but it was actually just producing invalid data, since a player would assume that the BT.709 gamma encoding was used as opposed to the sRGB gamma curve. This was caused by setting the colorspace property on a CoreVideo pixel buffer, which seems to be an error. The Apple docs are so strange and murky on this point, and there are no examples that I have been able to find that show this actually working. Here is a visual that helped me understand what was actually going on: the quant values inside the larger purple line show the original sRGB gamma encoded values, and the quant values inside the reddish line show what identity X=Y linear values should be if encoded properly with the BT.709 colorspace. Finally, the green quant values around the blue X=Y line show the result of reversing the BT.709 gamma encoding on the encoded values, which are now correctly emitted with the BT.709 colorspace setting.

[graph: sRGB vs. BT.709 encoded values]

UliZappe commented 5 years ago

Glad you could figure it out. :blush:

mdejong commented 5 years ago

The “simplified gamma” I was talking about simply replaces the complex tone response curve of BT.709 with an approximated gamma of 1.961 (just like the complex sRGB tone response curve is sometimes replaced by a “simplified gamma” approximation of 2.2). So there’s nothing special here.

I don't think it's okay to simply dismiss this. The "simplification" from complex curve with linear segment to pure power curve actually significantly alters the curve's behaviour near the black point.

I completely agree in general. I only referred to the question why in the given example the Y' of Y'CbCr is ~171 in one case and ~180 in the other. This is nothing that could result from the difference between the complex and the simplified TRC of BT.709.

On the subject of the 1.961 gamma applied at decode time, there does seem to be a real problem with this approach, in that computer generated colors with very small R'G'B' components will decode very differently as X approaches 0. For example, the sRGB value (5 5 5) becomes REC.709 (18 128 128); when decoded with the BT.709 two-part gamma curve, the result is (6 6 6). But when a simplified 1.961 gamma is applied, the value gets pushed to zero. This is going to lose information in dark regions. The following graph shows the effect: the blue line is X=Y and the yellow line is where BT.709 values would be gamma shifted. Note the purple line, which shows the gamma adjusted values being operated on with the simplified 1.961 gamma; a significant number of values go to zero as the graph approaches zero. It seems that the lack of a linear segment here loses a lot of information in the very dark areas.

[graph: near-black values under the 1.961 gamma]

haasn commented 5 years ago

But, when a simplified 1.961 gamma is applied the value gets pushed to zero.

No it doesn't. It gets decoded to about 0.0001001633.
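
Assuming limited-range normalization, the arithmetic behind that number is:

#include <math.h>
#include <stdio.h>

int main(void) {
  // Y = 18 in limited range, decoded with a pure 1.961 power curve
  double v = (18.0 - 16.0) / 219.0;   // ~0.00913
  printf("%.10f\n", pow(v, 1.961));   // ~0.0001002 -- small, but not zero
  return 0;
}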

UliZappe commented 5 years ago

Note the purple line which shows the gamma adjusted values being operated on with the simplified 1.961 gamma, a significant number of values go to zero as the graph approaches zero. It seem that the lack of a linear segment here is losing a lot of information in the very dark areas.

True, which is why Apple’s CMM uses a technique called Slope Limit (see above in this thread) to kind of force a linear segment during conversion in the CMM itself, independent of the specific TRC that is used. This fixes the loss of detail in dark regions.

mdejong commented 5 years ago

Hello again. I did some more detective work mucking around with Apple's color management APIs, and I was able to discover the exact constants that Apple is using for HDTV -> sRGB in QuickTime X/AVFoundation. There is a 1.96-like gamma curve, but the black levels are defined with a linear slope of 16 in the very low range. These two functions capture the logic.

#define APPLE_GAMMA_196 (1.960938f)

// Decode: non-linear -> linear. Below the breakpoint, a linear
// segment with slope 1/16 is used instead of the power curve.
static inline
float AppleGamma196_nonLinearNormToLinear(float normV) {
  const float xIntercept = 0.05583828f;

  if (normV < xIntercept) {
    normV *= (1.0f / 16.0f);
  } else {
    const float gamma = APPLE_GAMMA_196;
    normV = pow(normV, gamma);
  }

  return normV;
}

// Encode: linear -> non-linear. Inverse of the above; the
// breakpoint here is xIntercept / 16.
static inline
float AppleGamma196_linearNormToNonLinear(float normV) {
  const float yIntercept = 0.00349f;

  if (normV < yIntercept) {
    normV *= 16.0f;
  } else {
    const float gamma = 1.0f / APPLE_GAMMA_196;
    normV = pow(normV, gamma);
  }

  return normV;
}
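
A quick sanity check that the two segments join continuously at the breakpoint and that the pair round-trips (a hypothetical driver, assuming the two functions above are in scope):

#include <math.h>
#include <stdio.h>

int main(void) {
  // Both segments should agree at the breakpoint:
  // 0.05583828 / 16 == pow(0.05583828, 1.960938) == ~0.00349
  printf("%.6f %.6f\n", 0.05583828f / 16.0f, pow(0.05583828, 1.960938));

  // And encoding a decoded value should return the input
  for (float x = 0.0f; x <= 1.0f; x += 0.25f) {
    float y = AppleGamma196_linearNormToNonLinear(
        AppleGamma196_nonLinearNormToLinear(x));
    printf("%.4f -> %.4f\n", x, y);
  }
  return 0;
}
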
mdejong commented 5 years ago

The following attached images show the default output when decoding the QuickTime HD test video on iOS (this uses CoreImage and CoreVideo logic and matches the output of QTX), then the output of my Metal shader that makes use of the non-linear to linear methods mentioned above. For comparison, the final image is what this test video decodes to with the BT.709 defined gamma (this final image is basically what mpv decodes as). Note the difference in the near-black area just above the colors: one can see the white boxes appearing at some point after the 50% of the screen width mark; the best match is with the slope of 16.

[screenshots: iOS CoreVideo decode, 1.96 gamma decode, BT.709 gamma decode]

UliZappe commented 5 years ago

I did some more detective work mucking around with Apple's color management APIs and I was able to discover the exact constants that Apple is using for HDTV -> sRGB in Quicktime X/AVFoundation. There is a 1.96 like gamma curve, but the black levels are defined with a linear slope of 16 in the very low range.

Yep, both are true (gamma 1.961, in fact), but both were already discussed here and in fact already correctly implemented in mpv and in my Slope Limit patch for Little CMS, before @haasn decided it would be a good idea to emulate an ancient, incorrect, out-of-date technology instead which is completely incompatible with computer color management as we know it. :smiling_imp:

1.96(1) is simply the closest gamma approximation of the complex Rec. 709 TRC, although Apple went to great lengths to argue for it independently of that mathematical connection in the paper where they introduced it, back when they introduced video color management on computers (see the post from @fumoboy007 on 20 Mar 2017 above, paragraph “Deriving the Conversion From Source Color to PCS”). They probably did this to soothe the many video traditionalists (the video community seems to be incredibly retro oriented, technologically). You will find 1.96 and 1.961 over and over if you search this thread.

With regard to slope limiting, note that this is not an Apple-only behavior. Both Apple’s ColorSync CMM and the Kodak CMM do this, and the Adobe CMM also implements it, but uses 32 instead of 16 for the linear segment. Little CMS is probably the only one of today’s relevant CMMs that does not implement it. Unfortunately, at least 3 years ago, Marti Maria (the author of Little CMS) did not want to incorporate my patch into the main branch of Little CMS for fear of copyright issues (at least this was his explanation; he might have also disliked “polluting” the “clean” CMM with the very “pragmatic” industry approach of slope limiting).

(this final image is basically what mpv decodes as)

Uhm, no – certainly not with the default parameters. Which parameters do you refer to?

haasn commented 5 years ago

@mdejong Last comment seems incorrect. By default, mpv does not perform color management at all. It simply outputs the encoded colors as-is, and relies on the display implementing the correct transfer function implicitly. That looks like this:

[screenshot: mpv test pattern, default]

If you explicitly set target gamma 2.2 (which is what mpv falls back to for e.g. HDR content), you get this:

[screenshot: mpv test pattern, target gamma 2.2]

And this is what it looks like when using an sRGB ICC profile as the target (assuming a 1000:1 contrast), which would be more suitable for a PNG:

[screenshot: mpv test pattern, sRGB ICC target]

Observe that opening this png in krita displays identically to what I get in mpv if I let it use my monitor's native display profile (krita on top, mpv on bottom):

[screenshot: krita (top) vs. mpv (bottom)]

haasn commented 5 years ago

Interestingly, for a typical 1000:1 contrast, gamma 1.961 is a very good fit for BT.1886 at the low end, whereas gamma 2.2 better approximates it everywhere else:

[graph: TRCs at contrast 1000:1]

For the TRCs other than BT.1886, black point compensation was done as if the CMM performs linear scaling in the PCS.
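
A sketch of that comparison, with the pure power curves black-point-compensated by linear scaling in the PCS as described, and a 1000:1 display assumed (Lw = 1, Lb = 0.001):

#include <math.h>
#include <stdio.h>

static const double Lw = 1.0, Lb = 0.001;  // 1000:1 contrast

// BT.1886 EOTF (black point folded into the curve)
static double bt1886(double V) {
  const double g = 2.4;
  double k = pow(Lw, 1.0 / g) - pow(Lb, 1.0 / g);
  double t = V + pow(Lb, 1.0 / g) / k;
  return pow(k, g) * pow(t > 0.0 ? t : 0.0, g);
}

// Pure power curve with BPC done by linear scaling in the PCS
static double power_bpc(double V, double g) {
  return Lb + (Lw - Lb) * pow(V, g);
}

int main(void) {
  // Near black, 1.961 tracks BT.1886 more closely; mid and high, 2.2 wins
  for (int i = 0; i <= 20; i++) {
    double V = i / 20.0;
    printf("V=%.2f  bt1886=%.5f  g1.961=%.5f  g2.2=%.5f\n",
           V, bt1886(V), power_bpc(V, 1.961), power_bpc(V, 2.2));
  }
  return 0;
}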

haasn commented 5 years ago

And this is what it looks like if we add slope limiting. As I've pointed out before, adding the linear section to the 1.961 curve does indeed make it almost exactly match BT.1886 at the low end:

[graph: with slope limit]