mpv-player / mpv

🎥 Command line video player
https://mpv.io

lcms2 corruption with opengl floating-point FBO formats #102

Closed · Cyberbeing closed this issue 11 years ago

Cyberbeing commented 11 years ago

The following floating-point FBO formats result in corruption when used with lcms2:

rgb16f rgba16f rgb32f rgba32f

It mostly appears as small rainbow-colored blocks in high-contrast areas, so it's likely caused by invalid out-of-range values.


Win7 SP1 x64, NVIDIA GPU, mpv git-a9892f9 32-bit

ghost commented 11 years ago

Do any FBO formats actually work correctly for you? The FBO format keeps being a source of trouble, and I've found no format yet which works reliably (and fast) on all platforms/GPUs.

But on Linux with the NVIDIA binary drivers I can observe that these float formats seem to work as expected. It could also be that the ICC profile you're using is too extreme, or that other bugs are involved.

Cyberbeing commented 11 years ago

This ICCv2 16-bit cLUT profile was generated by Argyll CMS, and works fine in MPC-HC's EVR-CP (which also uses lcms2 for color management) as well as in other color-managed applications.

On my NVIDIA card, all of the FBO formats work correctly without lcms2 active.

The opengl-hq default of rgb16 (integer) seems fine, but as far as I know, NVIDIA doesn't actually support this format natively and handles it internally as rgba16f, yet there is no corruption with lcms2, unlike when rgba16f is specified explicitly as the FBO format. The only slow formats are the single-precision rgb32f & rgba32f. Quality degrades a lot with any of the less common formats which fall back to 8 bits.

I mentioned this a long time ago on Doom9, but even the rgb32f FBO format shows banding which doesn't exist in the source, compared to madVR which also uses rgb32f textures. Current versions of madVR do: YCbCr -> RGB32f 16-235 (including WTW+BTB) -> processing -> RGB32f 16-235 (including WTW+BTB) expanded to the 0-255 range -> RGB32f to RGB8 -> display, with proper dithering at every step that needs it. madVR manually allocates rgb32f textures and uses them as a working space for conversions and processing via extremely high precision shaders, always dithering down to RGB8 before handing the result off to D3D for rendering and display output.
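Just to illustrate what I mean by the float working space carrying WTW/BTB: a rough C sketch of limited-range expansion (my own illustration, nothing to do with madVR's or mpv's actual code; the function name is made up). Values outside 16-235 map outside [0,1] and survive until something clamps or quantizes them.

```c
/* Illustrative sketch: expanding limited-range (16-235) 8-bit luma into a
 * float working space. BTB/WTW values land outside [0,1]. */
#include <stdio.h>

static float expand_limited(unsigned char y)
{
    return (y - 16.0f) / (235.0f - 16.0f);   /* 16 -> 0.0, 235 -> 1.0 */
}

int main(void)
{
    printf("Y=8   -> %f (blacker than black)\n", expand_limited(8));
    printf("Y=128 -> %f\n",                      expand_limited(128));
    printf("Y=240 -> %f (whiter than white)\n",  expand_limited(240));
    return 0;
}
```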

ghost commented 11 years ago

but as far as I know, NVIDIA doesn't actually support this format natively and handles it internally as rgba16f

I have different results.

yet there is no corruption with lcms2, unlike when rgba16f is specified explicitly as the FBO format

Wait what. Does using icc work as expected with any format or not? The opengl-hq default setting is rgb16.

WTW+BTB

What's that?

The opengl VO uses the specified FBO format (actually a texture format) for all intermediate results. Within a shader pass, the GLSL type float is used for intermediate values. The icc file is turned into a 3D texture which always has the format GL_RGB16. Dithering is done as the last step before output as well. Note that the icc LUT has by default the size 128x256x64 (for RxGxB), and always works in RGB.
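For illustration only, uploading a LUT of that size as a GL_RGB16 3D texture amounts to roughly the following C sketch (assumes GL 1.2+ headers/context and that the LUT data is already laid out as 16-bit RGB samples; not the actual vo_opengl code, and upload_3dlut is a made-up name):

```c
/* Illustrative sketch: upload a 128x256x64 RGB LUT as a GL_RGB16 3D texture.
 * 'lut' is assumed to point to 128*256*64*3 uint16_t samples. */
#include <GL/gl.h>
#include <stdint.h>

static GLuint upload_3dlut(const uint16_t *lut)
{
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_3D, tex);
    glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_R, GL_CLAMP_TO_EDGE);
    glTexImage3D(GL_TEXTURE_3D, 0, GL_RGB16, 128, 256, 64, 0,
                 GL_RGB, GL_UNSIGNED_SHORT, lut);
    return tex;
}
```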

Cyberbeing commented 11 years ago

I have different results.

I've not actually verified this personally, so it could have continued to improve with newer GPU generations compared to this 2005 document: http://http.download.nvidia.com/developer/OpenGL_Texture_Formats/nv_ogl_texture_formats.pdf

I've been unable to find any updated version of this document.

Does using icc work as expected with any format or not?

Yes, as mentioned above, the opengl-hq default of rgb16 FBO works fine with lcms2. Only the floating-point FBO formats (the *f ones) show minor corruption with lcms2.


Replies unrelated to this lcms2 issue below


What's that?

WTW = whiter than white, BTB = blacker than black

It's information which is outside the valid 16-235 range, on TV range content.

The opengl VO uses...

I don't have any specific suggestions.

I'm only sharing my observations that the mplayer2/mpv opengl shader renderer is not maintaining the same level of precision as madVR from decode -> processing -> display, which occasionally results in visible banding. The last time I tested this I was not using lcms2. Normally I never use lcms2 in mplayer2/mpv or MPC-HC.

ghost commented 11 years ago

I've not actually verified this personally, so it could have continued to improve with newer GPU generations compared to this 2005 document: http://http.download.nvidia.com/developer/OpenGL_Texture_Formats/nv_ogl_texture_formats.pdf

Yeah, that's probably outdated. Unless the driver somehow fell back to software in my test, which seems unlikely.

It's information which is outside the valid 16-235 range, on TV range content.

In mpv they are clamped to normal range after conversion.

the mplayer2/mpv opengl shader renderer is not maintaining the same level of precision as madVR from decode -> processing -> display, which occasionally results in visible banding

Can you share a sample, complete with madvr and mpv output?

Cyberbeing commented 11 years ago

Can you share a sample, complete with madvr and mpv output?

The last time I actually took screenshots was last year with gl3 rgb32f in mplayer2: http://img194.imageshack.us/img194/6046/mplayer2gl38bitbanding.png

I'm unable to remember which video the above was from, so I'll see if I can track down another good sample.

In the meantime, from an objective point of view, you can verify this issue by looking at the color histogram of a screenshot. Any jagged edges and/or gaps you see in a histogram are essentially banding. For example, here is a histogram comparison of identical frames I just took in mpv opengl-hq rgb32f & madVR with an 8-bit video: http://imageshack.us/a/img580/2687/madvrmpvhistogram.png
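For what it's worth, the check I do boils down to counting empty bins in an 8-bit histogram of the screenshot data; a rough C sketch (my own illustration, not taken from any particular tool; histogram_gaps is a made-up name):

```c
/* Rough sketch: count empty bins between occupied neighbours in an 8-bit
 * histogram. A smooth gradient should fill adjacent bins; gaps and comb-like
 * patterns hint at banding. 'pixels' holds n single-channel 8-bit samples. */
#include <stddef.h>
#include <stdio.h>

static void histogram_gaps(const unsigned char *pixels, size_t n)
{
    unsigned long hist[256] = {0};
    for (size_t i = 0; i < n; i++)
        hist[pixels[i]]++;

    int gaps = 0;
    for (int v = 1; v < 255; v++)            /* ignore the extremes */
        if (hist[v] == 0 && hist[v - 1] && hist[v + 1])
            gaps++;

    printf("empty bins between occupied neighbours: %d\n", gaps);
}
```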

ghost commented 11 years ago

Actually found some dithering bugs and fixed them. I tested only with a simple 8 bit gradient, though.

The last time I actually took screenshots was last year with gl3 rgb32f in mplayer2: http://img194.imageshack.us/img194/6046/mplayer2gl38bitbanding.png

Showing these side-by-side makes it really hard to see anything. Taking the difference between both shows that the maximum difference seems to be 3/255.

from an objective point of view, you can verify this issue by looking at the color histogram of a screenshot. Any jagged edges and/or gaps you see in a histogram are essentially banding.

I'm a bit skeptical of this type of comparison. The best way to "win" it would be to blur everything. (Which technically does get rid of banding, but also does a whole lot of other things to the image.)

Cyberbeing commented 11 years ago

I'm a bit skeptical of this type of comparison. The best way to "win" it would be to blur everything.

Just keep in mind that madVR only does the bare minimum dithering needed to prevent loss of precision, and doesn't do any debanding at all to get such a result. If the source has banding, the banding will remain perfectly visible in madVR; if the source is banding-free, madVR will not introduce any new banding.

Showing these side-by-side makes it really hard to see anything.

The gl3 side has alternating green -> purple -> green -> purple tinted color bands, while the madVR side was essentially solid gray without a color tint. I did some testing last night, and I was unable to verify whether this particular issue still exists in the opengl renderer, but I did make a few other observations:

The difference was still slightly visible at times, but nowhere near as bad as that screenshot I took last year. Notably, last year I was using an NVIDIA 7800GTX 512 on WinXP, and now I'm using an NVIDIA GT440 DDR5 on Win7, so it's possible that significant opengl banding was a quirk of the NVIDIA 7-series architecture or driver.

mpv's screenshot function seems to be the cause of the large gaps in the histograms, since the difference to madVR is much smaller when using PrtScn instead. It seems possible that mpv doesn't dither screenshots. With dithering normally performed on the GPU, madVR handles this by doing error-diffusion dithering on the CPU when dumping screenshots. Though even when using PrtScn, madVR's histograms still end up slightly smoother than mpv's.

mpv is using incorrect chroma positioning with h264. Judging by the chroma shift and bleeding, it appears mpv is using mpeg1 chroma positioning instead of mpeg2 chroma positioning, which is the default for the majority of modern codecs nowadays.

mpv's gaussian & mitchell (b>0.0) lscale kernels blur luma when no scaling is performed. Using :scaler-resizes-only is a workaround for this bug.

madVR screenshots saved as PNG end up considerably bigger and less compressible than mpv screenshots (PrtScn). This is likely a combination of mpv using a more compressible dithering algorithm and the incorrect chroma positioning, which introduces blurring/bleeding. madVR for its part uses random dithering on the GPU, since proper error diffusion was not easily possible there, and it gave superior quality to ordered and pattern-type dithering on display.

When you tell Windows to open a video with mpv, it sets syswow64 or system32 as the working directory, which is bad, since that's where mpv writes screenshots. Am I missing some switch in mpv to override this directory?

I created a small 256x200 4:2:0 PC-range BT.601-flagged h264 gradient, but mpv didn't dither it, and it also seemed to have a slight color cast, possibly from using an incorrect matrix.

Actually found some dithering bugs and fixed them. I tested only with a simple 8 bit gradient, though.

I've not created a new build to test this yet.

ghost commented 11 years ago

Just keep in mind that madVR only does the bare minimum dithering needed to prevent loss of precision

What does this even mean? Either you dither, or you don't. Does it disable dithering if the video is RGB 8 bit, the monitor is 8 bit, and no scaling is performed? But in this case, no dithering algorithm should change the image anyway...

I did some testing last night, and I was unable to verify whether this particular issue still exists in the opengl renderer, but I did make a few other observations

Why is it hard to verify this, if it's supposedly so obvious?

mpv is using incorrect chroma positioning with h264. Judging by the chroma shift and bleeding, it appears mpv is using mpeg1 chroma positioning instead of mpeg2 chroma positioning, which is the default for the majority of modern codecs nowadays.

Nobody notices this, so we didn't implement it. Actually I proposed to implement it if somebody would test whether it's correct, but nobody volunteered. So, who really cares?

But if you give me a nice test case where a difference can be obviously produced, I'd implement it.

mpv's gaussian & mitchell (b>0.0) lscale kernels blur luma when no scaling is performed. Using :scaler-resizes-only is a workaround for this bug.

Gaussian blur blurs, how is that a bug? And yes, by default all filters are applied even if no scaling is performed.

mpv's screenshot function seems to be the cause of the large gaps in the histograms

mpv has 3 screenshot functions. The first two (mapped to s and S) go out of their way to produce an unscaled screenshot unaffected by random display things, and they use ffmpeg's libswscale internally to do the conversion to YUV. As a nice side-effect, this also reduces complexity in the VO, and is used by other VOs. The last one is not mapped by default and uses glReadPixels() on the window. When it comes to capturing and measuring actual video output, I wouldn't trust any of them.
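For reference, the window-based one is conceptually little more than a glReadPixels() readback, roughly like this simplified C sketch (not the exact code; grab_window_rgb is a made-up name):

```c
/* Simplified sketch of a glReadPixels()-based window screenshot: read the
 * currently bound framebuffer back as 8-bit RGB. Whatever the driver did for
 * the final output is baked in; nothing here re-dithers anything. */
#include <stdlib.h>
#include <GL/gl.h>

static unsigned char *grab_window_rgb(int w, int h)
{
    unsigned char *buf = malloc((size_t)w * h * 3);
    if (!buf)
        return NULL;
    glPixelStorei(GL_PACK_ALIGNMENT, 1);      /* tightly packed rows */
    glReadPixels(0, 0, w, h, GL_RGB, GL_UNSIGNED_BYTE, buf);
    return buf;
}
```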

madVR for its part uses random dithering on the gpu, since proper error-diffusion was not easily possible, and it gave superior quality to ordered and pattern type dithering on display

We considered actual random dithering too bad to use. We've recently added another dithering method (which is the default in opengl-hq) which is in essence a random pattern, but imitates error diffusion. Does madVR do temporal dithering? We tried that as well, but the first naive attempt gave flickering in dark regions, so we reverted it. (A correct implementation should avoid big changes on the same pixel from frame to frame, and this requires setting up dithering matrices explicitly for this purpose.)
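The per-pixel step is essentially just adding a small offset before truncating to the target depth; a rough C sketch of the idea (illustrative only, not the actual shader code; quantize_dithered is a made-up name, and where the offset comes from — noise, an ordered matrix, something error-diffusion-like — is the part that differs between methods):

```c
/* Illustrative sketch of dithered quantization to 'bits' of depth: add a
 * per-pixel offset in [0,1) before truncating. Without the offset, a smooth
 * float gradient collapses into visible integer steps (banding). */
static unsigned quantize_dithered(float v, float dither, int bits)
{
    float scale = (float)((1 << bits) - 1);
    float x = v * scale + dither;            /* dither offset in [0,1) */
    if (x < 0)
        x = 0;
    if (x > scale)
        x = scale;
    return (unsigned)x;                      /* truncate */
}
```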

When you tell Windows to open a video with mpv, it sets syswow64 or system32 as the working directory, which is bad, since that's where mpv writes screenshots. Am I missing some switch in mpv to override this directory?

mpv is supposed to be called from the command line, and will use the current working directory for screenshots by default. You can set the target directory indirectly as a side-effect of --screenshot-template. It's a bit awkward, but on the other hand I don't want to add a new option which would be redundant anyway. Also see the discussion in issue 26.

Cyberbeing commented 11 years ago

What does this even mean? Either you dither, or you don't. Does it disable dithering if the video is RGB 8 bit, the monitor is 8 bit, and no scaling is performed? But in this case, no dithering algorithm should change the image anyway...

Dithering is always performed where the "integer -> floating-point -> integer" conversions need it. It just wouldn't have much of a visible effect unless you're doing TV range -> PC range expansion, YCbCr -> RGB conversion, upscaling/downscaling, or gamut correction, or the user starts tweaking color controls.

Why is it hard to verify this, if it's supposedly so obvious?

The problem from last year may no longer exist, or it may have been hardware-specific. I have no way to verify this without finding the same video I took that screenshot from. What I've seen recently, including last night, has only been minor and is likely caused by the incorrect chroma positioning more than anything else.

Gaussian blur blurs, how is that a bug? And yes, by default all filters are applied even if no scaling is performed.

I consider this a bug because nothing else behaves this way when we are talking about a source -> destination resampler. No src->dst change means no interpolation is occurring even if the resampler is run. It's not expected behavior to have Bicubic with default values blur when no resampling has occurred. Maybe you should consider changing the defaults related to this.

Nobody notices this, so we didn't implement it. Actually I proposed to implement it if somebody would test whether it's correct, but nobody volunteered.

The difference is quite easy to notice if you have hard chroma edges, which is why we changed chroma positioning from mpeg1 to mpeg2 in xy-VSFilter. 3D game and other screen captures are the other big use cases where correct chroma positioning for YCbCr->RGB is extremely important. If you want a sample, just convert an RGB test pattern of some kind with Avisynth 2.6 | VapourSynth | VirtualDub to YV12 with mpeg2 chroma positioning, and then back to RGB from the YV12, again specifying mpeg2 chroma positioning. If your implementation matches, it's correct; if it doesn't, it's not. If you are having trouble seeing the difference on a hard chroma edge, resample your screenshot to 400% with point sampling.

We have correct pure C and SSE2 implementations which you could use as a reference; just search for "hleft_vmid": https://github.com/Cyberbeing/xy-VSFilter/blob/master/src/subpic/xy_intrinsics.h [Edit: Though this is 4:4:4 -> 4:2:0|4:2:2, while you need 4:2:0|4:2:2 -> 4:4:4]

I'd be happy to visually verify this for you if you implement it, but otherwise just follow the specification defined by MPEG2, H264, H265 and implemented in other open source software.
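To make the geometry concrete: for 4:2:0 the horizontal mapping from luma to chroma coordinates differs by half a luma pixel (a quarter of the chroma sampling distance) between the two sitings. A rough C sketch (my own illustration of the convention, function names made up; "mpeg2/left" co-sites chroma with the left luma sample, "mpeg1" centers it between the luma pair):

```c
/* Illustrative sketch: source chroma coordinate (in chroma-sample units) for
 * luma column 'lx' when upsampling 4:2:0 horizontally.
 * MPEG-1: chroma sample c sits between luma 2c and 2c+1 (at luma 2c + 0.5).
 * MPEG-2/H.264 "left": chroma sample c is co-sited with luma 2c. */
static float chroma_pos_mpeg1(int lx) { return (lx - 0.5f) / 2.0f; }
static float chroma_pos_mpeg2(int lx) { return lx / 2.0f; }
```

Using the wrong one of the two shifts the reconstructed chroma plane by half a luma pixel, which is exactly the kind of bleed you can see on a hard chroma edge blown up with point sampling.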

Does madVR do temporal dithering?

I do not believe so.

ghost commented 11 years ago

By the way, I think I still don't know why rgb32f + lcms gives artifacts for you. Is it possible for you to test this on Linux?

I consider this a bug because nothing else behaves this way when we are talking about a source -> destination resampler.

I see it like this: these are filters, and some filters are simply more than resamplers. For example sharpen3/5 (rather crappy unsharp mask) do resampling but also filter the image.

Maybe you should consider changing the defaults related to this.

That wouldn't really change anything. With some filters resizing by 1 pixel would make the output look completely different than with no resizing. (Which is why I don't like the option enabling this behavior - it's misleading at best.)

Cyberbeing commented 11 years ago

Is it possible for you to test this on Linux?

No, at least not easily on this computer.

I tried re-compiling with lcms 2.4 stable instead of the latest git, and the same corruption occurred with mpv git-76f6df6 on Win7 SP1. If it helps, here is the ICC profile: http://www.mediafire.com/?a8sgc81rqkfoed7

Cyberbeing commented 11 years ago

Here is a trimmed version of a -v log for rgba32f with those FBO tests you added, though I'm unsure how to interpret the output:

[gl] Detected OpenGL 3.0.
GL_VENDOR='NVIDIA Corporation'
GL_RENDERER='GeForce GT 440/PCIe/SSE2'
GL_VERSION='3.0.0'
GL_SHADING_LANGUAGE_VERSION='1.30 NVIDIA via Cg compiler'
[gl] OpenGL legacy compat. found.
[gl] Detected OpenGL features: [Basic OpenGL] [Legacy OpenGL] [OpenGL 2.0] [OpenGL 2.1] [OpenGL 3.0] [Framebuffers] [VAOs] [sRGB textures] [sRGB framebuffers] [Float textures] [RG textures] [NO_SW]
Testing user-set FBO format
[gl] Create FBO: 16x16
  8-bit precision: -0x0p-88
  16-bit precision: 0x0p-72
  full float: 0x0p-72
  out of range value (2): 0x0p-63
[gl] Display depth: R=8, G=8, B=8
Testing user-set FBO format
[gl] Create FBO: 16x16
  8-bit precision: 0x0p+0
  16-bit precision: 0x0p+0
  full float: 0x0p+0
  out of range value (2): 0x0p+0
Testing GL_R16 FBO (dithering/LUT)
[gl] Create FBO: 16x16
  8-bit precision: 0x0p+0
  16-bit precision: 0x0p+0
  full float: 0x0p-81
  out of range value (2): 0x0p-63
[gl] Reinit rendering.
[gl] Opening ICC profile 'C:\Windows\System32\spool\drivers\color\GDM-F520_5-17-2013_LAB_Ultra_3844.icm'
[gl] Opening 3D LUT cache in file 'I:\Temp\icc_cache'.
Testing user-set FBO format
[gl] Create FBO: 16x16
  8-bit precision: 0x0p+0
  16-bit precision: 0x0p+0
  full float: 0x0p+0
  out of range value (2): 0x0p+0
Testing GL_R16 FBO (dithering/LUT)
[gl] Create FBO: 16x16
  8-bit precision: 0x0p+0
  16-bit precision: 0x0p+0
  full float: 0x0p-81
  out of range value (2): 0x0p-63
VO Config (1920x1080->1920x1080,flags=0,0x409)
VO: [opengl-hq] 1920x1080 => 1920x1080 420p10
VO: Description: Extended OpenGL Renderer (high quality rendering preset)
VO: Author: Based on vo_gl.c by Reimar Doeffinger
[vo] reset window bounds: 160:240:1280:720
[vo] move window: 160:240
[vo] resize window: 1280:720
Testing user-set FBO format
[gl] Create FBO: 16x16
  8-bit precision: 0x0p+0
  16-bit precision: 0x0p+0
  full float: 0x0p+0
  out of range value (2): 0x0p+0
Testing GL_R16 FBO (dithering/LUT)
[gl] Create FBO: 16x16
  8-bit precision: 0x0p+0
  16-bit precision: 0x0p+0
  full float: 0x0p-81
  out of range value (2): 0x0p-63
[gl] Texture for plane 0: 1920x1080
[gl] Texture for plane 1: 960x540
[gl] Texture for plane 2: 960x540
[gl] Reinit rendering.
[gl] Dither to 8.
[gl] Create FBO: 1920x1080
[gl] Resize: 1280x720
[vo] Window size: 1280x720
[vo] Video source: 1920x1080 (1920x1080)
[vo] Video display: (0, 0) 1920x1080 -> (0, 0) 1280x720
[vo] Video scale: 0.666667/0.666667
[vo] OSD borders: l=0 t=0 r=0 b=0
[vo] Video borders: l=0 t=0 r=0 b=0
[gl] Reinit rendering.
[gl] Dither to 8.
[gl] compiling shader program 'frag_osd_libass'
[gl] Create FBO: 1920x768
Video filter chain:
  [vo] 1920x1080 420p10 0

ghost commented 11 years ago

I'm unsure how to interpret the output:

It prints the difference between a representative value and what it got back from the FBO. Ideally, the difference should be 0 for full precision (well, more or less). The values are printed as hex floats, which IMO makes them easier to read for this purpose. The number after the "p" is the exponent. I'm not really sure what's up with these very small values like 0x0p-63.
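For reference, these look like C's %a hex-float formatting; a tiny standalone example of how to read them (illustrative only, not mpv code, and the exact output formatting varies a bit by libc):

```c
/* %a prints a float as a hex literal: 0xh.hhhp±e, where the number after 'p'
 * is the binary exponent. So 0x1p-8 is 2^-8, i.e. one 8-bit step, and
 * 0x0p+0 is exactly zero. */
#include <stdio.h>

int main(void)
{
    printf("%a\n", 1.0 / 256.0);    /* typically prints 0x1p-8   */
    printf("%a\n", 0.0);            /* typically prints 0x0p+0   */
    printf("%a\n", 1.5);            /* typically prints 0x1.8p+0 */
    return 0;
}
```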

But your float formats are perfectly fine, and something is wrong with the processing. I attempted a fix in the branch named "gl_fixes"; let me know whether you think it's correct.

I also attempted to implement chroma location placement. It's in the "gl_fixes" branch too, but it isn't enabled by bitstream flags yet, and the default is still centered. Instead, you have to set an explicit suboption. From the test case I used it looked like I didn't really get it right, but then I tried vdpau and it showed similar output (and it doesn't center chroma, from what it looks like), so maybe it's ok. Whatever.

Cyberbeing commented 11 years ago

But your float formats are perfectly fine, and something is wrong with the processing. I attempted a fix in the branch named "gl_fixes"; let me know whether you think it's correct.

I did some testing with both GCC 4.8.0 & 4.8.1 builds, and I'm not seeing any corruption with the floating point FBO formats like before. Your fix likely resolved the issue.

I also attempted to implement chroma location placement. It's in the "gl_fixes" branch too, but it isn't enabled by bitstream flags yet, and the default is still centered. Instead, you have to set an explicit suboption. From the test case I used it looked like I didn't really get it right, but then I tried vdpau and it showed similar output (and it doesn't center chroma, from what it looks like), so maybe it's ok. Whatever.

I compared chroma-location=left against madVR and Avisynth 2.6, and it looks correct to me as well. Good job.

Will there be a similar option added for --vo sub [edit: -vf sub]?

ghost commented 11 years ago

I compared chroma-location=left against madVR and Avisynth 2.6, and it looks correct to me as well. Good job.

OK. Maybe I'll merge that in some days, with honoring bitstream flags by default.

Will there be a similar option added for --vo sub ?

You mean -vf sub? We rely on libswscale to upsample/downsample (a nice trick to avoid having to write optimized routines for every format ourselves), so unless libswscale supports chroma position adjustment, no. We currently render subs either in 4:4:4 YUV or RGB, so in theory our code is correct, just libswscale isn't. I'm not even sure it can work with our approach; I haven't thought too hard about it.

On vo_opengl and vo_vdpau (the two recommended VOs), this stuff doesn't matter of course, because subs are rendered in RGB.

Anyway, thanks for going through all the testing!