mpv-player / mpv

🎥 Command line video player
https://mpv.io

[vo=gpu-next] Anime4k clamp highlights shader regression #9524

Closed NSQY closed 2 years ago

NSQY commented 2 years ago

Reproduction steps

Shader file: https://github.com/bloc97/Anime4K/blob/master/glsl/Restore/Anime4K_Clamp_Highlights.glsl

    mpv --no-config --glsl-shader="Anime4K_Clamp_Highlights.glsl" --vo=gpu-next foo.mkv

Expected behavior

Subtitles to be displayed the same way as with --vo=gpu

Actual behavior

Subtitles are distorted (clamped?). I did not expect the shader to impact subtitles in this case.

Log file

mpv-log.txt

haasn commented 2 years ago

That actually looks like the subtitles are being blended onto the image with the wrong alpha value. If you look closely, you can see the lines from behind the text showing through.

haasn commented 2 years ago

Can reproduce/confirm.

haasn commented 2 years ago

Note @bloc97 that this Anime4K shader has a bug, it handles alpha incorrectly:

    return HOOKED_tex(HOOKED_pos) - (current_luma - new_luma); 

should be something like:

    vec4 color = HOOKED_tex(HOOKED_pos);
    color.rgb -= (current_luma - new_luma) * color.a;
    return color;

This is not relevant here (the file has no alpha channel anyway), but it's still something that would blow up if you try loading Anime4K on any file with an alpha channel.
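To make the failure mode concrete, here is a small numeric sketch of that bug (illustrative values only, assuming premultiplied alpha, the convention mpv and libplacebo use):

```python
# The buggy form subtracts the luma delta from all four channels, corrupting
# alpha, while the fixed form scales the correction by alpha and leaves
# transparent texels untouched.

def clamp_buggy(texel, delta):
    # return HOOKED_tex(HOOKED_pos) - (current_luma - new_luma);
    return tuple(c - delta for c in texel)  # touches R, G, B *and* A

def clamp_fixed(texel, delta):
    # color.rgb -= (current_luma - new_luma) * color.a;
    r, g, b, a = texel
    return (r - delta * a, g - delta * a, b - delta * a, a)

# A fully transparent texel (all zeros when premultiplied) must stay untouched.
transparent = (0.0, 0.0, 0.0, 0.0)
print(clamp_buggy(transparent, 0.25))  # (-0.25, -0.25, -0.25, -0.25)
print(clamp_fixed(transparent, 0.25))  # (0.0, 0.0, 0.0, 0.0)
```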

bloc97 commented 2 years ago

Yeah, I pretty much assumed that videos would have no alpha values, ever. I will see if there is a better way to support alpha without slowing down the clamping "hack" that I implemented.

haasn commented 2 years ago

Okay, so, this one is caused by an unfortunate difference in the way libplacebo and mpv handle blending subtitles onto video frames. Basically, in libplacebo, MAIN was changed to come before subtitle blending, whereas in mpv, MAIN comes after subtitle blending. So this works with --vo=gpu because it sees the pre-blended subtitles as part of the video frame, but in libplacebo it does not. This is an unfortunate change that was mandated by a difference in the internal architecture.

Possible work-arounds:

  1. Anime4K should be updated to use PREKERNEL for the stats pass as well.
  2. I could add a MAINPOSTSUB to both mpv and libplacebo as a work-around, and Anime4K could use it instead of MAIN. Note, however, that this shader will still badly break if you combine it with e.g. --sigmoid-upscaling or --linear-scaling because by design it's trying to do operations on entirely different colorspaces! This is a clear no-go, and this hack doesn't really fix that much.
  3. You wait until #9399 is fixed. I plan on changing how subtitles are handled; I'm ultimately leaning towards never blending subtitles onto video frames, period, and I will rewrite the overlay mechanism with that in mind.
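For reference, work-around 1 would mean moving the stats pass off MAIN, roughly like this in mpv's user-shader syntax (a hedged sketch only, not the actual Anime4K code; the STATSMAX save name and the single-pass luma computation stand in for the real multi-pass logic):

```glsl
//!DESC Clamp Highlights (stats pass, hooked before upscaling)
//!HOOK PREKERNEL
//!BIND HOOKED
//!SAVE STATSMAX
//!COMPONENTS 1

vec4 hook() {
    // Record the pre-upscale luma; the real shader derives a local maximum
    // from this in further passes and clamps against it after upscaling.
    vec3 rgb = HOOKED_tex(HOOKED_pos).rgb;
    return vec4(dot(rgb, vec3(0.299, 0.587, 0.114)), 0.0, 0.0, 0.0);
}
```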
bloc97 commented 2 years ago

I think the best course of action would be 3, as this shader only corrects for small highlights that get over-brightened when upscaling (highlights carry meaning in art, and should not be modified (un)willingly). Removing the shader has no other adverse consequence, if you don't mind the unusual bright spots that rarely appear here and there...

1 would slow down the shader considerably when upscaling to 4K; this would be unacceptable for what should basically be a "no performance impact" shader. 2 would work, but it introduces a lot of work for little benefit. I am also thinking of deprecating this shader in the future in favour of a better upscaler architecture that can handle global style in upcoming versions and would not have this problem.

haasn commented 2 years ago

1 would slow down the shader considerably when upscaling to 4K; this would be unacceptable for what should basically be a "no performance impact" shader.

Sounds like you need the ability to have hook priorities? So e.g. you can hook MAIN 0 for your stats pass, then hook MAIN 10 for your upscaling etc., then finally hook MAIN 99 for the clamp pass. (Or whatever numbers you make up)
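Hypothetically, the stages could then be ordered explicitly in the shader headers; note that this priority suffix does not exist in mpv's user-shader syntax today, it is purely a sketch:

```glsl
//!HOOK MAIN 0     // stats pass runs first (hypothetical syntax)
//!HOOK MAIN 10    // upscaling passes
//!HOOK MAIN 99    // clamp pass runs last
```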

haasn commented 2 years ago

I am also thinking of deprecating this shader in the future in favour of a better upscaler architecture that can handle global style in upcoming versions that would not have this problem.

I'm actually thinking of scrapping the entire glsl shader system in favor of something entirely new for --vo=gpu-next, which would be built on top of libplacebo native APIs (pl_hook, pl_shader_custom etc.), and allow users to write shader modules as mpv plugins (C or Lua).

Not sure how performant it would be to have a lua callback run on every frame, but if we can pull it off that would be an interesting way forwards, I think.

bloc97 commented 2 years ago

Sounds like you need the ability to have hook priorities? So e.g. you can hook MAIN 0 for your stats pass, then hook MAIN 10 for your upscaling etc., then finally hook MAIN 99 for the clamp pass. (Or whatever numbers you make up)

Yeah, it's basically the lack of granular control over which section of which shader runs first, plus the fact that it's not feasible (to my knowledge) to pass uniforms across shaders. (Otherwise I would just pass the native resolution as a uniform and compute the stats on it.) What's really causing issues here is that the Anime4K shaders are modular and there's no guarantee users won't combine them in weird ways; I have to make sure each shader works as intended alongside anything else that might be put in the pipeline.

I'm actually thinking of scrapping the entire glsl shader system in favor of something entirely new for --vo=gpu-next, which would be built on top of libplacebo native APIs (pl_hook, pl_shader_custom etc.), and allow users to write shader modules as mpv plugins (C or Lua).

Not sure how performant it would be to have a lua callback run on every frame, but if we can pull it off that would be an interesting way forwards, I think.

I would be a huge supporter of this, but considering it would take a significant amount of work, it should be planned ahead carefully. Am I correct in the assumption that this would allow people to write a plugin integrating mpv with ML inference libraries like ncnn? That would make mpv support all ML libraries (e.g. TensorFlow, PyTorch) out of the box on most desktop/mobile GPUs and CPUs, along with the performance benefits that come with them...

haasn commented 2 years ago

Am I correct in the assumption that this would allow people to write a plugin integrating mpv with ML inference libraries like ncnn?

Maybe, but then only in C. Does ncnn support Vulkan?

bloc97 commented 2 years ago

ncnn only supports Vulkan as far as I know. My guess would be that it's much easier to support a wide range of GPUs for compute tasks on Vulkan. There are also many other ML libraries, but I don't know whether they are in C or C++, how they mix the two, or whether we could use only the C part of the code...

haasn commented 2 years ago

The rough first plan would be to expose the ability to load any pl_hook, and attach it like an ordinary user shader, to the mpv plugin API.

That way you could already, with current APIs, write your own C plugin that defines a pl_hook which uses pl_vulkan_get/pl_vulkan_unwrap to unwrap the pl_tex into the corresponding raw Vulkan API objects. Then you could do whatever ncnn requires to process these textures however you want, and hand the result back to the VO as a new texture object.

haasn commented 2 years ago

Actually, if you want to experiment with this, you can already: the hard part will be wrapping the ncnn API into a pl_hook, and you can start working on that inside the mpv code base right now. (Just add the resulting hook to p->params.hooks the same way any parsed user shader would be added.)

bloc97 commented 2 years ago

Forgive me for asking, but the ncnn library is written in C++ and its API is C++; would it even be possible to integrate it with mpv (which is written in C)? Or do I absolutely need to find a pure C library?

haasn commented 2 years ago

You can trivially write C plugins that use C++ APIs internally. libplacebo even does the same (for glslang).
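As a minimal illustration of that pattern (not mpv or libplacebo code; all names here are made up), a C-callable plugin entry point can hide a C++ implementation behind extern "C":

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in for the C++ library (e.g. ncnn) used internally.
static std::string cxx_library_greeting(const std::string &name) {
    return "hello, " + name;
}

// C-compatible surface: plain types, no name mangling, and no exceptions
// escaping across the boundary, so a C host like mpv can call it.
extern "C" int plugin_greet(const char *name, char *out, size_t out_size) {
    try {
        const std::string s = cxx_library_greeting(name);
        if (s.size() + 1 > out_size)
            return -1; // buffer too small
        std::memcpy(out, s.c_str(), s.size() + 1);
        return 0;
    } catch (...) {
        return -1; // never let C++ exceptions cross into C
    }
}
```

Compiled with a C++ compiler, this exports a symbol any C program (or dlopen) can call; only the extern "C" declaration needs to appear in a C header.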

I've thought more about the scripting project, and I've come to the conclusion that I don't want to introduce a new scripting platform that somehow depends on libmpv. So the new thing, whatever it is, will be done entirely inside libplacebo. Otherwise it would fork the ecosystem into a state where v2 shaders can only be used with mpv, while vlc/ffmpeg get left out.

So my current plan is, actually, to integrate some sort of scripting language into libplacebo itself (e.g. lua or js). Then we can write our v2 shaders in those scripting languages directly.

The open question, however, is how to do that in a way that also lets us use C plugins (for ncnn). Should I add the ability to dlopen files into libplacebo??

haasn commented 2 years ago

Moved to https://code.videolan.org/videolan/libplacebo/-/issues/174

NSQY commented 2 years ago

You wait until vo_gpu-next: subtitles seem to render at low res #9399 is fixed. I plan on changing how subtitles are handled; I'm ultimately leaning towards never blending subtitles onto video frames, period, and I will rewrite the overlay mechanism with that in mind.

Would this cause issues with fansub projects? .ass overlays for signs and the like.

haasn commented 2 years ago

Would this cause issues with fansub projects? .ass overlays for signs and the like.

Yes and no. Basically, here are the design constraints we're trying to satisfy simultaneously:

  1. Softsubbed signs should be treated as though they were part of the image (and therefore affected by things like shaders that change colors). They should also have the same effective resolution as the underlying frame, so they appear to be part of the scenery.
  2. Subtitled text should be sharp and crisp, i.e. rendered at screen resolution. They should also be unaffected by video color management or things like tone mapping.
  3. Softsubbed signs, as well as text that moves, should be affected by interpolation (otherwise you get really nasty judder artifacts on e.g. text that tracks a pan).
  4. User shaders (upscalers etc.) shouldn't see softsubbed text.
  5. Softsubbed text (but not signs) should have the ability to extend outside the frame boundaries (and fill e.g. the black margins).

The issue with resolving all of these constraints simultaneously is that ASS gives us no way of distinguishing between signs and text (or even between dialogue text and scenery text). So we're forced to lump all subtitles together (dialogue, signs, etc.) and decide on a single place in the pipeline to apply them.

Because these have different tradeoffs, this is exactly why mpv invented the --blend-subtitles option. But in libplacebo I want to do better. The current compromise in libplacebo is actually not even on the above list: it satisfies 3 and 4 (partially) but violates 1, 2, 4 (sometimes) and 5.

After thinking about it more, I decided that constraint 1 is the least important tradeoff here: the downside is that users with very fancy shaders will not see softsubbed signs affected by those shaders, but since most shaders try to preserve colors anyway, it shouldn't be a huge deal. At worst you might get a seam or two. I also thought of a way to implement all of these constraints (sans 1) simultaneously: render the subtitles at the end of the video pipeline, but interpolate the subtitles "manually". This is basically like --blend-subtitles=no, but without violating constraint 3.

In the process, it would also allow me to simplify and unify the subtitle handling code, and also fix issues with e.g. rotated videos. So I'm hugely in favor of doing it the way I have in mind now.

haasn commented 2 years ago

I also thought about a way to implement all of these constraints (sans 1) simultaneously: render the subtitles at the end of the video, but interpolate the subtitles "manually". This is basically like --blend-subtitles=no but without violating constraint 3.

This doesn't work :( No way to have interpolation and alpha-blending interact cleanly. At least not without an extra FBO indirection. Still debating what to do here.

Edit: I think I came up with a way around this problem - draw the overlays first and then alpha-blend the image onto it. That way I can use additive blending for the overlays and alpha blending for the image.

Edit 2: That still breaks if you have multiple regions from the same overlay intersecting... grr
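The Edit 2 problem can be shown numerically: with premultiplied alpha, compositing two overlapping regions sequentially with 'over' is not the same as accumulating them additively and blending once. A minimal sketch (the color values are made up):

```python
def over(src, dst):
    # Porter-Duff 'over' with premultiplied alpha: out = src + (1 - a_src) * dst
    return tuple(s + (1.0 - src[3]) * d for s, d in zip(src, dst))

def add(a, b):
    # Additive accumulation of two premultiplied overlay regions
    return tuple(x + y for x, y in zip(a, b))

frame = (0.0, 0.0, 1.0, 1.0)   # opaque blue video frame (RGBA)
part1 = (0.5, 0.0, 0.0, 0.5)   # 50% red overlay region (premultiplied)
part2 = (0.0, 0.5, 0.0, 0.5)   # 50% green overlay region (premultiplied)

correct  = over(part2, over(part1, frame))  # sequential 'over', the reference
shortcut = over(add(part1, part2), frame)   # accumulate additively, blend once

print(correct)   # (0.25, 0.5, 0.25, 1.0)
print(shortcut)  # (0.5, 0.5, 0.0, 1.0) -- wrong where the parts intersect
```

When the two regions don't overlap (one of them is fully transparent at a given pixel), the two approaches agree; the mismatch only appears in the intersection.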

bloc97 commented 2 years ago

What would be the implications of rendering the subtitles in a framebuffer, treating it independently from the video with interpolation (and even shaders), and then compositing it at the end? Is it bad for performance, or is there some other limitation I'm unaware of?

Edit: The only disadvantage of this method should be partially violating 1, as the softsubs will be rendered at screen resolution, but shaders that change style (i.e. color) could specify whether they should be run twice, once for the video and a second time for the subtitles. This is bad for performance but satisfies 2, 3, 4 and 5.

haasn commented 2 years ago

That would be the "extra FBO indirection" I mentioned in my previous post. It's not optimal for performance/VRAM. I'm not happy about it. :(

bloc97 commented 2 years ago

I see... Isn't there already a buffer for mpv's UI? Would it be possible to render subs on that? Edit: Never mind, we wouldn't want to temporally interpolate the UI, or would we? (Back to square one)

haasn commented 2 years ago

No. There might be a buffer in RAM, but there might also not be, and anyway libplacebo has no way of knowing about it.

One plan would be to use an indirection only to handle the case where two parts intersect. That way we can get away without the indirection for the hot path. (Another extremely exotic idea would be to use a compute shader to loop through the parts, perform the necessary intersection tests, and render the correctly blended result. But I'm not a huge fan of that because it's not very backwards-compatible, and also probably overkill given the relatively low cost of an FBO indirection anyway)
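The first plan boils down to a cheap pairwise intersection test over the overlay parts, taking the extra FBO pass only when some pair overlaps. A minimal sketch (the (x0, y0, x1, y1) rect representation is an assumption):

```python
from itertools import combinations

def rects_intersect(a, b):
    # Half-open rects (x0, y0, x1, y1): overlap iff they overlap on both axes
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def needs_indirection(parts):
    # Hot path: no pair of parts overlaps, so blend directly without an FBO
    return any(rects_intersect(a, b) for a, b in combinations(parts, 2))

print(needs_indirection([(0, 0, 10, 10), (20, 0, 30, 10)]))  # False
print(needs_indirection([(0, 0, 10, 10), (5, 5, 15, 15)]))   # True
```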

yeezylife commented 2 years ago

ncnn only supports Vulkan

Does this mean that if one wants to use the new plugin integrating mpv with ncnn, one can only choose --gpu-api=vulkan?

haasn commented 2 years ago

Does this mean that if one wants to use the new plugin integrating mpv with ncnn, one can only choose --gpu-api=vulkan?

Very probably, yes. There is some theoretical interop, but if you have interop support, that means you have Vulkan support anyway, so you might as well use it.