mpv-player / mpv

🎥 Command line video player
https://mpv.io

gpu-next: flashing artifacts on some videos with interpolation and certain tscale algorithms #14474

mia-0 opened this issue 4 months ago (Open)

mia-0 commented 4 months ago

mpv Information

mpv 0.38.0+git20240701.7c70df09 Copyright © 2000-2024 mpv/MPlayer/mplayer2 projects
libplacebo version: v7.349.0
FFmpeg version: 6.1.1
FFmpeg library versions:
   libavcodec      60.31.102
   libavdevice     60.3.100
   libavfilter     9.12.100
   libavformat     60.16.100
   libavutil       58.29.100
   libswresample   4.12.100
   libswscale      7.5.100

Other Information

Reproduction Steps

mpv --no-config --vo=gpu-next --interpolation --tscale=catmull_rom --video-sync=display-resample https://0x0.st/XA8c.mp4

Expected Behavior

Video plays normally without graphical glitches.

Actual Behavior

Bright single-frame flashes during motion.

Log File

output.txt

Sample Files

https://github.com/mpv-player/mpv/assets/652892/209c4566-bfae-4095-b82c-950dc5f8be38

I carefully read all instructions and confirm that I did the following:

mia-0 commented 4 months ago

Looks like only spline16, spline36, spline64, sinc, lanczos, ginseng and catmull_rom are affected.
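
All of the listed kernels have negative lobes, which presumably is the common thread: negative weights are what allow the frame weights to nearly cancel out. As a quick standalone illustration (assuming the standard Keys a = -0.5 form of Catmull-Rom, which may differ in detail from libplacebo's implementation), the kernel dips below zero on its second lobe:

#include <math.h>
#include <stdio.h>

/* Catmull-Rom, i.e. the Keys cubic with a = -0.5 (assumed form) */
static double catmull_rom(double x)
{
    x = fabs(x);
    if (x < 1.0)
        return (1.5 * x - 2.5) * x * x + 1.0;
    if (x < 2.0)
        return ((-0.5 * x + 2.5) * x - 4.0) * x + 2.0;
    return 0.0;
}

int main(void)
{
    /* the second lobe (1 < |x| < 2) is negative, e.g. w(1.50) = -0.0625 */
    for (double x = 0.0; x <= 2.0; x += 0.25)
        printf("w(%.2f) = %+.4f\n", x, catmull_rom(x));
    return 0;
}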

haasn commented 4 months ago

It seems like it happens whenever the sum of frame weights gets very close to zero. In this case, we have something like:

  -> Filter offset -2,951833 = weight 0,002780
  -> Filter offset -1,952048 = weight -0,011067
  -> Filter offset -0,952263 = weight 0,041413
  -> Filter offset 1,047880 = weight -0,035180
  -> Filter offset 2,047665 = weight 0,008759
  -> Filter offset 3,047450 = weight -0,001454
wsum: 0,005251

(using 4-tap spline as an example, since that seemed to produce the worst artifacts)

It seems like we end up creating a sort of strange exaggerated sharpening filter, resulting in a lot of ringing whenever the filter kernel exactly aligns with the frame offsets like this. I'm not sure why exactly these filters suffer from it.
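
For illustration, here is a small standalone sketch (not libplacebo code) of what the normalization does with the weights from the log above: dividing by the near-zero wsum turns modest kernel weights into huge alternating coefficients.

#include <stdio.h>

int main(void)
{
    /* raw tscale weights of the frames that are actually present; the frame
     * that would sit near offset 0 is missing from the source */
    const double w[] = { 0.002780, -0.011067, 0.041413,
                         -0.035180, 0.008759, -0.001454 };
    const int n = sizeof(w) / sizeof(w[0]);

    double wsum = 0.0;
    for (int i = 0; i < n; i++)
        wsum += w[i];
    printf("wsum = %f\n", wsum); /* ~0.005251, as in the log */

    /* normalizing by wsum amplifies every weight by ~190x, e.g.
     * 0.041413 / 0.005251 ~= +7.9, far outside the usual [-1, 1] range */
    for (int i = 0; i < n; i++)
        printf("w[%d] / wsum = %+.2f\n", i, w[i] / wsum);
    return 0;
}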

haasn commented 4 months ago

So, looking more closely at it, it seems that this is some sort of cursed VFR source where individual frames are randomly missing. Whenever this happens, and the 'missing' frame happens to exactly align with the central lobe of the tscale filter kernel, the result explodes because of the normalization (division) by wsum. This is a form of temporal aliasing. (Even without the normalization, from a signal theory PoV, the result would tend towards 0 here, leading to flashing black frames instead of oversharpened, glitched frames.)

I think the correct solution here would be to always sufficiently blur / widen the kernel to prevent any such 'holes' in the data from aliasing the kernel. But I'm not sure how this would look in practice. I can try it out.
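
Purely as a sketch of how that could look (a hypothetical helper, not libplacebo API, with timestamps assumed to be in nominal-frame-duration units), the widening factor could be derived from the largest gap between neighbouring frame timestamps, so a dropped frame can never leave a kernel lobe without a sample:

#include <math.h>

/* hypothetical: pick a kernel widening factor that covers the widest hole
 * in the frame mix (timestamps assumed relative to the vsync, nominal
 * spacing 1.0) */
static float required_blur(const float *timestamps, int num_frames)
{
    float max_gap = 1.0f;
    for (int i = 1; i < num_frames; i++)
        max_gap = fmaxf(max_gap, timestamps[i] - timestamps[i - 1]);
    return max_gap; /* >= 1.0: stretch the kernel by this factor */
}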

haasn commented 4 months ago

I think the correct solution here would be to always sufficiently blur / widen the kernel to prevent any such 'holes' in the data from aliasing the kernel. But I'm not sure how this would look in practice. I can try it out.

Looking at the code, we already do this, so something about the explanation doesn't quite add up yet...

haasn commented 4 months ago

Okay, figured out how to improve things here. The math as written was only compensating for the case where the vsync duration exceeded the frame duration; it never accounted for the case of the frame duration itself exceeding 1.0. We can fix it like so:

diff --git a/src/renderer.c b/src/renderer.c
index 802480d7..e80c072c 100644
--- a/src/renderer.c
+++ b/src/renderer.c
@@ -3353,8 +3353,9 @@ bool pl_render_image_mix(pl_renderer rr, const struct pl_frame_mix *images,
         for (int i = 1; i < images->num_frames; i++) {
             if (images->timestamps[i] >= 0.0 && images->timestamps[i - 1] < 0) {
                 float frame_dur = images->timestamps[i] - images->timestamps[i - 1];
-                if (images->vsync_duration > frame_dur && !params->skip_anti_aliasing)
-                    mixer.blur *= images->vsync_duration / frame_dur;
+                float sample_dur = PL_MAX(frame_dur, images->vsync_duration);
+                if (sample_dur > 1.0f && !params->skip_anti_aliasing)
+                    mixer.blur *= sample_dur;
                 break;
             }
         }

(And actually, maybe we should even lower the blur (i.e. sharpen the kernel) in the event that the frame duration is significantly below 1?)
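
To see what the widened kernel buys in the failure case, here is a toy calculation (a standalone sketch, assuming the blur factor simply stretches the kernel, i.e. the weight at offset x becomes w(x / blur), and using the same Keys-form Catmull-Rom as in the earlier sketch): with the frame offsets from the log, blur = 1 leaves the weights nearly cancelling, while blur = 2 brings their sum back to roughly 1.

#include <math.h>
#include <stdio.h>

/* Keys cubic with a = -0.5 (assumed Catmull-Rom form) */
static double catmull_rom(double x)
{
    x = fabs(x);
    if (x < 1.0)
        return (1.5 * x - 2.5) * x * x + 1.0;
    if (x < 2.0)
        return ((-0.5 * x + 2.5) * x - 4.0) * x + 2.0;
    return 0.0;
}

int main(void)
{
    /* frame offsets from the log above; the frame near offset 0 is missing */
    const double offsets[] = { -2.95, -1.95, -0.95, 1.05, 2.05, 3.05 };
    const int n = sizeof(offsets) / sizeof(offsets[0]);

    for (int blur = 1; blur <= 2; blur++) {
        double wsum = 0.0;
        for (int i = 0; i < n; i++)
            wsum += catmull_rom(offsets[i] / blur);
        /* prints roughly: blur 1 -> wsum 0.006 (explodes after normalization),
         *                 blur 2 -> wsum 1.002 (well-behaved) */
        printf("blur %d -> wsum %f\n", blur, wsum);
    }
    return 0;
}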

One thing I don't like about this specific approach is that it ends up switching the filter kernel size instantly from e.g. 1 to 2 whenever the 'missing' frame hits the center. It would be nicer if we could somehow interpolate the filter size itself, so that the filter grows and shrinks dynamically to adapt to the interval. Maybe something simple like a center-weighted average. 🤷🏻
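
One way that could look (entirely hypothetical, not a patch, and not the pl_frame_mix API): derive the blur factor from a centre-weighted average of the frame intervals in the mix, so it ramps up and down smoothly as the gap approaches the kernel centre instead of snapping between 1 and 2.

#include <math.h>

/* hypothetical: smooth blur factor from a centre-weighted average of the
 * frame intervals (timestamps relative to the vsync, nominal spacing 1.0) */
static float smooth_blur(const float *timestamps, int num_frames,
                         float vsync_duration)
{
    double acc = 0.0, wacc = 0.0;
    for (int i = 1; i < num_frames; i++) {
        double dur = timestamps[i] - timestamps[i - 1];
        double mid = 0.5 * (timestamps[i] + timestamps[i - 1]);
        double w = 1.0 / (1.0 + mid * mid); /* favour intervals near t = 0 */
        acc += w * dur;
        wacc += w;
    }
    float sample_dur = wacc > 0.0 ? (float)(acc / wacc) : 1.0f;
    sample_dur = fmaxf(sample_dur, vsync_duration);
    return fmaxf(sample_dur, 1.0f); /* never sharpen below the nominal kernel */
}

With something like this, the effective kernel width follows the local frame spacing rather than toggling the moment the missing frame crosses the centre; whether that looks better in practice would still need testing.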