mpv-player / mpv

🎥 Command line video player
https://mpv.io

AMD vdpau hwdec + opengl-hq corrupted recent regression #2531

Closed: AndyFurniss closed this issue 7 years ago

AndyFurniss commented 8 years ago

This is going to be a bit of a crap report as I can't (easily) bisect.

First, thanks for fixing vdpau GL interop for AMD; I see that was fun for you :-)

Somewhere between the commit that broke it, 375886c7779c805e46f79ff3c648f9005bb47830 (vo_opengl: probe for EGL by default), and the commit that fixed it, 6b22b216514ee2eb784711f4539410d3b312a4fd (vo_opengl: attempt to improve GLX vs. EGL backend detection), a regression appeared when using:

mpv --hwdec=vdpau --vo=opengl-hq 2160p60-h264.mkv

I tried bisecting by turning both of the above commits into patches, but they don't apply far enough back into the bisect range.

The corruption looks like a decoding issue (a guess): a keyframe is OK, then the picture gets progressively more trashed, then it's OK again, and the cycle repeats. Some things argue against this, though: there are fewer "back to good" events than I think there are keyframes. Maybe some keyframes are missed; this is just guesswork.

It doesn't happen with:
- s/w decoding
- --hwdec=vdpau --vo=opengl
- 1080p60 content

It happens whether I use --fs and get the video downscaled (1080p monitor) or play without it and see an unscaled portion in a window.

My machine is (sort of) too slow to play this content. --framedrop=no makes no difference to the issue and is what I normally use (I am testing my h/w).

AndyFurniss commented 8 years ago

Of course, it doesn't make sense for this to be a decoding issue. I will get -v output for the good and bad builds and pastebin it soon(ish).

kevinlekiller commented 8 years ago

If you can, try --vo=opengl:scale=spline36:cscale=spline36:dscale=mitchell:dither-depth=auto:correct-downscaling:sigmoid-upscaling:pbo:deband:es=no (which is the same as --vo=opengl-hq), then remove parameters one at a time to see whether one of them is causing your issue.
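A leave-one-out run over that list can be scripted. The sketch below is a hypothetical helper (`build_vo_without` and `print_leave_one_out` are illustrative names, not part of mpv). It uses the sub-option list exactly as quoted in the comment above, without checking it against any particular mpv release, and prints one mpv command line per dropped sub-option.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sub-options quoted in the comment as equivalent to --vo=opengl-hq
 * (copied from the comment, not checked against any mpv release). */
static const char *const hq_subopts[] = {
    "scale=spline36", "cscale=spline36", "dscale=mitchell",
    "dither-depth=auto", "correct-downscaling", "sigmoid-upscaling",
    "pbo", "deband", "es=no",
};
#define N_HQ_SUBOPTS (sizeof hq_subopts / sizeof hq_subopts[0])

/* Hypothetical helper: writes "opengl:<every sub-option except hq_subopts[skip]>"
 * into out. Passing skip >= N_HQ_SUBOPTS keeps all of them.
 * Returns 0 on success, -1 if out is too small. */
int build_vo_without(char *out, size_t cap, size_t skip)
{
    assert(out != NULL);
    int n = snprintf(out, cap, "opengl");
    if (n < 0 || (size_t)n >= cap)
        return -1;
    size_t len = (size_t)n;
    for (size_t i = 0; i < N_HQ_SUBOPTS; i++) {
        if (i == skip)
            continue;                      /* the sub-option being tested */
        n = snprintf(out + len, cap - len, ":%s", hq_subopts[i]);
        if (n < 0 || (size_t)n >= cap - len)
            return -1;                     /* output would be truncated */
        len += (size_t)n;
    }
    assert(strlen(out) == len);
    return 0;
}

/* Prints one candidate command line per dropped sub-option. */
void print_leave_one_out(const char *file)
{
    char vo[256];
    for (size_t skip = 0; skip < N_HQ_SUBOPTS; skip++)
        if (build_vo_without(vo, sizeof vo, skip) == 0)
            printf("mpv --hwdec=vdpau --vo=%s %s\n", vo, file);
}
```

Calling `print_leave_one_out("2160p60-h264.mkv")` prints nine command lines. If no single omission makes the corruption go away, the cause is probably not one sub-option on its own.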

ghost commented 8 years ago

I went over the list of commits between the two you pointed out. There were no changes to vdpau. There were some heavy changes to the opengl renderer with the addition of prescalers (7438f208c37deb1a30df54278a6d81227038f33e 4c43c30421b1d713b7a17b437e381fe1efd01902 27dc834f37cd2427798c8cb582a574409865d1e7) - they are not enabled by default, but the resulting shader chain could have changed. There also were heavy changes to display-sync mode, if you use that.

ghost commented 8 years ago

Oh, and I take it you ran the two versions on the same hardware/driver combination. (I've got some reports of hwdec corruption on newer nvidia drivers, but they also happened with vlc for the user.)

AndyFurniss commented 8 years ago

I am testing with the same s/w, which admittedly is cutting edge, but I can get a clear good and bad result by resetting the mpv tree within mpv-build, so the same ffmpeg is used for both.

It seems that many combinations of the full opengl sub-option list above fail, so maybe there is something a bit deeper than just one feature. I've been AFK, so I haven't had much time for that one yet.

AndyFurniss commented 8 years ago

Had another go at bisecting, keeping h/w accel working as I went along by applying/reverting this patch:

```diff
diff --git a/video/out/opengl/common.c b/video/out/opengl/common.c
index f045184..4748b86 100644
--- a/video/out/opengl/common.c
+++ b/video/out/opengl/common.c
@@ -510,7 +510,7 @@ static const struct mpgl_driver *const backends[] = {
 #if HAVE_GL_WAYLAND
     &mpgl_driver_wayland,
 #endif
-#if HAVE_EGL_X11
+#if 0 //HAVE_EGL_X11
     &mpgl_driver_x11egl,
 #endif
 #if HAVE_GL_X11
```

eb66038d4ff9dd2faadd317d8eb777ebbaa9687f is the first bad commit
commit eb66038d4ff9dd2faadd317d8eb777ebbaa9687f
Author: Niklas Haas <git@nand.wakku.to>
Date:   Wed Oct 21 11:09:01 2015 +0200

vo_opengl: make the default debanding settings less excessive

It's great that the new algorithm supports multiple placebo iterations
and all, but it's really not necessary and hurts performance in the
general case for the sake of the 0.1% that actually pause the screen
and look for minute differences. 
AndyFurniss commented 8 years ago

When testing this I noticed that even on the "good" builds, which show no corruption with or without --fs, if I am not fullscreen and grab the image with the mouse to scroll around it, there is a chance of similar but less severe corruption that continues for a while and then clears.

I couldn't reproduce this with s/w decoding, and it is reproducible with the ./use-*-release build of mpv that I have installed. I can also reproduce this one with the release build and --vo=opengl (in fact, it can be worse with opengl than with opengl-hq).

ghost commented 8 years ago

> Had another go at bisecting getting h/w accel as I went along by applying/reversing -

You can avoid this by forcing the GLX backend with --vo=opengl:backend=x11.
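Both workarounds (the `#if 0` patch earlier in the thread and `backend=x11`) come down to which entry of the backend list ends up being used. A minimal sketch, assuming the renderer takes the first backend in list order whose initialization succeeds, unless one is forced by name. `struct backend`, `pick_backend` and the stub entries are hypothetical and are not mpv's actual `mpgl_driver` code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical backend entry; not mpv's struct mpgl_driver. */
struct backend {
    const char *name;
    bool (*init)(void);          /* true if the backend came up */
};

/* Returns the first backend, in list order, whose init succeeds.
 * If forced is non-NULL, only the backend with that name is tried. */
const struct backend *pick_backend(const struct backend *list, size_t n,
                                   const char *forced)
{
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (forced != NULL && strcmp(list[i].name, forced) != 0)
            continue;
        if (list[i].init())
            return &list[i];
    }
    return NULL;
}

/* Demo stub: pretend every backend can start on this machine. */
static bool stub_ok(void) { return true; }

/* Demo list in the same relative order as the diff: EGL/X11 before GLX. */
static const struct backend demo_backends[] = {
    { "x11egl", stub_ok },
    { "x11",    stub_ok },
};
```

With both stubs succeeding, the unforced probe picks `x11egl`. Dropping the first entry, which is what the `#if 0` patch does, corresponds to `pick_backend(demo_backends + 1, 1, NULL)`, and forcing corresponds to passing `"x11"`; both end up on GLX.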

> eb66038 is the first bad commit

This seems impossible or a coincidence. It just adjusts the opengl-hq presets slightly.
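Bisecting is a binary search that trusts every good/bad verdict, which matters when the symptom depends on timing. The sketch below is a linear-history simplification with hypothetical names, not git's implementation. It finds the first bad commit under the usual assumption that a bug never disappears once introduced. `flaky_is_bad` is a deterministic stand-in for a test run that sometimes misses the corruption; with it, the search blames commit 68 instead of the real first bad commit, 37.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Verdict for commit i of a linear history (0 = oldest). */
typedef bool (*commit_test)(size_t i);

/* Binary search for the first bad commit. Assumes commit 0 is good, commit
 * n - 1 is bad, and every commit after a bad one is also bad. */
size_t first_bad(size_t n, commit_test is_bad)
{
    assert(n >= 2 && is_bad != NULL);
    size_t good = 0, bad = n - 1;   /* invariant: good is known good, bad is known bad */
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Demo: the regression really starts at commit 37. */
static bool demo_is_bad(size_t i) { return i >= 37; }

/* Demo: same regression, but the test only catches it on even commits,
 * standing in for a timing-dependent symptom that a run can miss. */
static bool flaky_is_bad(size_t i) { return i >= 37 && i % 2 == 0; }
```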

I couldn't reproduce any of this. I tried a 3840x2160 h264 clip on 340.93 on a GTX 750.

ghost commented 8 years ago

> I tried a 3840x2160 h264 clip on 340.93 on a GTX 750.

Which btw. suggests that it's a problem with the AMD driver, rather than any change in mpv.

AndyFurniss commented 8 years ago

I agree it's probably some strange driver/timing/luck issue, which I may hunt for sometime. It's just not something I've tested or noticed before, so it will be a bugger to find, if it ever worked at all.

FWIW, I am back on HEAD now with eb66038 reverted and it's good; I'm still not saying this is an mpv issue.

ghost commented 8 years ago

Yes, it sounds like something of a timing/synchronization issue.

As far as the technical details go, we simply maintain a single VdpOutputSurface, which is associated with an OpenGL texture via glVDPAURegisterOutputSurfaceNV. Then, each frame we call glVDPAUUnmapSurfacesNV, render the current vdpau video surface onto the output surface, and map it as an OpenGL texture with glVDPAUMapSurfacesNV. The texture is then used for rendering.

AndyFurniss commented 8 years ago

Ok, thanks for the info.

ghost commented 8 years ago

Also, I assume you get no such problems with --vo=vdpau?

AndyFurniss commented 8 years ago

Ahh, --vo=vdpau: that is a different story, which I didn't want to pollute this bug with.

In short, I don't know, because the main sample I've been testing with is nowhere near playable at full speed: my GPU doesn't have proper/working power management yet. It is also not playable with CPU decoding.

It seems that --vo=vdpau doesn't honor --framedrop=no, and with hwdec there are so many drops that it's just stuck at the start. With s/w decoding it does get further, but still gets stuck after a while.

I recently worked out how to get vaapi working on my h/w, and --hwdec=vaapi --vo=vaapi does honor --framedrop=no and does not have the issue.

I did test the original issue with a lower bitrate/frame rate sample that I didn't make, and the issue existed there too, so I don't think it's my file; but I didn't test that one with --vo=vdpau.

I can try later; I'm currently in the middle of compiling a new LLVM, which is not an ideal time to test something I hope to play at full speed.

ghost commented 8 years ago

Yes, vo_vdpau implements its own framedrop for whatever reasons. It can be disabled with --vo=vdpau:fps=-1.

ghost commented 8 years ago

Also, does that mean Mesa/Radeon have vaapi support too now?

AndyFurniss commented 8 years ago

--vo=vdpau:fps=-1 renders OK.

Before I saw that post I was experimenting with --speed=0.2, and I noticed that with --vo=opengl-hq, replacing --framedrop=no with --speed=0.2 also renders OK.

vaapi does work for me on radeonsi, though I'm not sure OSS users are meant to use it yet. AMD devs wrote it, possibly because they are moving towards their closed driver working with the open kernel/whatever bits. That may be totally wrong. In any case it doesn't just work out of the box for me. I suppose that could be because I build my own stuff and don't know how, but when I installed libva it would look for

/usr/lib/dri/radeonsi_drv_video.so

which doesn't exist. After libva is installed, Mesa notices and starts building gallium_drv_video.so. If I make a symlink to that named radeonsi_drv_video.so, vaapi works.

vaapi hwdec doesn't work with mpv --vo=opengl, before or after your EGL changes: mpv plays as if everything is OK, but I just see a still frame of black or junk/old buffer contents.

LateAdopter commented 8 years ago

LIBVA_DRIVER_NAME=gallium works for r600g and maybe nouveau as well.
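The symlink and the environment variable fix the same lookup. A hypothetical sketch of the file name being searched for, based on the paths in this thread: libva loads `<name>_drv_video.so` from its driver directory, and `LIBVA_DRIVER_NAME` replaces the detected name. The directory below is the one from the earlier comment; it is a build-time default, and libva also honors `LIBVA_DRIVERS_PATH` for it. `va_driver_path` is an illustrative helper, not a libva function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds "<dir>/<name>_drv_video.so".
 * override stands for the value of LIBVA_DRIVER_NAME (NULL if unset);
 * detected is the name derived from the hardware, e.g. "radeonsi".
 * Returns 0 on success, -1 if out is too small. */
int va_driver_path(char *out, size_t cap, const char *dir,
                   const char *detected, const char *override)
{
    /* In this sketch an unset or empty override falls back to the detected name. */
    const char *name = (override != NULL && strlen(override) > 0) ? override : detected;
    assert(out != NULL && dir != NULL && name != NULL);
    int n = snprintf(out, cap, "%s/%s_drv_video.so", dir, name);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

For example, `va_driver_path(buf, sizeof buf, "/usr/lib/dri", "radeonsi", getenv("LIBVA_DRIVER_NAME"))` yields `/usr/lib/dri/gallium_drv_video.so` when the variable is set to `gallium`, and the missing `/usr/lib/dri/radeonsi_drv_video.so` otherwise.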

maxammann commented 8 years ago

Since updating from 0.10 to 0.13, I get the following error when running libmpv on Linux with -fsanitize=address enabled:

Is this related to this bug? If not, I'll create a new issue.

libEGL warning: DRI2: failed to authenticate
=================================================================
==11960==ERROR: AddressSanitizer: strcpy-param-overlap: memory ranges [0x7f89be5453ec,0x7f89be5453f0) and [0x7f89be5453ec, 0x7f89be5453f0) overlap
    #0 0x7f89f2d9fb7e in __interceptor_strcpy /build/gcc/src/gcc-5.2.0/libsanitizer/asan/asan_interceptors.cc:499
    #1 0x7f89ca42822e  (/usr/lib/xorg/modules/dri//fglrx_dri.so+0x1ad722e)
    #2 0x7f89ca429243  (/usr/lib/xorg/modules/dri//fglrx_dri.so+0x1ad8243)
    #3 0x7f89ca4286f3  (/usr/lib/xorg/modules/dri//fglrx_dri.so+0x1ad76f3)
    #4 0x7f89ca373080  (/usr/lib/xorg/modules/dri//fglrx_dri.so+0x1a22080)
    #5 0x7f89ca37324b  (/usr/lib/xorg/modules/dri//fglrx_dri.so+0x1a2224b)
    #6 0x7f89ca377433  (/usr/lib/xorg/modules/dri//fglrx_dri.so+0x1a26433)
    #7 0x7f89ca379999  (/usr/lib/xorg/modules/dri//fglrx_dri.so+0x1a28999)
    #8 0x7f89ca37368c  (/usr/lib/xorg/modules/dri//fglrx_dri.so+0x1a2268c)
    #9 0x7f89ca40c0bf  (/usr/lib/xorg/modules/dri//fglrx_dri.so+0x1abb0bf)
    #10 0x7f89ca20763d  (/usr/lib/xorg/modules/dri//fglrx_dri.so+0x18b663d)
    #11 0x7f89ca20c993  (/usr/lib/xorg/modules/dri//fglrx_dri.so+0x18bb993)
    #12 0x7f89ca20f08b  (/usr/lib/xorg/modules/dri//fglrx_dri.so+0x18be08b)
    #13 0x7f89ca203783  (/usr/lib/xorg/modules/dri//fglrx_dri.so+0x18b2783)
    #14 0x7f89c999520e  (/usr/lib/xorg/modules/dri//fglrx_dri.so+0x104420e)
    #15 0x7f89c99d83ab  (/usr/lib/xorg/modules/dri//fglrx_dri.so+0x10873ab)
    #16 0x7f89c99d8667  (/usr/lib/xorg/modules/dri//fglrx_dri.so+0x1087667)
    #17 0x7f89c8c6ea3e  (/usr/lib/xorg/modules/dri//fglrx_dri.so+0x31da3e)
    #18 0x7f89f11378fc in gl_sc_gen_shader_and_reset (/home/max/projects/rar-streamer/mpv/build/libmpv.so.1+0xcf8fc)
    #19 0x7f89f1138cf3 in pass_draw_osd (/home/max/projects/rar-streamer/mpv/build/libmpv.so.1+0xd0cf3)
    #20 0x7f89f113d820 in gl_video_render_frame (/home/max/projects/rar-streamer/mpv/build/libmpv.so.1+0xd5820)
    #21 0x7f89f11463f3 in draw_frame (/home/max/projects/rar-streamer/mpv/build/libmpv.so.1+0xde3f3)
    #22 0x7f89f11438ed in vo_thread (/home/max/projects/rar-streamer/mpv/build/libmpv.so.1+0xdb8ed)
    #23 0x7f89f141b4a3 in start_thread (/usr/lib/libpthread.so.0+0x74a3)
    #24 0x7f89ef63513c in clone (/usr/lib/libc.so.6+0xe913c)

Address 0x7f89be5453ec is located in stack of thread T14 (mpv/vo)
SUMMARY: AddressSanitizer: strcpy-param-overlap /build/gcc/src/gcc-5.2.0/libsanitizer/asan/asan_interceptors.cc:499 __interceptor_strcpy
Thread T14 (mpv/vo) created by T9 (mpv/playback co) here:
    #0 0x7f89f2d73633 in __interceptor_pthread_create /build/gcc/src/gcc-5.2.0/libsanitizer/asan/asan_interceptors.cc:179
    #1 0x7f89f11424c4 in vo_create (/home/max/projects/rar-streamer/mpv/build/libmpv.so.1+0xda4c4)

Thread T9 (mpv/playback co) created by T4 here:
    #0 0x7f89f2d73633 in __interceptor_pthread_create /build/gcc/src/gcc-5.2.0/libsanitizer/asan/asan_interceptors.cc:179
    #1 0x7f89f10ea86d in mpv_initialize (/home/max/projects/rar-streamer/mpv/build/libmpv.so.1+0x8286d)
    #2 0x60d00005af6f  (<unknown module>)
    #3 0x7f89f2dea39a in __sanitizer::MmapFixedNoReserve(unsigned long, unsigned long) /build/gcc/src/gcc-5.2.0/libsanitizer/sanitizer_common/sanitizer_posix.cc:168

Thread T4 created by T0 here:
    #0 0x7f89f2d73633 in __interceptor_pthread_create /build/gcc/src/gcc-5.2.0/libsanitizer/asan/asan_interceptors.cc:179
    #1 0x7f89efebd492 in __gthread_create /build/gcc/src/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu/bits/gthr-default.h:662
    #2 0x7f89efebd492 in std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) /build/gcc/src/gcc-5.2.0/libstdc++-v3/src/c++11/thread.cc:149

==11960==ABORTING
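What the report shows: frames #1 to #17 are all inside `fglrx_dri.so`, called from mpv's `gl_sc_gen_shader_and_reset`, so the overlapping `strcpy` happens inside the AMD proprietary driver during shader setup. Both reported ranges start at the same address and are 4 bytes long, which is consistent with `strcpy(p, p)` on a 3-character string. The C standard leaves an overlapping `strcpy` undefined, while `memmove` permits overlap. Below is a sketch of an overlap test of the kind ASan reports, plus an overlap-safe copy; both helpers are illustrative, not ASan's code, and comparing addresses through `uintptr_t` is assumed to work on a flat address space such as x86-64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if the half-open byte ranges [a, a+alen) and [b, b+blen) share a byte.
 * Empty ranges never overlap. */
bool ranges_overlap(const void *a, size_t alen, const void *b, size_t blen)
{
    uintptr_t a0 = (uintptr_t)a, b0 = (uintptr_t)b;
    if (alen == 0 || blen == 0)
        return false;
    return a0 < b0 + blen && b0 < a0 + alen;
}

/* strcpy replacement that tolerates overlapping buffers: memmove is defined
 * for overlap, strcpy is not. */
char *strcpy_overlap_safe(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t n = strlen(src) + 1;   /* include the terminating NUL */
    return memmove(dst, src, n);
}
```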
ghost commented 8 years ago

Probably not related. Please open a new issue. Redoing the run with debug info would also be helpful. (I can try doing the same on nvidia drivers, but not today.)

Gusar321 commented 8 years ago

> LIBVA_DRIVER_NAME=gallium works for r600g and maybe nouveau as well.

Not yet for nouveau, but there are patches on the mesa mailing list. Right now radeonsi and r600 work.

AndyFurniss commented 8 years ago

@LateAdopter Thanks, I should have thought of using the env var.

I played more with this using an older Mesa; the issue still exists.

Also, what I said above about the vdpau and vaapi VOs being OK is not correct. They are OK for normal use, but playing 2160p unscaled on a 1080p monitor, then grabbing the image and repeatedly moving it around as fast as possible, will eventually produce similar, though short-lived, corruption.

I can reproduce this specific test using mplayer.

It only happens with hwdec; all VOs are OK with s/w decoding.

As already noted, this seems to be related to the system not being fast enough. Using an experimental (i.e. buggy) powerplay kernel on my UVD v5 card, it is just about possible to play >200 Mbit/s 2160p60 x264 with opengl-hq, and with CPU/GPU clocks forced high there is no corruption.
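Some rough arithmetic on why 2160p60 stresses the card where 1080p60 did not. A sketch, assuming 4 bytes per pixel for the output surface that the interop renders into each frame and counting only that single write per frame (decoding, scaling and debanding passes add more). The helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes in one uncompressed w x h surface. */
uint64_t frame_bytes(uint32_t w, uint32_t h, uint32_t bytes_per_px)
{
    assert(w > 0 && h > 0 && bytes_per_px > 0);
    return (uint64_t)w * h * bytes_per_px;
}

/* Bytes per second for writing one such surface per frame. */
uint64_t bytes_per_second(uint32_t w, uint32_t h, uint32_t bytes_per_px,
                          uint32_t fps)
{
    return frame_bytes(w, h, bytes_per_px) * fps;
}

/* Average compressed bytes per frame for a stream of bitrate_bps bits/s. */
uint64_t avg_compressed_frame_bytes(uint64_t bitrate_bps, uint32_t fps)
{
    assert(fps > 0);
    return bitrate_bps / 8 / fps;
}
```

At 3840x2160 and 60 fps this gives 33,177,600 bytes per frame and 1,990,656,000 bytes/s (about 2 GB/s), exactly 4x the 1080p60 figure, while a 200 Mbit/s stream averages about 416 KB of compressed data per frame.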

Since other radeonsi/r600 cards have working power management, and UVD versions below 5.0 are limited to 1080p, I doubt their users would see this even if the bug affects them too (I guess someone could force clocks low to test).

ghost commented 8 years ago

Could be a race condition, or some sort of buffer starvation. (But I have no clue how the driver works.)