mpv-player / mpv

Segfault with mpv 0.39, vaapi and radeon #14956

Closed: pitsi closed this issue 1 month ago.

pitsi commented 1 month ago

mpv Information

$ mpv --version 
mpv 0.39.0 Copyright © 2000-2024 mpv/MPlayer/mplayer2 projects
libplacebo version: v7.349.0
FFmpeg version: 7.0.2
FFmpeg library versions:
   libavcodec      61.3.100
   libavdevice     61.1.100
   libavfilter     10.1.100
   libavformat     61.1.100
   libavutil       59.8.100
   libswresample   5.1.100
   libswscale      8.1.100

Other Information

- Linux version: "Debian GNU/Linux trixie/sid"
- Kernel Version: Linux (removed) 6.10.11-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.10.11-1 (2024-09-22) x86_64 GNU/Linux
- GPU Model: 01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM] [1002:6779]
- Mesa/GPU Driver Version: 4.5 (Compatibility Profile) Mesa 24.2.2-1 
- Window Manager and Version: Openbox 3.6.1
- Source mpv: deb-multimedia.org
- Introduced in version: 0.39

Reproduction Steps

On my end, mpv 0.39 cannot play any video or stream with vaapi. It crashes with a segfault, and dmesg blames a Mesa library. I have had these lines in my mpv.conf forever:

$ cat .config/mpv/mpv.conf 
vo=gpu,xv
hwdec=vaapi
ao=alsa

Setting hwdec to auto, as a friend suggested, changes nothing. It just adds a line about missing libcuda.

However, launching it with the --no-config parameter makes it work as it should, but then I lose all my customizations. It also works when I use xv as the video output.

Tomorrow I will also test it on my laptop, which has an Intel GPU, Debian unstable x64, and an identical configuration.

Expected Behavior

Please ignore the lines about the missing PipeWire configuration; I do not even have it installed. mpv should just play the file, like so:

$ mpv --no-config klama.mp4 
● Video  --vid=1  (h264 800x600 29.97 fps) [default]
● Audio  --aid=1  (aac 1ch 44100 Hz 48 kbps) [default]
File tags:
 Title: 1629350087352316
[W][10359.652810] pw.conf      | [          conf.c: 1214 try_load_conf()] can't load config client-rt.conf: No such file or directory
[E][10359.653076] pw.conf      | [          conf.c: 1243 pw_conf_load_conf_for_context()] can't load config client-rt.conf: No such file or directory
AO: [alsa] 48000Hz stereo 2ch float
VO: [gpu] 800x600 yuv420p
AV: 00:00:02 / 00:02:38 (2%) A-V: -0.000
Exiting... (Quit)

Actual Behavior

It just segfaults with hwdec=vaapi:

$ mpv klama.mp4 
● Video  --vid=1  (h264 800x600 29.97 fps) [default]
● Audio  --aid=1  (aac 1ch 44100 Hz 48 kbps) [default]
File tags:
 Title: 1629350087352316
Segmentation fault (core dumped)

It also segfaults when setting hwdec=auto:

$ mpv klama.mp4 
● Video  --vid=1  (h264 800x600 29.97 fps) [default]
● Audio  --aid=1  (aac 1ch 44100 Hz 48 kbps) [default]
File tags:
 Title: 1629350087352316
Cannot load libcuda.so.1
Segmentation fault (core dumped)

dmesg shows the same thing in both cases:
[10192.176633] vo[8989]: segfault at 0 ip 00007f57639387ba sp 00007f5741bfeb30 error 4 in libgallium-24.2.2-1.so[d387ba,7f5762cab000+15da000] likely on CPU 1 (core 1, socket 0)

Also note that ALL packages providing libcuda.so.1 are NVIDIA-related; installing any of them also installs NVIDIA's driver, so I did not (and will not) install any of them.

$ apt-cache search libcuda
libcudart12 - NVIDIA CUDA Runtime Library
libcuda1 - NVIDIA CUDA Driver Library
libcudadebugger1 - NVIDIA CUDA Debugger Library
libnvidia-tesla-470-cuda1 - NVIDIA CUDA Driver Library (Tesla 470 version)
libnvidia-legacy-340xx-cuda1 - NVIDIA CUDA Driver Library (340xx legacy version)
libnvidia-legacy-390xx-cuda1 - NVIDIA CUDA Driver Library (390xx legacy version)
libnvidia-tesla-418-cuda1 - NVIDIA CUDA Driver Library (Tesla 418 version)

Log File

output.txt

Sample Files

No response


CounterPillow commented 1 month ago

Produce a backtrace with gdb after installing the debug symbols.

Dudemanguy commented 1 month ago

Realistically it's probably 6797f543782d29561ec7b28163ec82e9a4d79318.

pitsi commented 1 month ago

I am sorry but... can you explain in detail how this is done? I have never done it before :/

If it helps, I just tried it on my laptop that runs unstable and it works fine. Same configuration, same file, but unstable is on mesa 24.2.3 and on libllvm19 (testing is on libllvm18).

llyyr commented 1 month ago

> I am sorry but... can you explain in detail how this is done? I have never done it before :/

Install the debug package for mpv, then follow https://wiki.archlinux.org/title/Debugging/Getting_traces#Getting_the_trace

pitsi commented 1 month ago

Unfortunately, the aforementioned repo (https://deb-multimedia.org/pool/main/m/mpv-dmo/) does not contain a -dbg package for mpv, and the one in the main repo is for mpv 0.38.

Is the above patch already included in mpv 0.39? If not, I can ask the maintainer for a new package that contains it so I can try it. Other than that, would it help if I posted my full mpv.conf?

llyyr commented 1 month ago

It's unfortunately really tough to say what the cause is, since you're on really old hardware:

libva: Trying to open /usr/lib/x86_64-linux-gnu/dri/r600_drv_video.so

You can try building mpv yourself with mpv-build, with the linked commit reverted, and see if that solves it. Also check whether the coredump has debug symbols by opening it with coredumpctl gdb [pid].

pitsi commented 1 month ago

Thanks, I will look up that coredumpctl thing. I remember my distro enabling it a few months ago and kodi spamming me with a file of a few hundred megabytes every time it crashed, so I disabled it.

About the old hardware: the laptop is almost as old, a 3rd-gen i3 whose onboard graphics use i915, yet it plays with no issues there. That is why I said it may be because of the newer Mesa and libllvm in unstable.

pitsi commented 1 month ago

If it helps, here is verbose output from mpv 0.38 on my PC (I downgraded to the version from the main repo): output.txt

and from the laptop with 0.39 (which also comes from deb-multimedia.org): output.txt

pitsi commented 1 month ago

More info until I find someone to help me with coredumpctl.

mpv 0.39 also works fine when I comment out the vo and hwdec lines, so here is a verbose log. I am no expert, but I do not think it uses vaapi this way: output.txt

0x0aa commented 1 month ago

I just ran into this same problem. I have an [AMD/ATI] Wrestler [Radeon HD 6310].

> Realistically it's probably 6797f54.

I ran git bisect and confirmed that this commit causes the segfault.

And for what it's worth, 4d09cde8 started logging the following error on my setup:

[vo/gpu/wayland] Unable to set DRM atomic cap: Operation not supported

Dudemanguy commented 1 month ago

Can you give a backtrace of that? It might be failing in ffmpeg somewhere.

> started logging the following error on my setup

That could happen. But I guess I should drop the logging down to MP_VERBOSE.

0x0aa commented 1 month ago

> Can you give a backtrace of that? It might be failing in ffmpeg somewhere.

I don't have a lot of expertise in this area, but would this be helpful?


[Switching to Thread 0x7fffccc006c0 (LWP 202582)]
Downloading source file /usr/src/debug/mesa/build/../mesa-24.2.3/src/gallium/auxiliary/vl/vl_video_buffer.c
0x00007fffe2e46683 in vl_video_buffer_sampler_view_components () at ../mesa-24.2.3/src/gallium/auxiliary/vl/vl_video_buffer.c:297                                                                 
297      struct pipe_resource *res = buf->resources[plane_order[i]];

...

(gdb) bt
#0  0x00007fffe2e46683 in vl_video_buffer_sampler_view_components () at ../mesa-24.2.3/src/gallium/auxiliary/vl/vl_video_buffer.c:297
#1  0x00007fffe2e2bc42 in set_yuv_layer () at ../mesa-24.2.3/src/gallium/auxiliary/vl/vl_compositor.c:347
#2  0x00007fffe2e2d0c3 in set_yuv_layer () at ../mesa-24.2.3/src/gallium/auxiliary/vl/vl_compositor.c:343
#3  vl_compositor_yuv_deint_full () at ../mesa-24.2.3/src/gallium/auxiliary/vl/vl_compositor.c:714
#4  0x00007fffe2842319 in vlVaExportSurfaceHandle () at ../mesa-24.2.3/src/gallium/frontends/va/surface.c:1643
#5  0x00007ffff44d4251 in vaExportSurfaceHandle (dpy=0x7fffa03f5050, surface_id=3, mem_type=mem_type@entry=1073741824, flags=5, descriptor=descriptor@entry=0x7fffccbfee10) at ../libva/va/va.c:1580
#6  0x00005555556b22bd in try_format_upload (hw=hw@entry=0x7fffa03e5180, pixfmt=<optimized out>) at ../video/out/hwdec/hwdec_vaapi.c:444
#7  0x00005555556bb72c in try_format_config (hw=0x7fffa03e5180, hwconfig=0x7fffa03efe00) at ../video/out/hwdec/hwdec_vaapi.c:504
#8  determine_working_formats (hw=0x7fffa03e5180) at ../video/out/hwdec/hwdec_vaapi.c:572
#9  init (hw=0x7fffa03e5180) at ../video/out/hwdec/hwdec_vaapi.c:177
#10 0x00005555556308f7 in ra_hwdec_load_driver (ra_ctx=<optimized out>, log=0x55555593a5d0, global=<optimized out>, devs=0x7fffa038e530, drv=0x5555557b4b40 <ra_hwdec_vaapi>, is_auto=false) at ../video/out/gpu/hwdec.c:104
#11 load_add_hwdec (ctx=0x7fffa0385fd8, devs=0x7fffa038e530, drv=0x5555557b4b40 <ra_hwdec_vaapi>, is_auto=false) at ../video/out/gpu/hwdec.c:236
#12 load_add_hwdec (ctx=0x7fffa0385fd8, devs=0x7fffa038e530, drv=0x5555557b4b40 <ra_hwdec_vaapi>, is_auto=<optimized out>) at ../video/out/gpu/hwdec.c:226
#13 0x0000555555638903 in ra_hwdec_ctx_load_fmt (ctx=0x7fffa0385fd8, devs=0x7fffa038e530, params=0x7fffffffdf50) at ../video/out/gpu/hwdec.c:332
#14 0x00005555556537c5 in gl_video_load_hwdecs_for_img_fmt (p=<optimized out>, devs=<optimized out>, params=0x7fffffffdf50) at ../video/out/gpu/video.c:4369
#15 request_hwdec_api (vo=0x5555559a09f0, data=0x7fffffffdf50) at ../video/out/vo_gpu.c:134
#16 control (vo=0x5555559a09f0, request=<optimized out>, data=0x7fffffffdf50) at ../video/out/vo_gpu.c:203
#17 0x0000555555648c5e in run_control (p=0x7fffffffdea0) at ../video/out/vo.c:652
#18 0x00005555555c14a2 in mp_dispatch_queue_process (queue=0x5555558cb790, timeout=<optimized out>) at ../misc/dispatch.c:300
#19 0x000055555565208e in vo_thread (ptr=<optimized out>) at ../video/out/vo.c:1100
#20 0x00007ffff432739d in start_thread (arg=<optimized out>) at pthread_create.c:447
#21 0x00007ffff43ac49c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
Dudemanguy commented 1 month ago

Looks like a segfault in mesa based on that.

pitsi commented 1 month ago

@0x0aa As someone who tried for a whole day and never managed to produce a coredump file, thank you so much!

@Dudemanguy The dmesg line I showed above mentions a Mesa library. If it is Mesa, this will be the second time in a year that it has broken vaapi :(

CounterPillow commented 1 month ago

Since the commit to blame changes hwdec interop formats, I assume it just exposes a bug in Mesa that has already been there for a while by picking a different format. If no fix in Mesa itself is forthcoming, users of this old GPU's codepath might be able to work around it by forcing a different hwdec image format with --hwdec-image-format= (see --hwdec-image-format=help for a list of them).

(Though with a $50 card from 13 years ago, you might want to start testing Mesa releases and submitting patches for the driver yourself because I doubt anyone else is doing it on the regular at this point.)

EDIT: Looks like I have a Radeon HD 5450 with an onboard PCIe-to-PCI bridge here. I might convert an old PC into a jank test bench, see if I can reproduce the Mesa crash on it, and see how hard it'd be to fix in Mesa.

0x0aa commented 1 month ago

> Since the commit to blame changes hwdec interop formats, I assume it just exposes a bug in Mesa that has already been there for a while

Yes, this is most likely the case.

> (Though with a $50 card from 13 years ago, you might want to start testing Mesa releases and submitting patches for the driver yourself because I doubt anyone else is doing it on the regular at this point.)

That's reasonable.

In my case this is an iGPU on an old mini laptop, which is kind of funny to play with. I am currently compiling Mesa and trying to see if I can spot anything obvious (which I probably won't, as I know pretty much nothing about GPU drivers and media codecs). Maybe I'll learn something on the way, though :)

Am I entirely wrong if I suspect the bug to be one of these?

a) mesa is incorrectly presenting the capabilities of the GPU, and mpv uses that correctly
b) mesa is correctly presenting the capabilities of the GPU, and mpv uses that correctly
c) mesa is correctly presenting the capabilities of the GPU, and mpv uses that incorrectly

And is it likely a) or b), and confidently not c)?

Dudemanguy commented 1 month ago

None of those really? It's more like mpv is trying to probe something and mesa blows up as a result when it shouldn't.

pitsi commented 1 month ago

@0x0aa May I ask which Mesa version you are on? The changelog of 24.2.3 mentions something about vaapi. https://docs.mesa3d.org/relnotes/24.2.3.html

0x0aa commented 1 month ago

Yes, that’s the version I had installed.

Today I compiled the main branch and sadly it segfaults too. I’ll try debugging later. Don’t get your hopes up though…

samoht0 commented 1 month ago

With https://github.com/mpv-player/mpv/commit/6797f543782d29561ec7b28163ec82e9a4d79318 and the change in f_hwtransfer.c both reverted, hardware decoding with vaapi works again on radeon.

vo_dmabuf_wayland stays broken due to https://github.com/mpv-player/mpv/commit/4d09cde8f92577fc6d8522a0e14db2e238a6c3a8, which gets strange format info. No segfault, but it does not work:

[vo/dmabuf-wayland/wayland] Unable to set DRM atomic cap: Operation not supported
[hwupload] no support for this hw format
[hwupload] hardware format not supported
[autoconvert] HW-uploading to drm_prime
[hwupload] upload yuv420p -> drm_prime[yuv420p]
[hwupload] failed to upload frame
Cannot convert decoder/filter output to any format supported by the output.

My impression: the format checks are barely functional with the radeon driver. Blacklisting?

0x0aa commented 1 month ago

First of all, I acknowledge this should probably be taken to mesa, and I apologize if I am being annoying here.

So I debugged a bit and I was able to "bypass" the segfault by dodging the null pointer dereferences with the following changes to mesa:

index c0a9e625f65..d832df06649 100644
--- a/src/gallium/auxiliary/vl/vl_compositor.c
+++ b/src/gallium/auxiliary/vl/vl_compositor.c
@@ -777,6 +777,9 @@ vl_compositor_render(struct vl_compositor_state *s,
 {
    assert(s);

+   if (dst_surface == NULL)
+      return;
+
    if (s->layers->cs)
       vl_compositor_cs_render(s, c, dst_surface, dirty_area, clear_dirty);
    else if (s->layers->fs)
index 7c10a0c0d6c..fdf26065309 100644
--- a/src/gallium/auxiliary/vl/vl_video_buffer.c
+++ b/src/gallium/auxiliary/vl/vl_video_buffer.c
@@ -80,10 +80,12 @@ vl_video_buffer_plane_order(enum pipe_format format)
    case PIPE_FORMAT_NV21:
    case PIPE_FORMAT_Y8_U8_V8_444_UNORM:
    case PIPE_FORMAT_Y8_U8_V8_440_UNORM:
+   case PIPE_FORMAT_Y8_400_UNORM:
    case PIPE_FORMAT_R8G8B8A8_UNORM:
    case PIPE_FORMAT_R8G8B8X8_UNORM:
    case PIPE_FORMAT_B8G8R8A8_UNORM:
    case PIPE_FORMAT_B8G8R8X8_UNORM:
+   case PIPE_FORMAT_A8R8G8B8_UNORM:
    case PIPE_FORMAT_R10G10B10A2_UNORM:
    case PIPE_FORMAT_R10G10B10X2_UNORM:
    case PIPE_FORMAT_B10G10R10A2_UNORM:

The code is quite... ehm... vast, so I have no idea whether this is a "fix" or if these null pointers are simply caused by missing support.

Thoughts about this?

Dudemanguy commented 1 month ago

Well, I am not a Mesa developer, so I cannot tell you whether the fix is right, but it looks sane. The best thing to do is to open an MR upstream and get feedback. In my experience, they quickly review and accept small fixes like that.

0x0aa commented 1 month ago

Thanks. I researched a bit more and I’m now confident that the actual fix is more involved. Anyway, it’s quite interesting but probably way over my head. I’ll send the MR to mesa if I ever get there.

pitsi commented 1 month ago

I just got the upgrade to Mesa 24.2.4 today, but nothing changed, so 24.2.2 was not actually to blame; Mesa in general is. I thought that moving to radeon from nvidia (340!) would make my Linux experience better, but this is the second time mesa breaks vaapi in a year!
[ 49.749213] vo[821]: segfault at 0 ip 00007f232dcb85fa sp 00007f2303dfeb30 error 4 in libgallium-24.2.4-1.so[6b85fa,7f232d6ac000+1623000] likely on CPU 1 (core 1, socket 0)

Semi-offtopic question. What do I lose in terms of performance if I leave the vo and hwdec (and possibly ao) parameters commented out in mpv.conf?

llyyr commented 1 month ago

> mesa breaks vaapi in a year

No, it has most likely always been broken. mpv is just triggering the broken codepath now, where it did not before. We may have to disable the aggressive probing or figure out a workaround because gpu drivers are completely ass, though.

> What do I lose in terms of performance if I leave the vo and hwdec (and possibly ao) parameters commented out in mpv.conf?

Just disabling hwdec should be enough to not hit this bug. What that means is that the video will be decoded on your cpu instead of fixed function hardware on your gpu. This is less efficient, and if your cpu is too old it might not be able to keep up. In general, hwdec or no hwdec shouldn't be noticeable in any way except slightly reduced battery life though.

pitsi commented 1 month ago

> Just disabling hwdec should be enough to not hit this bug. What that means is that the video will be decoded on your cpu instead of fixed function hardware on your gpu. This is less efficient, and if your cpu is too old it might not be able to keep up. In general, hwdec or no hwdec shouldn't be noticeable in any way except slightly reduced battery life though.

I thought that video would be decoded on the CPU only when using the xv video output. In fact, that is what I had to use last January, when Mesa broke vaapi and gave a black screen in all players, as described here. I could not play anything above 480p without the CPU hitting 100% usage, and I had to wait months for Mesa 24.0.x to be released upstream and reach testing in order to fix what Mesa 23.3.x broke! https://forums.debian.net/viewtopic.php?t=157834

Right now, though, with hwdec commented out, I get the line below while playing any video, with no significant CPU usage: 25-30% on both cores for 720p and 45-50% for 1080p. I know it is not ideal, because if the GPU were doing all the work those numbers would be a lot lower, but it is better than xv.
VO: [gpu] 800x600 yuv420p

Semi-offtopic: I had also enabled vaapi in Firefox for video playback. Over 3 months it froze the browser once, so I had to force-kill it, and froze my whole system twice, so I had to REISUB it. I disabled it again the second time it froze the entire system.

llyyr commented 1 month ago

You're on a 10-year-old $50 GPU; you're kind of on your own there.

pitsi commented 1 month ago

I agree with what you say. But before that, and for many years, I was on an even older NVIDIA GPU that needed the NVIDIA 340 driver (which went EOL at the end of 2019), and I never had issues with VDPAU in any player. And no, nouveau is out of the question because it cannot do even basic everyday stuff like power saving.

I have been on this ATI card with the radeon driver since early September 2023, because the aforementioned NVIDIA card died after a blackout.

CounterPillow commented 1 month ago

Never mind the fact that you bought a low-end GPU from 2011 in 2023, why are you telling us all this?

We're not customer support. There's no way to speak to the manager here. Go fix the driver yourself if you care to, the source code is available. Any resolution to the issue provided on mpv's side of things is not accelerated by your moping.

pitsi commented 1 month ago

I did not buy it; I had it in my drawer for more than 5 years. After all those (15+) years on Linux as an NVIDIA user, hearing how good and trouble-free the open-source drivers for AMD's GPUs are, I put it to work at the first chance I had.

0x0aa commented 1 month ago

@pitsi, I understand you are frustrated.

If you want to help, you can report this bug to mesa. There’s a good chance to get it fixed, when the right people look at it.

0x0aa commented 1 month ago

Thank you @kasper93! 😊

samoht0 commented 1 month ago

Tested from master. OK on radeon/r600. Also vo_dmabuf_wayland works again.