Closed pitsi closed 1 month ago
Produce a backtrace with gdb after installing the debug symbols.
Realistically it's probably 6797f543782d29561ec7b28163ec82e9a4d79318.
I am sorry but... can you explain in detail how this is done? I have never done it before :/
If it helps, I just tried it on my laptop that runs unstable and it works fine. Same configuration, same file, but unstable is on mesa 24.2.3 and on libllvm19 (testing is on libllvm18).
> I am sorry but... can you explain in detail how this is done? I have never done it before :/
Install the debug package for mpv, then follow https://wiki.archlinux.org/title/Debugging/Getting_traces#Getting_the_trace
Unfortunately, the aforementioned repo does not contain a -dbg package for mpv, and the one in the main repo is for mpv 0.38. https://deb-multimedia.org/pool/main/m/mpv-dmo/
Is the above patch already included in mpv 0.39? If not, I can ask the maintainer for a new package that contains it so as to try. Other than that, will it help if I post my full mpv.conf?
It's unfortunately really tough to say what the cause is since you're on really old hardware
libva: Trying to open /usr/lib/x86_64-linux-gnu/dri/r600_drv_video.so
You can try compiling mpv yourself with mpv-build without the linked commit and see if that solves it. Also see if the coredump has debug symbols by using coredumpctl and opening the coredump with coredumpctl gdb [pid]
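For anyone who has not done this before, the coredumpctl route described above might look roughly like the sketch below. It assumes systemd-coredump is enabled and that debug symbols for mpv and Mesa are installed; the function name `open_core_in_gdb` is made up for illustration and is not part of any tool.

```shell
# Sketch only: open a recorded crash in gdb via systemd-coredump.
# Assumes systemd-coredump is enabled and debug symbols are installed.
# open_core_in_gdb is a hypothetical helper name.
open_core_in_gdb() {
    if [ "$#" -ne 1 ]; then
        echo "usage: open_core_in_gdb <pid>" >&2
        return 2
    fi
    coredumpctl info "$1" || return 1   # show metadata for that crash
    coredumpctl gdb "$1"                # load the core into gdb
}

# Typical session:
#   coredumpctl list            # find the PID of the crashed mpv
#   open_core_in_gdb <pid>      # use the PID shown in the list
#   (gdb) bt                    # backtrace of the crashing thread
#   (gdb) thread apply all bt   # optionally, backtraces of every thread
```

If the backtrace shows only addresses and `??` instead of function names, the debug symbols are most likely missing.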
Thanks, I will look up that coredumpctl thing. I remember my distro enabling it a few months ago and kodi spamming me with a file of a few hundred megabytes every time it crashed, so I disabled it.
About the old hardware. The laptop is almost as old, with a 3rd gen i3 and its onboard vga which uses i915, but it plays with no issues there. That is why I said it may be because of the newer mesa and libllvm of unstable.
If it helps, here is a verbose output of mpv 0.38 on my pc (I downgraded to the one from the main repo) output.txt
and from the laptop with 0.39 (which also comes from deb-multimedia.org) output.txt
More info until I find someone to help me with coredumpctl.
Mpv 0.39 also works fine when commenting out the lines for vo and hwdec, so here is a verbose log. I am no expert, but I do not think it uses vaapi this way. output.txt
I just ran into this same problem. I have [AMD/ATI] Wrestler [Radeon HD 6310]
> Realistically it's probably 6797f54.
I ran git bisect and confirmed this commit to be the one to cause the segfault.
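For reference, a bisect session like the one mentioned above could be started roughly as follows. This is a sketch under assumptions: it expects to run inside a clone of the mpv repository, mpv has to be rebuilt and tested by hand at every step, and `start_bisect` is a hypothetical helper name.

```shell
# Sketch only: begin a git bisect between a known-good and a known-bad revision.
# Assumes the current directory is a clone of the mpv repository.
# start_bisect is a hypothetical helper name.
start_bisect() {
    if [ "$#" -ne 2 ]; then
        echo "usage: start_bisect <good-rev> <bad-rev>" >&2
        return 2
    fi
    git bisect start || return 1
    git bisect good "$1" || return 1   # revision that still plays the file
    git bisect bad "$2"                # revision that segfaults
}

# After each rebuild, report the result of the test:
#   git bisect good    # this build plays the file
#   git bisect bad     # this build segfaults
# When git names the first bad commit, return to the original checkout:
#   git bisect reset
```

A call such as `start_bisect v0.38.0 v0.39.0` would cover the range discussed in this thread, assuming those tag names exist in the clone.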
And for what it's worth, 4d09cde8 started logging the following error on my setup:
[vo/gpu/wayland] Unable to set DRM atomic cap: Operation not supported
Can you give a backtrace of that? It might be failing in ffmpeg somewhere.
> started logging the following error on my setup
That could happen. But I guess I should drop the logging down to MP_VERBOSE.
> Can you give a backtrace of that? It might be failing in ffmpeg somewhere.
I don't have a lot of expertise in this area but would this be helpful:
[Switching to Thread 0x7fffccc006c0 (LWP 202582)]
Downloading source file /usr/src/debug/mesa/build/../mesa-24.2.3/src/gallium/auxiliary/vl/vl_video_buffer.c
0x00007fffe2e46683 in vl_video_buffer_sampler_view_components () at ../mesa-24.2.3/src/gallium/auxiliary/vl/vl_video_buffer.c:297
297 struct pipe_resource *res = buf->resources[plane_order[i]];
...
(gdb) bt
#0 0x00007fffe2e46683 in vl_video_buffer_sampler_view_components () at ../mesa-24.2.3/src/gallium/auxiliary/vl/vl_video_buffer.c:297
#1 0x00007fffe2e2bc42 in set_yuv_layer () at ../mesa-24.2.3/src/gallium/auxiliary/vl/vl_compositor.c:347
#2 0x00007fffe2e2d0c3 in set_yuv_layer () at ../mesa-24.2.3/src/gallium/auxiliary/vl/vl_compositor.c:343
#3 vl_compositor_yuv_deint_full () at ../mesa-24.2.3/src/gallium/auxiliary/vl/vl_compositor.c:714
#4 0x00007fffe2842319 in vlVaExportSurfaceHandle () at ../mesa-24.2.3/src/gallium/frontends/va/surface.c:1643
#5 0x00007ffff44d4251 in vaExportSurfaceHandle (dpy=0x7fffa03f5050, surface_id=3, mem_type=mem_type@entry=1073741824, flags=5, descriptor=descriptor@entry=0x7fffccbfee10) at ../libva/va/va.c:1580
#6 0x00005555556b22bd in try_format_upload (hw=hw@entry=0x7fffa03e5180, pixfmt=<optimized out>) at ../video/out/hwdec/hwdec_vaapi.c:444
#7 0x00005555556bb72c in try_format_config (hw=0x7fffa03e5180, hwconfig=0x7fffa03efe00) at ../video/out/hwdec/hwdec_vaapi.c:504
#8 determine_working_formats (hw=0x7fffa03e5180) at ../video/out/hwdec/hwdec_vaapi.c:572
#9 init (hw=0x7fffa03e5180) at ../video/out/hwdec/hwdec_vaapi.c:177
#10 0x00005555556308f7 in ra_hwdec_load_driver (ra_ctx=<optimized out>, log=0x55555593a5d0, global=<optimized out>, devs=0x7fffa038e530, drv=0x5555557b4b40 <ra_hwdec_vaapi>, is_auto=false) at ../video/out/gpu/hwdec.c:104
#11 load_add_hwdec (ctx=0x7fffa0385fd8, devs=0x7fffa038e530, drv=0x5555557b4b40 <ra_hwdec_vaapi>, is_auto=false) at ../video/out/gpu/hwdec.c:236
#12 load_add_hwdec (ctx=0x7fffa0385fd8, devs=0x7fffa038e530, drv=0x5555557b4b40 <ra_hwdec_vaapi>, is_auto=<optimized out>) at ../video/out/gpu/hwdec.c:226
#13 0x0000555555638903 in ra_hwdec_ctx_load_fmt (ctx=0x7fffa0385fd8, devs=0x7fffa038e530, params=0x7fffffffdf50) at ../video/out/gpu/hwdec.c:332
#14 0x00005555556537c5 in gl_video_load_hwdecs_for_img_fmt (p=<optimized out>, devs=<optimized out>, params=0x7fffffffdf50) at ../video/out/gpu/video.c:4369
#15 request_hwdec_api (vo=0x5555559a09f0, data=0x7fffffffdf50) at ../video/out/vo_gpu.c:134
#16 control (vo=0x5555559a09f0, request=<optimized out>, data=0x7fffffffdf50) at ../video/out/vo_gpu.c:203
#17 0x0000555555648c5e in run_control (p=0x7fffffffdea0) at ../video/out/vo.c:652
#18 0x00005555555c14a2 in mp_dispatch_queue_process (queue=0x5555558cb790, timeout=<optimized out>) at ../misc/dispatch.c:300
#19 0x000055555565208e in vo_thread (ptr=<optimized out>) at ../video/out/vo.c:1100
#20 0x00007ffff432739d in start_thread (arg=<optimized out>) at pthread_create.c:447
#21 0x00007ffff43ac49c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
Looks like a segfault in mesa based on that.
@0x0aa As someone who tried a lot for a day and did not make a coredump file in the end, thank you so much!
@Dudemanguy The dmesg part I showed above mentions a mesa library. If it is mesa, it will be the second time in a year that it breaks vaapi :(
Since the commit to blame changes hwdec interop formats, I assume it just exposes a bug in Mesa that has already been there for a while by picking a different format. I think users of this old GPU's codepath might be able to work around it by forcing a different hwdec image format with --hwdec-image-format= (see --hwdec-image-format=help for a list of them), if no fix in Mesa itself is forthcoming.
(Though with a $50 card from 13 years ago, you might want to start testing Mesa releases and submitting patches for the driver yourself because I doubt anyone else is doing it on the regular at this point.)
EDIT: Looks like I have a Radeon HD 5450 with an onboard PCIe-to-PCI bridge here, I might convert an old PC into a jank test bench and see if I can reproduce the Mesa crash on that and see how hard it'd be to fix it in Mesa
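The --hwdec-image-format workaround suggested above could be tried along these lines. This is only a sketch: `try_hwdec_format` is a hypothetical helper name, `nv12` is an illustrative guess at a format, and whether any format actually avoids the crash on this hardware is untested.

```shell
# Sketch only: retry VAAPI playback with a forced hwdec image format.
# try_hwdec_format is a hypothetical helper name.
try_hwdec_format() {
    if [ "$#" -ne 2 ]; then
        echo "usage: try_hwdec_format <format> <file>" >&2
        return 2
    fi
    mpv --hwdec=vaapi --hwdec-image-format="$1" "$2"
}

# List the accepted format names first:
#   mpv --hwdec-image-format=help
# Then try candidates one at a time, for example:
#   try_hwdec_format nv12 video.mkv    # nv12 is only an illustrative guess
```

Since the backtrace shows the crash happening while mpv probes formats during hwdec initialization, it is possible that forcing a format does not help at all.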
> Since the commit to blame changes hwdec interop formats, I assume it just exposes a bug in Mesa that has already been there for a while
Yes, this is most likely the case.
> (Though with a $50 card from 13 years ago, you might want to start testing Mesa releases and submitting patches for the driver yourself because I doubt anyone else is doing it on the regular at this point.)
That's reasonable.
In my case this is an igpu on an old mini laptop, which is kind of funny to play with. I am currently compiling mesa and trying to see if I can spot anything obvious (which I probably won't, as I know pretty much nothing about gpu drivers and media codecs). Maybe I'll learn something on the way though :)
Am I entirely wrong if I suspect the bug to be one of these: a) mesa is incorrectly presenting the capabilities of the gpu, mpv uses that correctly b) mesa is correctly presenting the capabilities of the gpu, mpv uses that correctly c) mesa is correctly presenting the capabilities of the gpu, mpv uses that incorrectly
And is it likely a) or b), and confidently not c)?
None of those really? It's more like mpv is trying to probe something and mesa blows up as a result when it shouldn't.
@0x0aa May I ask which mesa version you are on? Because the changelog of 24.2.3 mentions something about vaapi. https://docs.mesa3d.org/relnotes/24.2.3.html
Yes, that’s the version I had installed.
Today I compiled the main branch and sadly it segfaults too. I’ll try debugging later. Don’t get your hopes up though…
With https://github.com/mpv-player/mpv/commit/6797f543782d29561ec7b28163ec82e9a4d79318 plus the change in f_hwtransfer.c reverted, hardware decoding with vaapi works again on radeon.
vo_dmabuf_wayland stays broken, due to https://github.com/mpv-player/mpv/commit/4d09cde8f92577fc6d8522a0e14db2e238a6c3a8, which gets strange format info. No segfault, but dysfunctional.
[vo/dmabuf-wayland/wayland] Unable to set DRM atomic cap: Operation not supported
[hwupload] no support for this hw format
[hwupload] hardware format not supported
[autoconvert] HW-uploading to drm_prime
[hwupload] upload yuv420p -> drm_prime[yuv420p]
[hwupload] failed to upload frame
Cannot convert decoder/filter output to any format supported by the output.
My impression: Format checks hardly functional with radeon driver. Blacklisting?
First of all, I acknowledge this should probably be taken to mesa, and I apologize if I am being annoying here.
So I debugged a bit and I was able to "bypass" the segfault by dodging the null pointer dereferences with the following changes to mesa:
index c0a9e625f65..d832df06649 100644
--- a/src/gallium/auxiliary/vl/vl_compositor.c
+++ b/src/gallium/auxiliary/vl/vl_compositor.c
@@ -777,6 +777,9 @@ vl_compositor_render(struct vl_compositor_state *s,
{
assert(s);
+ if (dst_surface == NULL)
+ return;
+
if (s->layers->cs)
vl_compositor_cs_render(s, c, dst_surface, dirty_area, clear_dirty);
else if (s->layers->fs)
index 7c10a0c0d6c..fdf26065309 100644
--- a/src/gallium/auxiliary/vl/vl_video_buffer.c
+++ b/src/gallium/auxiliary/vl/vl_video_buffer.c
@@ -80,10 +80,12 @@ vl_video_buffer_plane_order(enum pipe_format format)
case PIPE_FORMAT_NV21:
case PIPE_FORMAT_Y8_U8_V8_444_UNORM:
case PIPE_FORMAT_Y8_U8_V8_440_UNORM:
+ case PIPE_FORMAT_Y8_400_UNORM:
case PIPE_FORMAT_R8G8B8A8_UNORM:
case PIPE_FORMAT_R8G8B8X8_UNORM:
case PIPE_FORMAT_B8G8R8A8_UNORM:
case PIPE_FORMAT_B8G8R8X8_UNORM:
+ case PIPE_FORMAT_A8R8G8B8_UNORM:
case PIPE_FORMAT_R10G10B10A2_UNORM:
case PIPE_FORMAT_R10G10B10X2_UNORM:
case PIPE_FORMAT_B10G10R10A2_UNORM:
The code is quite... ehm... vast, so I have no idea whether this is a "fix" or if these null pointers are simply caused by missing support.
Thoughts about this?
Well I am not a mesa developer so I cannot tell you if the fix is right or not but I mean it looks sane. The best thing to do is to just open up an MR upstream and get feedback. In my experience, they quickly review and accept small fixes like that.
Thanks. I researched a bit more and I’m now confident that the actual fix is more involved. Anyway, it’s quite interesting but probably way over my head. I’ll send the MR to mesa if I ever get there.
I just got the upgrade to mesa 24.2.4 today but nothing changed, so 24.2.2 was not actually to blame, but mesa in general is. I thought that moving to radeon from nvidia (340!) would make my linux experience better, but this is the second time mesa breaks vaapi in a year!
[ 49.749213] vo[821]: segfault at 0 ip 00007f232dcb85fa sp 00007f2303dfeb30 error 4 in libgallium-24.2.4-1.so[6b85fa,7f232d6ac000+1623000] likely on CPU 1 (core 1, socket 0)
Semi-offtopic question. What do I lose in terms of performance if I leave the vo and hwdec (and possibly ao) parameters commented out in mpv.conf?
> mesa breaks vaapi in a year
No, it has most likely always been broken. mpv is just triggering the broken codepath now, when it used to not do so before. We may have to disable the aggressive probing or figure out a workaround because gpu drivers are completely ass, though.
> What do I lose in terms of performance if I leave the vo and hwdec (and possibly ao) parameters commented out in mpv.conf?
Just disabling hwdec should be enough to not hit this bug. What that means is that the video will be decoded on your cpu instead of fixed function hardware on your gpu. This is less efficient, and if your cpu is too old it might not be able to keep up. In general, hwdec or no hwdec shouldn't be noticeable in any way except slightly reduced battery life though.
> Just disabling hwdec should be enough to not hit this bug. What that means is that the video will be decoded on your cpu instead of fixed function hardware on your gpu. This is less efficient, and if your cpu is too old it might not be able to keep up. In general, hwdec or no hwdec shouldn't be noticeable in any way except slightly reduced battery life though.

I thought that video would be decoded on the cpu only when using the xv video output. In fact, this is what I had to use last January, when mesa broke vaapi and was giving a black screen on all players, as described here. I could not play anything above 480p without the cpu hitting 100% usage, and I had to wait for months for mesa 24.0.x to be released upstream and reach testing in order to fix what mesa 23.3.x broke! https://forums.debian.net/viewtopic.php?t=157834
Right now though, with hwdec commented out, I get these while playing any video and no significant cpu usage, i.e. 25-30% on both cores for 720p and 45-50% on 1080p. I know it is not ideal, because if the gpu was doing all the work those numbers would be a lot lower, but it is better than xv.
VO: [gpu] 800x600 yuv420p
Semi-offtopic. I had also enabled vaapi in firefox for video playback. In a period of 3 months it froze the browser once, so I had to force kill it, and my system twice, so I had to reisub it. I disabled it again the second time it froze the entire system.
You're on a 10 year old $50 gpu, you're kind of on your own there.
I agree with what you say. But before that, and for many years, I was on an even older nvidia gpu that needed nvidia 340 (a driver that went eol at the end of 2019) to work, and I never had issues with vdpau on any player. And no, nouveau is out of the question because it cannot do even basic everyday stuff like powersaving.
I have been on this ati and radeon since early September 2023, because the aforementioned nvidia died after a blackout.
Never mind the fact that you bought a low-end GPU from 2011 in 2023, why are you telling us all this?
We're not customer support. There's no way to speak to the manager here. Go fix the driver yourself if you care to, the source code is available. Any resolution to the issue provided on mpv's side of things is not accelerated by your moping.
I did not buy it, I had it in my drawer for more than 5 years. As an nvidia user all those (15+) years in linux hearing how good and troublefree the opensource driver(s) for amd's gpus is (are), I put it to work on the first chance I had.
@pitsi, I understand you are frustrated.
If you want to help, you can report this bug to mesa. There’s a good chance to get it fixed, when the right people look at it.
Thank you @kasper93! 😊
Tested from master. OK on radeon/r600. Also vo_dmabuf_wayland works again.
mpv Information
Other Information
Reproduction Steps
On my end, mpv 0.39 cannot play any video or stream with vaapi. It crashes with a segfault, blaming a mesa library in dmesg. I have had these lines in my mpv.conf for as long as I can remember
Setting hwdec to auto, as a friend suggested, changes nothing. It just adds a line about missing libcuda.
However, launching it with the --no-config parameter makes it work as it should, but then I lose any customization I have. And the same happens when using xv as the video output.
I will also test it on my laptop, which has an intel gpu, debian unstable x64 and identical configuration, tomorrow.
Expected Behavior
Mpv should just play the file like so. Please ignore the lines about the missing pipewire configuration, I do not even have it installed.
Actual Behavior
It just segfaults with hwdec=vaapi
It also segfaults when setting hwdec=auto
But dmesg shows the same message in both cases.
[10192.176633] vo[8989]: segfault at 0 ip 00007f57639387ba sp 00007f5741bfeb30 error 4 in libgallium-24.2.2-1.so[d387ba,7f5762cab000+15da000] likely on CPU 1 (core 1, socket 0)
Also note that ALL packages providing libcuda.so.1 are nvidia related, i.e. installing any of them will also install nvidia's driver, so I did not (and will not) install any of them.
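As a rough aid for reading the dmesg line above, the sketch below pulls out the faulting address and the library name with sed. The variable names are made up for illustration; the line itself is copied from this report.

```shell
# Sketch only: pull the interesting fields out of a kernel segfault line.
line='[10192.176633] vo[8989]: segfault at 0 ip 00007f57639387ba sp 00007f5741bfeb30 error 4 in libgallium-24.2.2-1.so[d387ba,7f5762cab000+15da000] likely on CPU 1 (core 1, socket 0)'

# Address whose access faulted.
fault_addr=$(printf '%s\n' "$line" | sed -n 's/.*segfault at \([0-9a-f]*\) ip.*/\1/p')

# Name of the mapped file that contained the faulting instruction.
library=$(printf '%s\n' "$line" | sed -n 's/.* in \([^[]*\)\[.*/\1/p')

echo "fault address: 0x$fault_addr"
echo "library:       $library"
```

A fault address of 0 together with error code 4 (a user-mode read of an unmapped page) is what a NULL pointer dereference typically looks like, and the library name shows the faulting instruction was inside Mesa's gallium library rather than in mpv itself.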
Log File
output.txt
Sample Files
No response