swaywm / sway

i3-compatible Wayland compositor
https://swaywm.org
MIT License

Crash after running `mpv` #8325

Open matheusmoreira opened 2 months ago

matheusmoreira commented 2 months ago


I will try to reproduce the issue with a full debug log and a default configuration later tonight by running:

sway --config /dev/null --debug 2> ~/sway.log

Will update this issue with the details if successful.

matheusmoreira commented 2 months ago

Reproduced with default settings.

sway --config /etc/sway/config --debug 2> /run/user/1000/sway.debug.log

Downloaded lots of random videos and opened them one after another with mpv until sway crashed.

sway.debug.log

$ tail -n9 /run/user/1000/sway.debug.log
00:09:33.449 [INFO] [wlr] [xwayland/server.c:108] Starting Xwayland on :1
(EE) could not connect to wayland server
 err: 00:09:31.819 wayland.c:1490: failed to read events from the Wayland socket[ERROR] [swaybar/bar.c:466] Wayland display poll error: Broken pipe

 err: wayland.c:2069: failed to roundtrip Wayland display: Broken pipe
 err: wayland.c:2069: failed to roundtrip Wayland display: Broken pipe
 err: wayland.c:2069: failed to roundtrip Wayland display: Broken pipe
warn: terminal.c:1928: slave exited with signal 1 (Hangup)
 err: wayland.c:2034: failed to flush wayland socket: Broken pipe
matheusmoreira commented 2 months ago

Since the segmentation fault occurs inside Wayland library code, I have also opened an issue on the Wayland issue tracker.

Nefsen402 commented 2 months ago

This looks like memory corruption. These bugs are notoriously hard to track down, but there are steps you can take to make it easier. Please try running sway built with AddressSanitizer. It checks every memory access sway makes, so the process stops at the first invalid access instead of silently corrupting its own state and destroying any chance of debugging it.
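When testing a sanitizer build, it can help to confirm that the binary really is instrumented before concluding that a crash does not reproduce under ASan. A minimal sketch, assuming GCC or Clang: GCC defines `__SANITIZE_ADDRESS__` when compiling with `-fsanitize=address`, and Clang exposes `__has_feature(address_sanitizer)`. The `built_with_asan` helper and the `BUILT_WITH_ASAN` macro are hypothetical names, not part of sway or wlroots.

```c
#include <stdbool.h>

/* Detect AddressSanitizer at compile time.
 * Clang: __has_feature(address_sanitizer); GCC: __SANITIZE_ADDRESS__.
 * The inner #if is skipped entirely when __has_feature is not defined. */
#if defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define BUILT_WITH_ASAN 1
#  endif
#endif
#if !defined(BUILT_WITH_ASAN) && defined(__SANITIZE_ADDRESS__)
#  define BUILT_WITH_ASAN 1
#endif
#ifndef BUILT_WITH_ASAN
#  define BUILT_WITH_ASAN 0
#endif

/* Hypothetical helper: true if this translation unit was compiled with ASan. */
bool built_with_asan(void) {
    return BUILT_WITH_ASAN != 0;
}
```

Because the check is per translation unit, it only tells you about the file it is compiled into; sway and wlroots are separate builds and each needs the sanitizer flags.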

matheusmoreira commented 2 months ago

@Nefsen402 OK. I've just finished building sway and wlroots from git with address and undefined behavior sanitizers enabled. Should be able to fully test and follow up with the results either tomorrow or this weekend.

matheusmoreira commented 2 months ago

I was not able to reproduce the problem while running a sway compiled with address and undefined behavior sanitizers enabled.

export ASAN_OPTIONS=abort_on_error=1:disable_coredump=0:unmap_shadow_on_exit=1

build/sway/sway \
    --debug 2> /run/user/1000/sway.debug.log

Spent quite a bit of time trying to trigger it, but got no crash.

Nefsen402 commented 2 months ago

The crash may be fixed on sway master and specific to version 1.9. I would recommend running a git snapshot you are happy with until the next release.

Nefsen402 commented 1 month ago

Closing, as 1.10-rc1 has been released. The 1.9 branch will no longer see any development.

matheusmoreira commented 2 weeks ago

Reproduced in sway 1.10.

matheusmoreira commented 2 weeks ago

sway-1.10.debug.log

The core dump seems to be completely useless this time around:

$ gdb /usr/bin/sway sway.core
Reading symbols from /usr/bin/sway...

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.archlinux.org>
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
...
Core was generated by `sway'.
Program terminated with signal SIGSEGV, Segmentation fault.
(gdb) bt full
#0  0x000070fe4d407110 in ?? ()
No symbol table info available.
#1  0x0000000000000001 in ?? ()
No symbol table info available.
#2  0x0000000018b89e44 in ?? ()
No symbol table info available.
#3  0x000000000007d8ac in ?? ()
No symbol table info available.
#4  0x0000631a52fa41d0 in ?? ()
No symbol table info available.
#5  0x0000000000000000 in ?? ()
No symbol table info available.
matheusmoreira commented 2 weeks ago

Reproduced with a different application: zbarcam. Same corrupted ep and source variables. For some reason, Firefox doesn't seem to trigger this no matter how many windows I open.

matheusmoreira commented 16 hours ago

This seems to be related to Xwayland. Nagefire from #archlinux pointed out that the crash happened just after Xwayland was started (note the `Starting Xwayland on :1` line near the end of the first log). At first this made no sense, since mpv has native Wayland support. However, I confirmed that zbarcam runs under Xwayland, and so does mpv when a file is opened via xdg-open.

Disabling Xwayland support might mitigate the issue, but if it does, that points to a bug somewhere in Xwayland support.

matheusmoreira commented 15 hours ago

I have confirmed that, for some obscure reason, both xdg-open and xdg-mime cause an Xwayland process to be started, which makes mpv run through Xwayland. Replacing xdg-open with an alternative might mitigate these crashes when launching mpv from other applications.
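If the trigger is an X11 connection made somewhere under xdg-open, one untested workaround in the same spirit is to launch the opener with `DISPLAY` removed from its environment. With sway's default `xwayland enable`, Xwayland is started lazily when the first X11 client connects, and X11 clients normally find the server through `DISPLAY`; a process tree without it should fail to connect rather than cause Xwayland to start, while Wayland-native programs such as mpv can still reach the compositor through `WAYLAND_DISPLAY`. A minimal C sketch under those assumptions; `env_without` and `exec_without_x11` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* True if the "NAME=value" entry defines exactly the variable `name`. */
static int defines_var(const char *entry, const char *name) {
    size_t n = strlen(name);
    return strncmp(entry, name, n) == 0 && entry[n] == '=';
}

/* Return a newly allocated, NULL-terminated copy of `envp` without `name`.
 * The strings are shared with `envp`; only the array itself must be freed.
 * Returns NULL if allocation fails. */
char **env_without(char *const envp[], const char *name) {
    size_t count = 0;
    while (envp[count] != NULL)
        count++;

    char **out = malloc((count + 1) * sizeof *out);
    if (out == NULL)
        return NULL;

    size_t j = 0;
    for (size_t i = 0; i < count; i++) {
        if (!defines_var(envp[i], name))
            out[j++] = envp[i];
    }
    out[j] = NULL;
    return out;
}

/* Replace the current process with argv[0] (an absolute path, since execve
 * does no PATH search), running without DISPLAY so that it and any children
 * inheriting its environment cannot locate an X server through DISPLAY.
 * Returns -1 only on failure. */
int exec_without_x11(char *const argv[], char *const envp[]) {
    char **env = env_without(envp, "DISPLAY");
    if (env == NULL)
        return -1;
    execve(argv[0], argv, env);
    free(env); /* only reached if execve failed */
    return -1;
}
```

A small `main` could call `exec_without_x11(argv + 1, environ)` (after declaring `extern char **environ;`), for example as `no-x11 /usr/bin/xdg-open video.mkv`, where `no-x11` is a made-up name for the resulting binary. X11-only helpers started this way will fail instead of falling back, which is the intended trade-off.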