I do not know how to reproduce this bug. However, I may have captured enough information when it happened to make a fix possible anyway.
--> Maja (~quassel@45.142.146.28) has joined #sway
\ hello, my sway session has hung on me and i am trying to debug why
\ a few minutes ago, it had 207686 file descriptors open. a moment later, it had 415565. I then sent a SIGSTOP to the process
\ it's doing the following sequence of syscalls ad infinitum:
(addendum: immediately afterwards a call to pipe2 follows, and so on)
Long story short, I established that fd 68 is Xwayland, and captured a backtrace from one of the calls to fcntl in the strace'd sequence (a breakpoint for pipe2 wasn't hitting)
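For anyone who wants to watch the same kind of fd growth on their own system, here is a minimal sketch of how the count can be taken on Linux by reading `/proc/<pid>/fd`. This is not the command used in the log above; the function names `classify`, `summarize` and `read_targets` are made up for illustration.

```python
import os
from collections import Counter

def classify(target):
    """Reduce a /proc/<pid>/fd symlink target to a coarse kind."""
    # Special targets look like "pipe:[1234]" or "socket:[5678]";
    # everything else is treated as a path to a regular file or device.
    for prefix in ("pipe", "socket", "anon_inode"):
        if target.startswith(prefix + ":"):
            return prefix
    return "file"

def summarize(targets):
    """Count fd targets by kind."""
    return Counter(classify(t) for t in targets)

def read_targets(pid):
    """Yield the symlink target of every open fd of a process (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    for name in os.listdir(fd_dir):
        try:
            yield os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd was closed between listdir() and readlink(); skip it.
            continue

# Usage (hypothetical pid): print(summarize(read_targets(1234)))
```

A leak through repeated `pipe2` calls, like the one described here, would be expected to show up as a rapidly growing `pipe` count between two runs.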
\ try to figure out who the client on the other end of fd 68 is
\ I suspect that client is repeating a request forever that is using up an fd
\ that or attach gdb and get a backtrace for what is calling pipe2
\ the client is Xwayland
\ because of course
\ lovely
\ okay, I *really* don't like what I'm seeing https://paste.debian.net/1302857/
[not reproducing that paste inline because it's not too relevant]
\ i'm not sure how to get the symbols to show up here but i know that /usr/lib/dri/radeonsi_dri.so in your backtrace is computer for "this is not a place of honor"
\ it's probably calling the syscall via some other wrapper, lovely
\ or did you actually find a caller of pipe2?
\ well it's the backtrace from a breakpoint on pipe2 being hit
\ wait no
\ sorry, I did a dumb
\ okay yeah, the breakpoint on pipe2 isn't being hit
\ but also, gdb is not showing symbols. when it usually does. probably because i had to run it as root
\ maybe fcntl might be hit
\ not sure *why*, because /proc/sys/kernel/yama/ptrace_scope is already 0
\ ok, that does it
Thread 1 "sway" hit Breakpoint 2, 0x00007f3c471a52f0 in fcntl64 () from /usr/lib/libc.so.6
(gdb) bt
#0 0x00007f3c471a52f0 in fcntl64 () at /usr/lib/libc.so.6
#1 0x00007f3c4741eb52 in () at /usr/lib/libwayland-server.so.0 [later resolved as wl_os_dupfd_cloexec.constprop.0]
#2 0x00007f3c474207b9 in wl_event_loop_add_fd () at /usr/lib/libwayland-server.so.0
#3 0x00007f3c473a6c01 in () at /usr/lib/libwlroots.so.11 [later resolved as xwm_selection_transfer_start_outgoing.lto_priv.0]
#4 0x00007f3c473b1651 in () at /usr/lib/libwlroots.so.11 [later resolved as xwm_map_shell_surface]
#5 0x00007f3c47422b8f in wl_event_loop_dispatch () at /usr/lib/libwayland-server.so.0
#6 0x00007f3c474232d7 in wl_display_run () at /usr/lib/libwayland-server.so.0
#7 0x0000563e98eb2af5 in ()
#8 0x00007f3c470cdcd0 in () at /usr/lib/libc.so.6
#9 0x00007f3c470cdd8a in __libc_start_main () at /usr/lib/libc.so.6
#10 0x0000563e98eb2fa5 in ()
[we then deliberated on why the symbols weren't there]
\ doesn't look terribly useful to me :<
\ if you can get debug symbols for the wlroots SO that would help point out what needs an FD
\ usually gdb offers "do you want to download symbols from the internet for this session"
\ but apparently it doesn't do that when you run it as root
\ probably helping you be secure :p
\ sway generally doesn't need to run in a less-debuggable context but I think some launchers might inherit that anti-ptrace bit
\ s/launcher/display manager/
\ but doctor. my launcher is "type exec sway in tty1"
\ well, you've confused me. That's how I do it and I can attach strace/gdb.
\ i know! it's weird!
\ also, i tried sigstoping just the xwayland process but the session is still hung, sway stops in poll(68)
\ I think sway survives killing the Xwayland process
\ but that's destructive to X programs, of course
\ ok, i tried the sigstop again, and now it got a different timing and the session itself is alive again
\ now i get to use the good keyboard and monitor instead of ssh'ing in from my laptop :3
\ > destructive to X programs
\ of course
\ but also: i want to debug this now
\ sadly I have no idea how xwayland works so won't be much help
\ okay, welp, i flew too close to the sun and it got it out of the weird state and can't trigger it again
\ fwiw the specific xwayland client was gimp's save file dialog
\ and it closed all the file descriptors, too!
nevertheless, i managed to decode the backtrace i captured earlier
\ okay, i can still resolve the backtrace manually :p
\ mostly for my reference, here's the memory layout of my sway process https://paste.debian.net/1302863/
\ you can turn that plus the gdb bt into offsets into your .so and then gdb a sway process (try a nested one) to get symbols
\ just re-apply the offset at the new ASLR
\ I've had to do that before when I only had debugsyms on one system and a crash on another
\ oh, i was about to nm | sort | eyeball the thing
\ apparently one of the calls in that backtrace is xwm_selection_transfer_start_outgoing.lto_priv.0. that looks... fun
\ it gets called by xwm_map_shell_surface
\ the immediate caller of fcntl64 is wl_os_dupfd_cloexec.constprop.0
\ ah, clipboard stuff?
\ the hang happened after I fat-fingered some keybind, so, it might be clipboard-related
\ that might be enough for someone who knows wlroots/xwayland to figure out what happened. I suspect a loop of selection-set and paste events or something, resolved by you selecting something while xwayland was paused
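The manual symbol resolution suggested in the log (combine the process memory layout with the raw gdb addresses, then re-apply the offset elsewhere) can be sketched roughly as follows. This is only an illustration under stated assumptions: the input is text in the `/proc/<pid>/maps` format, the lowest mapping of a shared object is taken as its load base, and that base is assumed to correspond to ELF virtual address 0, which is typical for shared objects but not guaranteed. The function names are hypothetical.

```python
def parse_maps(text):
    """Parse /proc/<pid>/maps text into (start, end, path) tuples."""
    regions = []
    for line in text.splitlines():
        # Fields: address range, perms, file offset, device, inode, path
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a backing file
        start, end = (int(x, 16) for x in fields[0].split("-"))
        regions.append((start, end, fields[5].strip()))
    return regions

def resolve(addr, regions):
    """Return (path, offset from load base) for addr, or None if unmapped."""
    for start, end, path in regions:
        if start <= addr < end:
            # Assumption: the lowest mapping of this file is its load base.
            base = min(s for s, _e, p in regions if p == path)
            return path, addr - base
    return None

def rebase(offset, new_base):
    """Re-apply an offset at the load base seen in another process."""
    return new_base + offset

# Usage: feed the saved maps text and an address from the backtrace, e.g.
#   path, off = resolve(0x00007f3c473a6c01, parse_maps(saved_maps_text))
# then look up `off` in the symbol table of `path` (for example with nm),
# or call rebase(off, base) with the base from a process that has symbols.
```

The resulting offset is independent of ASLR, so the same number can be checked against a copy of the library that has debug symbols.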