Closed sodface closed 1 year ago
Hi sodface
this certainly sounds worrying. However, we just tried to reproduce this bug on two independent machines and were unable to do so.
Could you please state any available details on your setup to enable reproducibility on our side (flags used for compiling cagebreak, OS, exact command which was run, environment (e.g. TTY or nested within a wayland session), etc.)?
cheers project-repo
Hi project-repo, I just did a fresh install on the same hardware. Some details:
I'm starting cagebreak from TTY. Contents of .profile:
#!/bin/sh
mkdir -m 0700 -p /tmp/user/$(id -u)
export GDK_BACKEND=wayland
export HISTSIZE=131172 # 128K lines
export TERMINAL=sakura
export XDG_RUNTIME_DIR=/tmp/user/$(id -u)
export MOZ_ENABLE_WAYLAND=1
export EGL_PLATFORM=wayland
if [ -z "$DISPLAY" ] && [ "$(tty)" = "/dev/tty1" ]; then
exec dbus-run-session -- cagebreak -e
fi
The screen recording shows my second monitor with 1 split, htop above and then connecting to IPC with busybox nc below. cagebreak -e
goes to 100% until I move my mouse from the bottom tile to the top causing an event, at which time CPU usage drops off.
a perf top
for the process looks something like
PerfTop: 3874 irqs/sec kernel:49.6% exact: 99.9% lost: 0/0 drop: 0/0 [4000Hz cycles], (target_pid: 18082)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
7.83% [kernel] [k] __put_user_nocheck_4
6.37% libwayland-server.so.0.22.0 [.] wl_event_loop_dispatch
5.81% [kernel] [k] do_epoll_wait
5.67% [kernel] [k] __put_user_nocheck_8
4.57% libwayland-server.so.0.22.0 [.] 0x0000000000005858
4.55% libwayland-client.so.0.22.0 [.] 0x0000000000003fbf
4.07% [kernel] [k] __entry_text_start
3.55% ld-musl-x86_64.so.1 [.] __syscall_cp_c
3.03% [kernel] [k] syscall_enter_from_user_mode
2.67% ld-musl-x86_64.so.1 [.] __pthread_mutex_lock
2.53% ld-musl-x86_64.so.1 [.] __pthread_mutex_unlock
2.50% [kernel] [k] entry_SYSRETQ_unsafe_stack
2.33% [kernel] [k] __fget_files
1.99% [kernel] [k] _raw_write_lock_irq
1.96% libwayland-client.so.0.22.0 [.] wl_display_dispatch_queue_pending@plt
1.92% ld-musl-x86_64.so.1 [.] __syscall_cp_asm
1.83% [kernel] [k] do_syscall_64
1.67% [kernel] [k] unix_poll
1.61% [kernel] [k] entry_SYSCALL_64_after_hwframe
1.55% libwayland-client.so.0.22.0 [.] pthread_mutex_unlock@plt
1.51% libwayland-client.so.0.22.0 [.] wl_display_dispatch_queue_pending
1.47% ld-musl-x86_64.so.1 [.] __cp_end
1.35% [kernel] [k] syscall_exit_to_user_mode
1.23% [kernel] [k] __se_sys_epoll_pwait
1.13% [kernel] [k] _raw_write_unlock_irq
1.12% libwayland-server.so.0.22.0 [.] wl_event_loop_dispatch_idle
1.10% [kernel] [k] mutex_lock
1.09% libwayland-server.so.0.22.0 [.] wl_list_empty
1.07% ld-musl-x86_64.so.1 [.] epoll_pwait
0.99% [kernel] [k] ep_done_scan
0.99% [kernel] [k] sock_poll
0.97% libwayland-client.so.0.22.0 [.] wl_display_flush
0.93% [kernel] [k] syscall_return_via_sysret
0.93% libwayland-server.so.0.22.0 [.] wl_display_flush_clients
0.86% ld-musl-x86_64.so.1 [.] __syscall_ret
0.86% libwayland-client.so.0.22.0 [.] pthread_mutex_lock@plt
0.82% libwayland-client.so.0.22.0 [.] wl_list_empty@plt
0.64% libwayland-server.so.0.22.0 [.] wl_event_loop_dispatch_idle@plt
0.60% [kernel] [k] __fdget
0.59% libwayland-server.so.0.22.0 [.] wl_display_run
0.59% cagebreak [.] ipc_client_handle_writable
0.56% libwayland-server.so.0.22.0 [.] wl_list_init
0.54% [kernel] [k] exit_to_user_mode_prepare
0.52% libwlroots.so.11 [.] 0x0000000000053d4a
0.41% libwayland-server.so.0.22.0 [.] 0x0000000000004822
0.40% [kernel] [k] mutex_unlock
0.31% [kernel] [k] set_user_sigmask
0.31% libwlroots.so.11 [.] 0x0000000000053d43
i'd guess something calls ipc_client_handle_writable repeatedly until an event is triggered for some reason
i can't get any readable callgraph, but more or less it starts in wlroots
Hi everyone,
Thanks a lot for the help with debugging and additional system
information. We are now able to reproduce this and pushed a tentative
fix for it yesterday (see development
branch). Can you confirm that this fixes the issue?
Cheers,
project-repo
fixes it for me in my testing
I'm not as fast as psykose but I'll test asap. Thanks!
Fixes it for me too. Thanks again!
Fixed in 2.2.0
While testing scripts with the IPC socket I noticed that one core on my 4 core desktop goes to 100% and stays there until the first event occurs and is read by the script, at which point the CPU usage drops back down and the script and event processing seems to progress as expected from then on.
This also occurs when just connecting directly with nc (I tested with both busybox nc and openbsd-netcat), single core CPU goes to 100% until the first event occurs.
The process at 100% is
cagebreak -e
so it seems like a cagebreak issue at first glance but maybe it's something on the client side?I'm a novice at troubleshooting this type of thing, but strace is logging a lot of
epoll_pwait
calls during high CPU and then those taper off dramatically after the first event and CPU usage goes down.