swaywm / sway

i3-compatible Wayland compositor
https://swaywm.org
MIT License
14.39k stars 1.1k forks source link

missed key events under high cpu load #5423

Open simon-auch opened 4 years ago

simon-auch commented 4 years ago

When under high cpu load, the focused window sometimes does not get key events. The problem is kind of hard to reproduce reliably but it occurs to me mostly while playing games but also in alacritty while some cpu intensive stuff is going on in the background. Most of the time I observe it as: and , but the program seems to just get the part and starts to repeat the pressed key. Now I'm not sure if this is an sway bug at all or just linux missing out usb stuff due to high cpu usage, but I couldn't find anything indicating this happens in a quick google search.

coderkalyan commented 4 years ago

I believe I have experienced this too, but as stated, it is hard to reproduce reliably.

Xyene commented 4 years ago

I don't think sway/wlroots can drop events like that. From the sound of it the key press is being handled, but a release is never seen so sway starts repeating.

In general though, you're right: it is possible for events to be missed due to high CPU usage, but it'd be userspace (not the kernel) missing them, and they're not supposed to be dropped silently. The kernel generates a SYN_DROPPED event when it has to drop an event because userspace is too slow.

Quoting https://who-t.blogspot.com/2013/09/libevdev-handling-input-events.html:

A EV_SYN/SYN_DROPPED event is relatively recent (kernel 2.6.39). If a device sends events faster than userspace can read it, eventually the kernel buffers are full and the kernel drops events. When it does so, it sends a EV_SYN/SYN_DROPPED event to notify userspace. The userspace process then needs to stop what it's doing, re-sync the device (i.e. query all axis, key, LED, etc. values), update the internal state accordingly and then it can start reading events again.

Incorrect SYN_DROPPED handling could be the cause here, but it isn't sway or wlroots that handles this -- it's libinput, and libinput, to the best of my understanding, is supposed to replay events (or at least, not leave stuck keys?)

One possible way to confirm this (despite it being hard to reproduce) would be to run something like:

$ sudo libinput debug-events --show-keycodes /dev/input/event<your keyboard device> | tee ~/libinput-log

...when you think you're going to be doing something that has a chance of reproducing the issue. Note, of course, that this is a keylogger (but that probably doesn't matter if you're playing games?)

Then, if we were to see that the logs contain consecutive pressed events for the same key, with no release in-between, we'd know that the issue would be in libinput. Otherwise, it's in sway/wlroots.

simon-auch commented 4 years ago

I will give it a shot today or tomorrow evening.

Edit: Do you know if the syn_dropped events by any chance show up in dmesg?

Xyene commented 4 years ago

I don't think so, but you should be able to see them through something like sudo evemu-record /dev/input/event<your keyboard device> | grep SYN_DROPPED -C 5.

simon-auch commented 4 years ago

Ok, some results: the keylogger @Xyene provided did not show up anything going wrong.

Atleast for one particular game (eco) that had a very erratic behavior concerning my keyboard (basically missing out on keyevents all the time) I found the problem in the output of dmesg. While in game the dmesg output got spammed with the message usb 3-1.2: reset full-speed USB device number 4 using xhci_hcd (which happens to be my keyboard plugged into the usb hub from my monitor). After plugging my keyboard into a usb socket on my pc directly the game no more "misses" key events.

I will still keep the keylogger running when playing some other programs that also seem to sometimes miss key events and see if its the same problem for other programs.

What I would be interested in, does anyone have an idea why the usb device is reset all the time or how I could pin this down?

FreeFull commented 4 years ago

It's possible the kernel is trying to put the USB port into an idle, low-power state, and the hardware doesn't like it, resulting in a disconnect and reconnect. If that's the case, it's fixable with an udev rule. The other possibility could be some sort of physical break in a cable/circuit somewhere, or some other miscellaneous USB issue.

simon-auch commented 4 years ago

But why would that happen when a specific program is running?

Xyene commented 4 years ago

Glad you figured out a workaround!

If you don't mind though, it would be useful to dig a bit deeper here, since regardless of flaky hardware we really shouldn't be leaving keys stuck.

If you modify the logging command to:

$ sudo libinput debug-events --show-keycodes |& grep -v 'POINTER'  | tee ~/libinput-log

what I'd be interested in seeing is if you see DEVICE_REMOVED and DEVICE_ADDED lines around the time you see the reset messages in dmesg. Also, what does evemu-record output look like around this time? Please upload some logs (dmesg, libinput, evemu, sway -d) to https://gist.github.com/ or similar if you get the time.

I've tried reproducing this issue locally by:

but all of these result in the device being removed / re-added gracefully, and sway notifies clients that keys have been released when a keyboard is removed.

simon-auch commented 4 years ago

If you don't mind though, it would be useful to dig a bit deeper here, since regardless of flaky hardware we really shouldn't be leaving keys stuck.

No problem :)

All the logs I gathered (dmesg, sway, evemu, libinput): https://gist.github.com/raidwas/c546a0b6fd9bdd643b7424070ba5d4af

All logs are from the same session, each line appended with the timestamp when it got into the pipe (it seems libinput buffers a few lines when outputting into a pipe?).

The interesting time stamps can be easily found by looking at the last few lines in the dmesg output (for example Tue 16 Jun 2020 11:54:32). What I did to produce these logs:

  1. start a new sway session with -d
  2. open 3 root terminals in the sway session, start one of the logging scripts in each
  3. switch workspace and start steam and the game
  4. press some buttons ingame
  5. exit game
  6. stop logging scripts
  7. stop sway session

Also I tried reproducing the problem in xfce4, with success, so this might not be a problem of sway at all..

Tomorrow I will also add a log from the kernel debug fs usbmon with correct timestamps including the other logs, but as a small teaser some outstanding lines from the output: cat /sys/kernel/debug/usb/usbmon/3u:

ffff8e539a66d180 3952174648 S Co:3:011:0 s 23 03 0016 0203 0000 0
ffff8e539a66d180 3952175322 C Co:3:011:0 0 0
ffff8e527b79e0c0 3952416358 C Ii:3:015:2 -71:8 0
ffff8e527b79e0c0 3952471307 S Ii:3:015:2 -115:8 4 <
ffff8e527b79ed80 3952831335 C Ii:3:015:1 0:1 8 = 00001a00 00000000
ffff8e527b79ed80 3952831349 S Ii:3:015:1 -115:1 8 <
ffff8e4f3ad823c0 3952868041 S Co:3:011:0 s 23 03 0016 0303 0000 0
ffff8e4f3ad823c0 3952868615 C Co:3:011:0 0 0
ffff8e527b79ed80 3952915539 C Ii:3:015:1 -71:1 0
ffff8e527b79ed80 3952934640 S Ii:3:015:1 -115:1 8 <
ffff8e527b79e0c0 3953104496 C Ii:3:015:2 -71:8 0
ffff8e527b79ed80 3953105570 C Ii:3:015:1 -2:1 0
ffff8e4f3ce46480 3953105579 S Co:3:011:0 s 23 03 0004 0001 0000 0
ffff8e4f3ce46480 3953105608 C Co:3:011:0 0 0
ffff8e4f3ce46900 3953124641 S Ci:3:011:0 s a3 00 0000 0001 0004 4 <
ffff8e4f3ce46900 3953124833 C Ci:3:011:0 0 4 = 03011000
ffff8e4f3ce46900 3953124838 S Co:3:011:0 s 23 01 0014 0001 0000 0
ffff8e4f3ce46900 3953124895 C Co:3:011:0 0 0
ffff8e4f3ce46900 3953181506 S Ci:3:000:0 s 80 06 0100 0000 0040 64 <
ffff8e4f3ce46900 3953181905 C Ci:3:000:0 0 8 = 12011001 00000008
ffff8e4f3ce46900 3953181908 S Co:3:011:0 s 23 03 0004 0001 0000 0
ffff8e4f3ce46900 3953181936 C Co:3:011:0 0 0
ffff8e4f3ce46900 3953197982 S Ci:3:011:0 s a3 00 0000 0001 0004 4 <
ffff8e4f3ce46900 3953198118 C Ci:3:011:0 0 4 = 03011000
ffff8e4f3ce46900 3953198123 S Co:3:011:0 s 23 01 0014 0001 0000 0
ffff8e4f3ce46900 3953198280 C Co:3:011:0 0 0
ffff8e4f3ce46900 3953301310 S Ci:3:015:0 s 80 06 0100 0000 0012 18 <
ffff8e4f3ce46900 3953327902 C Ci:3:015:0 0 18 = 12011001 00000008 340a1202 00010102 0001
ffff8e4f3ce46900 3953327910 S Ci:3:015:0 s 80 06 0200 0000 003b 59 <
ffff8e4f3ce46900 3953328214 C Ci:3:015:0 0 59 = 09023b00 020100a0 fa090400 00010301 01030921 10010001 223f0007 05810308
ffff8e4f3ce46900 3953328933 S Co:3:015:0 s 00 09 0001 0000 0000 0
ffff8e4f3ce46900 3953328994 C Co:3:015:0 0 0
ffff8e4f3ce46480 3953329004 S Ci:3:015:0 s 81 06 2200 0001 0083 131 <
ffff8e4f3ce46480 3953329873 C Ci:3:015:0 0 131 = 05010980 a1018502 09810982 09831500 25017501 95038106 75059501 81010600
ffff8e4f3ce46900 3953329877 S Co:3:015:0 s 21 0a 0000 0001 0000 0
ffff8e4f3ce46900 3953329953 C Co:3:015:0 0 0
ffff8e527b79e0c0 3953329957 S Ii:3:015:2 -115:8 4 <
ffff8e4f3ce46900 3953329959 S Ci:3:015:0 s 81 06 2200 0000 003f 63 <
ffff8e4f3ce46900 3953330378 C Ci:3:015:0 0 63 = 05010906 a1010507 19e029e7 15002501 75019508 81027508 95018103 05081901
ffff8e4f3ce46900 3953330382 S Co:3:015:0 s 21 0a 0000 0000 0000 0
ffff8e4f3ce46900 3953330591 C Co:3:015:0 0 0
ffff8e527b79ed80 3953330595 S Ii:3:015:1 -115:1 8 <
ffff8e527b79ed80 3953549424 C Ii:3:015:1 0:1 8 = 04000000 00000000
ffff8e527b79ed80 3953549444 S Ii:3:015:1 -115:1 8 <
ffff8e4ea5776780 3953564683 S Co:3:011:0 s 23 03 0016 0203 0000 0
ffff8e4ea5776780 3953565291 C Co:3:011:0 0 0
ffff8e527b79ed80 3953867681 C Ii:3:015:1 -71:1 0
ffff8e527b79ed80 3953881309 S Ii:3:015:1 -115:1 8 <
ffff8e527b79ed80 3953896537 C Ii:3:015:1 -71:1 0
ffff8e527b79ed80 3953924642 S Ii:3:015:1 -115:1 8 <
ffff8e527b79ed80 3954126519 C Ii:3:015:1 0:1 8 = 04000000 00000000
ffff8e527b79ed80 3954126527 S Ii:3:015:1 -115:1 8 <
ffff8e4f3aeedd80 3954254643 S Co:3:011:0 s 23 03 0016 0303 0000 0

Whats interesting is, those long lines don't seem to appear usually, but only when in the game and having a "stuck" key. I might try and find out what they mean tomorrow (maybe its just the reset of the usb device, but not the cause?)

Xyene commented 4 years ago

Great!

All logs are from the same session, each line appended with the timestamp when it got into the pipe (it seems libinput buffers a few lines when outputting into a pipe?).

Yeah, this is standard behavior of many applications that you can disable by unbuffer <app> |& <other app>, where unbuffer comes with the expect-dev package. This fakes a "terminal" and can also be useful for collecting colored output through pipes (if you ever find yourself needing to do that).

The interesting time stamps can be easily found by looking at the last few lines in the dmesg output (for example Tue 16 Jun 2020 11:54:32).

Hmm, which keys got stuck during this trace? Overlapping the timestamp you mention shows:

16 Jun 2020 11:54:31 PM CEST E: 121.842106 0000 0000 0001   # ------------ SYN_REPORT (1) ---------- +37ms
Tue 16 Jun 2020 11:54:31 PM CEST E: 121.842106 0004 0004 458778 # EV_MSC / MSC_SCAN             458778
Tue 16 Jun 2020 11:54:31 PM CEST E: 121.842106 0001 0011 0000   # EV_KEY / KEY_W                0
Tue 16 Jun 2020 11:54:31 PM CEST E: 121.842106 0000 0000 0000   # ------------ SYN_REPORT (0) ---------- +0ms
Tue 16 Jun 2020 11:54:33 PM CEST E: 124.420536 0004 0004 458978 # EV_MSC / MSC_SCAN             458978
Tue 16 Jun 2020 11:54:33 PM CEST E: 124.420536 0001 0038 0001   # EV_KEY / KEY_LEFTALT          1
Tue 16 Jun 2020 11:54:33 PM CEST E: 124.420536 0000 0000 0000   # ------------ SYN_REPORT (0) ---------- +2578ms
Tue 16 Jun 2020 11:54:33 PM CEST E: 124.688774 0001 0038 0002   # EV_KEY / KEY_LEFTALT          2
Tue 16 Jun 2020 11:54:33 PM CEST E: 124.688774 0000 0000 0001   # ------------ SYN_REPORT (1) ---------- +268ms
Tue 16 Jun 2020 11:54:33 PM CEST E: 124.725441 0001 0038 0002   # EV_KEY / KEY_LEFTALT          2
Tue 16 Jun 2020 11:54:33 PM CEST E: 124.725441 0000 0000 0001   # ------------ SYN_REPORT (1) ---------- +37ms
...
Tue 16 Jun 2020 11:54:34 PM CEST E: 125.212111 0001 0038 0002   # EV_KEY / KEY_LEFTALT          2
Tue 16 Jun 2020 11:54:34 PM CEST E: 125.212111 0000 0000 0001   # ------------ SYN_REPORT (1) ---------- +37ms
Tue 16 Jun 2020 11:54:34 PM CEST E: 125.252110 0001 0038 0002   # EV_KEY / KEY_LEFTALT          2
Tue 16 Jun 2020 11:54:34 PM CEST E: 125.252110 0000 0000 0001   # ------------ SYN_REPORT (1) ---------- +40ms
Tue 16 Jun 2020 11:54:34 PM CEST E: 125.252110 0004 0004 458978 # EV_MSC / MSC_SCAN             458978
Tue 16 Jun 2020 11:54:34 PM CEST E: 125.252110 0001 0038 0000   # EV_KEY / KEY_LEFTALT          0
Tue 16 Jun 2020 11:54:34 PM CEST E: 125.252110 0000 0000 0000   # ------------ SYN_REPORT (0) ---------- +0ms

Was left Alt the key that got stuck?

Also, for clarity purposes: do these keys "unstick" themselves (e.g., after the USB bus finishes resetting)? How do you stop them from repeating?

simon-auch commented 4 years ago

It was the w and s keys (multiple times). I will try and create some new logs with better timestamps and knowing which key press exactly goes wrong.

The keys only "unstick" after pressing another key.

New day, new session, new logs :) https://gist.github.com/raidwas/2d5a29f6c01dbbd79212132cba40386f

(For some reason the sway log was empty, I guess I misused the unbuffered there, sorry) How those logs were produced:

  1. Start new sway session
  2. start 4 root terminals and start a logger in each one (usb, evemu, libinput, dmesg)
  3. switch workspace (6)
  4. start steam and game
  5. press "w" a few times until "stuck" then "unstuck" by pressing "s"
  6. exit game
  7. switch workspaces
  8. stop loggers
  9. exit sway session

The relevant timestamps from the logs should therefore be around the last time I "released" "w" and pressed "s":

2521.90 E: 98.663831 0001 0011 0000 # EV_KEY / KEY_W                0
2521.90 E: 98.663831 0004 0004 458774   # EV_MSC / MSC_SCAN             458774
2521.91 E: 98.663831 0001 001f 0001 # EV_KEY / KEY_S                1

(For clarity: no I did not release w and press s within a fraction of a second, I actually released w before that, but the "key event" is only generated upon pressing the "s" key) But these timestamps don't exactly match up with the (only) reset message in dmesg:

2526.01 [ 2526.042850] usb 3-1.1: reset full-speed USB device number 3 using xhci_hcd

The timestamp of the dmesg line howerver matcher the "long" lines in the usb-log (if anyone can read them).

For completeness here are the commandlines used to get the logs:

unbuffer dmesg -w |& while read line; do echo `awk '{print $1}' /proc/uptime` "$line"; done | tee dmesg-log
unbuffer evemu-record /dev/input/event3 |& while read line; do echo `awk '{print $1}' /proc/uptime` "$line"; done | tee evemu-log
unbuffer libinput debug-events --show-keycodes |& unbuffer -p grep -v 'POINTER' |& while read line; do echo `awk '{print $1}' /proc/uptime` "$line"; done | tee libinput-log
unbuffer cat /sys/kernel/debug/usb/usbmon/3u |& while read line; do echo `awk '{print $1}' /proc/uptime` "$line"; done | tee usb-log

(I tested all and none of them appeared to buffer anything, except the dmesg one which I don't know how I could test (but I think it doesn't buffer)).

v-gu commented 3 years ago

I experience this is for a long time, regardless of log details, I tried with Xorg, same Unity3D game, no key stuck issue, in sway, it has.

v-gu commented 3 years ago

https://gitlab.gnome.org/GNOME/mutter/-/issues/532 looked the same? except it's mutter https://bugzilla.redhat.com/show_bug.cgi?id=1579859 fedora

v-gu commented 3 years ago

My test rest:

In following session, key stuck happened for 2 times(I'm just walking left(A) and right(D) in-game), but nothing suspicious printed, only before game launch, libinput error: event24 - HHKB-BT Keyboard: client bug: event processing lagging behind by 43ms, your system is too slow this line was printed.

sudo libinput debug-events --show-keycodes /dev/input/event24
-event24  DEVICE_ADDED     HHKB-BT Keyboard                  seat0 default group1  cap:k
 event24  KEYBOARD_KEY     +1.845s  KEY_D (32) pressed
 event24  KEYBOARD_KEY     +2.821s  KEY_D (32) released
 event24  KEYBOARD_KEY     +4.305s  KEY_A (30) pressed
libinput error: event24 - HHKB-BT Keyboard: client bug: event processing lagging behind by 43ms, your system is too slow
 event24  KEYBOARD_KEY     +6.331s  KEY_A (30) released
 event24  KEYBOARD_KEY     +8.745s  KEY_D (32) pressed
 event24  KEYBOARD_KEY     +11.311s KEY_D (32) released
 event24  KEYBOARD_KEY     +13.275s KEY_A (30) pressed
 event24  KEYBOARD_KEY     +15.331s KEY_A (30) released
 event24  KEYBOARD_KEY     +16.710s KEY_D (32) pressed
 event24  KEYBOARD_KEY     +18.901s KEY_D (32) released
 event24  KEYBOARD_KEY     +19.860s KEY_A (30) pressed
 event24  KEYBOARD_KEY     +20.761s KEY_A (30) released
 event24  KEYBOARD_KEY     +71.151s KEY_D (32) pressed
 event24  KEYBOARD_KEY     +71.222s KEY_D (32) released
 event24  KEYBOARD_KEY     +73.186s KEY_D (32) pressed
 event24  KEYBOARD_KEY     +73.306s KEY_D (32) released
 event24  KEYBOARD_KEY     +73.967s KEY_D (32) pressed
 event24  KEYBOARD_KEY     +74.072s KEY_D (32) released
 event24  KEYBOARD_KEY     +75.992s KEY_D (32) pressed
 event24  KEYBOARD_KEY     +76.097s KEY_D (32) released
 event24  KEYBOARD_KEY     +76.892s KEY_A (30) pressed
 event24  KEYBOARD_KEY     +80.611s KEY_A (30) released
 event24  KEYBOARD_KEY     +81.137s KEY_D (32) pressed
 event24  KEYBOARD_KEY     +82.141s KEY_D (32) released
E: 82.761332 0001 001e 0002 # EV_KEY / KEY_A                2
E: 82.761332 0000 0000 0001 # ------------ SYN_REPORT (1) ---------- +50ms
E: 82.811339 0001 001e 0002 # EV_KEY / KEY_A                2
E: 82.811339 0000 0000 0001 # ------------ SYN_REPORT (1) ---------- +50ms
E: 82.861338 0001 001e 0002 # EV_KEY / KEY_A                2
E: 82.861338 0000 0000 0001 # ------------ SYN_REPORT (1) ---------- +50ms
E: 82.921396 0001 001e 0002 # EV_KEY / KEY_A                2
E: 82.921396 0000 0000 0001 # ------------ SYN_REPORT (1) ---------- +60ms
E: 82.981331 0001 001e 0002 # EV_KEY / KEY_A                2
E: 82.981331 0000 0000 0001 # ------------ SYN_REPORT (1) ---------- +60ms
E: 82.981331 0004 0004 458756   # EV_MSC / MSC_SCAN             458756
E: 82.981331 0001 001e 0000 # EV_KEY / KEY_A                0

Here when I lift A, there was a clear key RELEASE event, but in-game, character still walking left.

Although in the output HHKB-BT is a blue-tooth device, but all keyboards have been affected by this, include AT Translated Set 2 keyboard, if I exit sway and launch Xorg, everything is smooth.

cglogic commented 3 years ago

The same issue. Xorg with libinput works fine.

ivan-volnov commented 2 years ago

Have this issue too.

Emantor commented 2 years ago

https://github.com/swaywm/sway/pull/6994 should fix this if your sway binary is started with CAP_SYS_NICE.

JeffHenson commented 2 years ago

I was having what sounds like the same issue while playing games (with some screen lag when it happens) but I think I've narrowed down what my problem is.

I'm pretty sure my issue was being caused by this setting I have in swayidle:

timeout 1 '' resume 'swaymsg "output * dpms on"'

When I stop swayidle, the problem no longer occurs.

If I do a check to see if the dpms on call is needed, the problem also does not occur:

timeout 1 '' resume 'swaymsg -r -t get_outputs | jq -e ".[] | select(.dpms==false)" >/dev/null && swaymsg "output * dpms on"'

I would have assumed (knowing nothing about sway's code) that the output * dpms on command would be a NOOP if there are no screens that need to be turned back on but it seems whatever it's doing is causing some lag.

FYI, the problem still occurs for me (and at a greater frequency) when using sway-git with CAP_SYS_NICE set on the binary with my original swayidle config.

viluon commented 1 year ago

I'm also experiencing this issue, on Sway 1.8.

ghost commented 9 months ago

Having a similar issue with mouse events being missed, (keyboard events are unaffected), during specific high loads.

I can reproduce reliably using by generating a procedural map in ARK: Survival Evolved. This causes me to miss key events in Steam, flatpak librewolf (with forced x11), and xterm. All native wayland applications are unaffected. For me at least, issue likely involves xwayland.

libinput debug-events --show-keycodes also shows the mouse buttons always being captured in the keylogger, so I don't think it's an issue with libinput.

Artix Linux, Kernel 6.6.2-artix1-1 $ pacman -Q | grep -i sway sway 1:1.8.1-3 swaybg 1.2.0-1

YellowOnion commented 1 month ago

I just started noticing this bug, Keyboard input is kind of important, why is this not a high priority bug?