swaywm / swayidle

Idle management daemon for Wayland
MIT License
550 stars 50 forks source link

suspend / dpms via swayidle causing high CPU usage / instability in firefox? #162

Closed t3e closed 7 months ago

t3e commented 7 months ago

I'm using

The issue I'm facing is that when I call

systemctl suspend

as a timeout command like in this example:

exec swayidle -w \
          timeout 10 'swaylock -f -c 000000' \
          timeout 20 'swaymsg "output * dpms off"' resume 'swaymsg "output * dpms on"' \
          timeout 30 'systemctl suspend' resume 'swaymsg "output * dpms on"' \
          before-sleep 'swaylock -f -c 000000'

... then I noticed a strange phenomenon with firefox after the wakeup. It predictably happens everytime I execute this with an opened browser window, after resuming I see one CPU core beeing fully utilized by a firefox process, even when there was nearly 0 activity before and it stays this way. Sometimes I even had crashes with firefox afterwards.

It does not happen when i call systemctl suspend (without using swayidle), so I suspect there must be some strange interference.

I also noticed one time than even when I close and reopen firefox, the cpu fan doesn't slow down, even when there's no more activity, which is really beyond strange and hard for me to understand... I have come to the conclusion that it must be related to swayidle or some underlying system that it uses (and probably also firefox itself but since it so far only occurs with swayidle I think it might be the primary culprit).

This happens also with a "blank" profile in firefox without any extensions. When opening about:performance it is the main thread in firefox that is using 100% cpu time.

As a side note: What I also remember is sway segfaulting at one point (with complete crash back to tty). Unfortunately I can't remember how that happend and I'm unable to reproduce it.

kennylevinsen commented 7 months ago

The issue is very unlikely to be swayidle itself - it more or less just runs specified commands in order.

One notable difference between just running systemctl suspend and your swayidle setup is that your swayidle setup locks the screen, then disables outputs, then suspends, then enables outputs. If you just ran those things through a script with equivalent sleeps in between, you'll likely get the same result.

t3e commented 7 months ago

It turns out you're absolutely right, the following 2 lines are enough to reproduce the error

systemctl suspend
swaymsg "output * dpms on"

So i guess this could be an issue in sway itself? Or firefox?

After reading a bit through the error reports on sway I'm starting to suspect it's a specific driver issue with my hardware.

https://github.com/swaywm/sway/issues/7910

I have AMD integrated graphics (Ryzen 5 3400g) and for sway there are some reports about instabilities with similar platforms. I didn't have any instabilities or sililar issues before switching to sway though.

kennylevinsen commented 7 months ago

Unless Firefox burns CPU because sway spams it with traffic, the issue would be within Firefox. However, "within Firefox" could be caused by mesa (your user-space GPU driver).

htop can show which thread is burning CPU (H to toggle threads, t to toggle tree mode for organization, F2 -> Display options -> Show custom thread names to make the thread purpose clear), might give a clue. WAYLAND_DEBUG=1 can be used to see what happens on the wire, but this will be noisy and requires a strict reproduction with no other interaction to be useful.

https://github.com/swaywm/sway/issues/7910

That issue was closed after discovering it was caused by faulty RAM.

t3e commented 7 months ago

I haven't tried this yet but I've made an interesting discovery:

When I execute

systemctl suspend
swaymsg "output [monitor1] dpms on"
swaymsg "output [monitor2] dpms on"
swaymsg "output [monitor3] dpms on"

then the problem does not occur!

So the issue is probably somehow related to the wildcard for the outputs...

kennylevinsen commented 7 months ago

That may simply be timing. I still mainly suspect Firefox, as that is where your symptoms are.

t3e commented 7 months ago

You're apparently right about the timing because it does occur when I do this

systemctl suspend
swaymsg "output [monitor1] dpms on, output [monitor2] dpms on, output [monitor3] dpms on"

And I also found out that has nothing to do with suspend at all (!), just

swaymsg "output [monitor1] dpms on, output [monitor2] dpms on"

(dpms signal to more than one output) is enough to reproduce the error (even if the monitors are already on).

t3e commented 7 months ago

The firefox thread name in htop (with 100% CPU usage) is called "WaylandProxy"

kennylevinsen commented 7 months ago

In that case, it just sounds like a bug in Firefox' WaylandProxy component - file a bug with them and see.

Closing here as it is unrelated to swayidle.

t3e commented 7 months ago

Here I don't agree as I've now recorded the wayland messages as you suggested and the protocoll looks like this.

[1122210.849]  -> wl_keyboard#24.repeat_info(35, 190)
[1122210.856]  -> wl_keyboard#24.modifiers(472, 0, 0, 0, 0)
[1122210.865]  -> wl_keyboard#24.leave(473, wl_surface#3)
[1122210.874]  -> wl_keyboard#24.enter(474, wl_surface#3, array[0])
[1122210.881]  -> wl_keyboard#24.modifiers(475, 0, 0, 0, 0)
[1122210.887]  -> wl_data_device#15.selection(nil)
[1122210.894]  -> zwp_primary_selection_device_v1#16.selection(nil)
[1122218.421]  -> wl_keyboard#9.keymap(1, fd 195, 66082)
[1122218.442]  -> wl_keyboard#9.repeat_info(35, 190)
[1122218.449]  -> wl_keyboard#47.keymap(1, fd 196, 66082)
[1122218.474]  -> wl_keyboard#9.keymap(1, fd 112, 66082)
[1122218.480]  -> wl_keyboard#9.repeat_info(35, 190)
[1122218.486]  -> wl_keyboard#24.keymap(1, fd 113, 66082)
[1122218.491]  -> wl_keyboard#24.repeat_info(35, 190)
[1122218.496]  -> wl_keyboard#24.keymap(1, fd 118, 66082)
[1122218.502]  -> wl_keyboard#24.repeat_info(35, 190)
[1122218.507]  -> wl_keyboard#24.keymap(1, fd 119, 66082)
[1122218.514]  -> wl_keyboard#24.repeat_info(35, 190)

------------------------------------------------------------------------------------------------------
Here i executed:
swaymsg "output [monitor1] dpms on, output [monitor2] dpms on"
------------------------------------------------------------------------------------------------------

Error: WaylandMessage::Write() too many files to send
 : 
[1122218.521]  -> wl_keyboard#24.modifiers(476, 0, 0, 0, 0)
Error: WaylandMessage::Write() too many files to send
 : 
[1122218.551]  -> wl_keyboard#24.leave(477, wl_surface#3)
Error: WaylandMessage::Write() too many files to send
 : 
[1122218.563]  -> wl_keyboard#24.enter(478, wl_surface#3, array[0])
Error: WaylandMessage::Write() too many files to send
 : 
[1122218.571]  -> wl_keyboard#24.modifiers(479, 0, 0, 0, 0)
Error: WaylandMessage::Write() too many files to send
 : 
[1122218.582]  -> wl_data_device#15.selection(nil)
Error: WaylandMessage::Write() too many files to send
 : 
[1122218.602]  -> zwp_primary_selection_device_v1#16.selection(nil)
Error: WaylandMessage::Write() too many files to send
 : 
Error: WaylandMessage::Write() too many files to send
 : 
Error: WaylandMessage::Write() too many files to send
 : 
Error: WaylandMessage::Write() too many files to send
 : 
Error: WaylandMessage::Write() too many files to send
 : 
Error: WaylandMessage::Write() too many files to send
 : 
Error: WaylandMessage::Write() too many files to send
 : 
Error: WaylandMessage::Write() too many files to send
 : 
Error: WaylandMessage::Write() too many files to send
 : 
Error: WaylandMessage::Write() too many files to send
 : 
Error: WaylandMessage::Write() too many files to send
 : 
Error: WaylandMessage::Write() too many files to send
 : 
Error: WaylandMessage::Write() too many files to send
 : 
Error: WaylandMessage::Write() too many files to send
 : 
Error: WaylandMessage::Write() too many files to send
 : 
Error: WaylandMessage::Write() too many files to send
 : 
Error: WaylandMessage::Write() too many files to send
 : 
Error: WaylandMessage::Write() too many files to send
 : 
Error: WaylandMessage::Write() too many files to send
 : 
Error: WaylandMessage::Write() too many files to send
 : 
Error: WaylandMessage::Write() too many files to send
 : 
Error: WaylandMessage::Write() too many files to send

After the issue occures wayland seems to send an insane amount of messages, the filesize of the log grew very quickly. This may not be related to swayidle, but I don't think the problem here is firefox.

t3e commented 7 months ago

Ok maybe it is related to firefox, I'm not shure if I understand the output correctly (looks like firefox is producing the messages?)

https://bugs.launchpad.net/ubuntu/+source/firefox/+bug/2051838 https://bugzilla.mozilla.org/show_bug.cgi?id=1882449

kennylevinsen commented 7 months ago

Here I don't agree

WaylandProxy is a small component Firefox uses to buffer messages. If it is written well and does not have any active bugs, then for sway to cause 100% CPU load in WaylandProxy, sway would also nneed to consume somewhere around 100% CPU load to send that much traffic to it. As such, I consider the 100% CPU load a bug in WaylandProxy until proven otherwise.

At the same time, you have error logs from said component, stating it has "too many files to send" in one direction or the other. As it received said files, the message on the wire was valid, and so it's very likely that WaylandProxy's wire protocol implementation has tripped over itself.

Please use the Firefox bug tracker to discuss Firefox issues.

t3e commented 7 months ago

Ok then, thanks for the explanation. According to the links the problem has already been fixed in version 125 of Firefox which is planned to be released next month.