yshui / picom

A lightweight compositor for X11 with animation support
https://picom.app/

Picom gets stuck (infinite loop?) with monitors in sleep and gets killed on wakeup #1212

Closed (awused closed 7 months ago)

awused commented 7 months ago

Platform

Fedora 39, kernel 6.6 and 6.7, i3-gaps

GPU, drivers, and screen setup

Nvidia 4090, 545.29.06, four monitors, one 4k 120hz and three 1440p 120hz.

glxinfo

```
name of display: :0
display: :0  screen: 0
direct rendering: Yes
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 24564 MB
    Total available memory: 24564 MB
    Currently available dedicated video memory: 21855 MB
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: NVIDIA GeForce RTX 4090/PCIe/SSE2
OpenGL core profile version string: 4.6.0 NVIDIA 545.29.06
OpenGL core profile shading language version string: 4.60 NVIDIA
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL version string: 4.6.0 NVIDIA 545.29.06
OpenGL shading language version string: 4.60 NVIDIA
OpenGL context flags: (none)
OpenGL profile mask: (none)
OpenGL ES profile version string: OpenGL ES 3.2 NVIDIA 545.29.06
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
```

Environment

i3-gaps, i3lock, using dpms to put the monitors to sleep. The computer itself stays awake, I do not use sleep/hibernate.

picom version

vgit-2be58, to test the recent changes

Diagnostics

```
**Version:** vgit-2be58

### Extensions:

* Shape: Yes
* RandR: Yes
* Present: Present

### Misc:

* Use Overlay: Yes
* Config file used: /home/desuwa/.config/picom.conf

### Drivers (inaccurate):

NVIDIA

### Backend: glx

* Driver vendors:
 * GLX: NVIDIA Corporation
 * GL: NVIDIA Corporation
* GL renderer: NVIDIA GeForce RTX 4090/PCIe/SSE2

### Backend: egl

* Driver vendors:
 * EGL: NVIDIA
 * GL: NVIDIA Corporation
* GL renderer: NVIDIA GeForce RTX 4090/PCIe/SSE2
```

Configuration:

Configuration file

```
backend = "glx";
vsync = true;

shadow = true;
#no-dock-shadow = true;
#no-dnd-shadow = true;
shadow-radius = 10;
shadow-offset-x = -5;
shadow-offset-y = 0;
shadow-opacity = 0.8;
shadow-red = 0.11;
shadow-green = 0.12;
shadow-blue = 0.13;
shadow-exclude = [
  "name = 'Notification'",
  "_GTK_FRAME_EXTENTS@:c",
  "class_g = 'i3-frame'",
  "_NET_WM_STATE@:32a *= '_NET_WM_STATE_HIDDEN'",
  "_NET_WM_STATE@:32a *= '_NET_WM_STATE_STICKY'",
  "!I3_FLOATING_WINDOW@:c"
];
shadow-ignore-shaped = true;

#alpha-step = 0.06;
blur-background = false;
blur-background-fixed = true;
blur-kern = "7x7box";
blur-background-exclude = [
  "class_g = 'i3-frame'",
  "window_type = 'dock'",
  "window_type = 'desktop'",
  "_GTK_FRAME_EXTENTS@:c"
];

# Duplicating the _NET_WM_STATE entries because compton cannot deal with atom arrays :-/
opacity-rule = [
  "0:_NET_WM_STATE@:32a *= '_NET_WM_STATE_HIDDEN'"
];

fading = false;
fade-delta = 7;
fade-in-step = 0.05;
fade-out-step = 0.05;
fade-exclude = [];

mark-wmwin-focused = true;
mark-ovredir-focused = true;
use-ewmh-active-win = true;
detect-rounded-corners = true;
detect-client-opacity = true;
# refresh-rate = 0;
dbe = false;
glx-no-stencil = true;
glx-copy-from-front = false;
use-damage = true;
# sw-opti = false;
unredir-if-possible = false;
focus-exclude = [];
detect-transient = true;
detect-client-leader = true;
invert-color-include = [];
xrender-sync-fence = true;

wintypes:
{
  tooltip = { fade = true; shadow = false; opacity = 1.00; focus = true; };
};
```

Steps of reproduction

  1. Use picom
  2. Lock the screen with i3lock and put the monitors to sleep with dpms
  3. After the monitors turn off, start typing in your password

This does not reproduce every time, but in my experience the longer the monitors have been off, the greater the chance picom will crash. Every time I've locked my screens overnight, picom has crashed in the morning. i3lock is probably not necessary; it's just the first window that takes input and has damage.

My lock script is here: https://github.com/awused/dotfiles/blob/master/gui/.config/i3/lock

Expected behavior

picom does not die as monitors return from sleep.

Current Behavior

I started picom with /usr/local/bin/picom --log-level TRACE; date so I'd know exactly when the process exited.

[ 03/02/2024 10:46:41.587 ev_handle TRACE ] event     Damage serial 0x0000da40 window 0x00400091 "polybar-primary_HDMI-0"
[ 03/02/2024 10:46:41.587 queue_redraw VERBOSE ] Queue redraw, render_queued: 1, backend_busy: 1
[ 03/02/2024 10:46:41.587 repair_win TRACE ] Mark window 0x00400091 (polybar-primary_HDMI-0) as having received damage
[ 03/02/2024 10:46:41.587 add_damage TRACE ] Adding damage:
[ 03/02/2024 10:46:41.587 dump_region TRACE ] nrects: 0
[ 03/02/2024 10:46:41.614 ev_handle TRACE ] event     Damage serial 0x0000da42 window 0x00400090 "polybar-primary_DP-4"
[ 03/02/2024 10:46:41.614 queue_redraw VERBOSE ] Queue redraw, render_queued: 1, backend_busy: 1
[ 03/02/2024 10:46:41.614 repair_win TRACE ] Mark window 0x00400090 (polybar-primary_DP-4) as having received damage
[ 03/02/2024 10:46:41.614 add_damage TRACE ] Adding damage:
[ 03/02/2024 10:46:41.614 dump_region TRACE ] nrects: 0
[ 03/02/2024 10:46:41.615 ev_handle TRACE ] event     Damage serial 0x0000da44 window 0x00400092 "polybar-primary_DP-0"
[ 03/02/2024 10:46:41.615 queue_redraw VERBOSE ] Queue redraw, render_queued: 1, backend_busy: 1
[ 03/02/2024 10:46:41.615 repair_win TRACE ] Mark window 0x00400092 (polybar-primary_DP-0) as having received damage
[ 03/02/2024 10:46:41.615 add_damage TRACE ] Adding damage:
[ 03/02/2024 10:46:41.615 dump_region TRACE ] nrects: 0

[ 03/02/2024 10:46:42.795 ev_handle TRACE ] event     Damage serial 0x0000da46 window 0x0b000007 "i3lock"
[ 03/02/2024 10:46:42.795 queue_redraw VERBOSE ] Queue redraw, render_queued: 1, backend_busy: 1
zsh: killed     /usr/local/bin/picom --log-level TRACE
Sat Mar  2 10:46:48 AM PST 2024

That first line with i3lock is, as I understand it, when I first hit a key to type in my password. The monitors were all still in sleep mode at this time. When I hit the first key, I noticed an unusually long pause before my monitors started waking up; they usually wake much faster. Normally I wiggle the mouse and wait for the monitors to wake up before typing in my password, and the bug still happens. This time I deliberately typed first to see if that would change the behavior; it didn't.

My best guess is picom got stuck in some kind of infinite loop the first time i3lock tried to paint its animation, which blocked something (X server? Drivers? I don't know) for ~4 seconds, then it was killed.

Stack trace

I'd have to figure out which signal is being used to kill it and enable core dumps for those. Hopefully this rings a bell and I don't need to do that.
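For anyone who wants to learn which signal ends the process without enabling core dumps first, one option is a small wrapper that launches the command and decodes its wait status. The following is a minimal sketch using only standard POSIX calls (`fork`, `execvp`, `waitpid`); `run_and_report` is a hypothetical helper name and is not part of picom.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the command in argv (NULL-terminated) and wait for it to finish.
 * Returns the number of the signal that terminated the child,
 * 0 if the child exited on its own, or -1 on error. */
static int run_and_report(char *const argv[]) {
    fflush(stdout); /* avoid duplicating buffered output in the child */

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        perror("execvp"); /* only reached if exec failed */
        _exit(127);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    if (WIFSIGNALED(status)) {
        int sig = WTERMSIG(status);
        printf("terminated by signal %d (%s)\n", sig, strsignal(sig));
        return sig;
    }
    if (WIFEXITED(status)) {
        printf("exited with status %d\n", WEXITSTATUS(status));
    }
    return 0;
}
```

Calling it from a `main` with something like `{"/usr/local/bin/picom", "--log-level", "TRACE", NULL}` would print the signal number and name when the process is killed, which is the information the shell's `killed` message leaves out.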

OpenGL trace

Other details

I only noticed this after switching to the new backend after the recent bugs around picom getting stuck were fixed, but that's also a new version compiled from git head. For now, I am going to switch back to the legacy backend but with the new git version and see if it reproduces. Given how long it can take to reliably reproduce I'll update this bug with my findings tomorrow.

Monsterovich commented 7 months ago

Try this.

diff --git a/src/utils.c b/src/utils.c
index 68ec697..7cde053 100644
--- a/src/utils.c
+++ b/src/utils.c
@@ -284,10 +284,10 @@ void rolling_quantile_pop_front(struct rolling_quantile *rq, int x) {
 void set_rr_scheduling(void) {
        int priority = sched_get_priority_min(SCHED_RR);

-       if (rtkit_make_realtime(0, priority)) {
-               log_info("Set realtime priority to %d with rtkit.", priority);
-               return;
-       }
+       // if (rtkit_make_realtime(0, priority)) {
+               // log_info("Set realtime priority to %d with rtkit.", priority);
+               // return;
+       // }

        // Fallback to use pthread_setschedparam
        struct sched_param param;

awused commented 7 months ago

The legacy backend did not get killed after running overnight, though it was only one sample. I'll try that patch with the new backend now.

awused commented 7 months ago

I have not tested that patch yet.

0x836 commented 3 months ago

Hello. I have exactly the same problem. I applied this patch and have been testing it for more than 15 days, and during this time, the bug hasn't reproduced even once.
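The patch above comments out the rtkit path in `set_rr_scheduling`. One possible explanation for why that helps, not confirmed anywhere in this thread, is the Linux `RLIMIT_RTTIME` limit: a process under a realtime policy that consumes CPU past the hard limit without making a blocking system call is sent `SIGKILL`. The sketch below is one way to check whether a given process runs under a realtime policy and what its `RLIMIT_RTTIME` values are. It is Linux-specific (`prlimit` needs `_GNU_SOURCE`), and `is_realtime` and `print_rt_status` are hypothetical helper names, not picom functions.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/types.h>

/* Returns 1 if pid is scheduled with SCHED_FIFO or SCHED_RR,
 * 0 if it is not, and -1 on error. pid 0 means the calling process. */
static int is_realtime(pid_t pid) {
    int policy = sched_getscheduler(pid);
    if (policy < 0) {
        perror("sched_getscheduler");
        return -1;
    }
    /* The kernel may OR this flag into the returned policy. */
    policy &= ~SCHED_RESET_ON_FORK;
    return policy == SCHED_FIFO || policy == SCHED_RR;
}

/* Print one RLIMIT_RTTIME value; the unit is microseconds of CPU time. */
static void print_limit(const char *label, rlim_t value) {
    if (value == RLIM_INFINITY) {
        printf("%s: unlimited\n", label);
    } else {
        printf("%s: %llu us\n", label, (unsigned long long)value);
    }
}

/* Print the realtime status and RLIMIT_RTTIME of pid.
 * Returns 0 on success, -1 on error. */
static int print_rt_status(pid_t pid) {
    int rt = is_realtime(pid);
    if (rt < 0) {
        return -1;
    }
    printf("realtime policy: %s\n", rt ? "yes" : "no");

    struct rlimit rl;
    if (prlimit(pid, RLIMIT_RTTIME, NULL, &rl) != 0) {
        perror("prlimit");
        return -1;
    }
    print_limit("RLIMIT_RTTIME soft", rl.rlim_cur);
    print_limit("RLIMIT_RTTIME hard", rl.rlim_max);
    return 0;
}
```

Passing the PID of a running picom to `print_rt_status`, with and without the patch applied, would show whether the realtime policy is in effect and whether a finite hard limit is set. Reading another user's process may require elevated privileges.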