Long-running picom causes system slowdown (high CPU usage of Xorg process)

RangHo commented 2 weeks ago

Platform

Void Linux (glibc, x86_64)

GPU, drivers, and screen setup

Framework Laptop 13 AMD, Ryzen 7840U, Radeon 780M.

$ glxinfo -B
name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: AMD (0x1002)
    Device: AMD Radeon 780M (radeonsi, gfx1103_r1, LLVM 17.0.6, DRM 3.54, 6.6.35_1) (0x15bf)
    Version: 24.1.2
    Accelerated: yes
    Video memory: 512MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 4.6
    Max compat profile version: 4.6
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.2
Memory info (GL_ATI_meminfo):
    VBO free memory - total: 81 MB, largest block: 81 MB
    VBO free aux. memory - total: 14873 MB, largest block: 14873 MB
    Texture free memory - total: 81 MB, largest block: 81 MB
    Texture free aux. memory - total: 14873 MB, largest block: 14873 MB
    Renderbuffer free memory - total: 81 MB, largest block: 81 MB
    Renderbuffer free aux. memory - total: 14873 MB, largest block: 14873 MB
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 512 MB
    Total available memory: 16152 MB
    Currently available dedicated video memory: 81 MB
OpenGL vendor string: AMD
OpenGL renderer string: AMD Radeon 780M (radeonsi, gfx1103_r1, LLVM 17.0.6, DRM 3.54, 6.6.35_1)
OpenGL core profile version string: 4.6 (Core Profile) Mesa 24.1.2
OpenGL core profile shading language version string: 4.60
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 4.6 (Compatibility Profile) Mesa 24.1.2
OpenGL shading language version string: 4.60
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile

OpenGL ES profile version string: OpenGL ES 3.2 Mesa 24.1.2
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20

Environment

XMonad 0.18.0

picom version

$ picom --version
v11

Diagnostics

**Version:** v11 ### Extensions: * Shape: Yes * RandR: Yes * Present: Present ### Misc: * Use Overlay: No (Another compositor is already running) * Config file used: /home/rangho/.config/picom/picom.conf ### Drivers (inaccurate): modesetting ### Backend: glx * Driver vendors: * GLX: Mesa Project and SGI * GL: AMD * GL renderer: AMD Radeon 780M (radeonsi, gfx1103_r1, LLVM 17.0.6, DRM 3.54, 6.6.35_1) * Accelerated: 1 ### Backend: egl * Driver vendors: * EGL: Mesa Project * EGL driver: radeonsi * GL: AMD * GL renderer: AMD Radeon 780M (radeonsi, gfx1103_r1, LLVM 17.0.6, DRM 3.54, 6.6.35_1)

Configuration:

For GitHub link: HERE

Configuration file

``` ################################# # Shadows # ################################# shadow = true; shadow-radius = 7; shadow-opacity = .75 shadow-offset-x = -7; shadow-offset-y = -7; shadow-color = "#000000" # Specify a list of conditions of windows that should have no shadow. shadow-exclude = [ "name = 'Notification'", "class_g = 'Conky'", "class_g ?= 'Notify-osd'", "class_g = 'Cairo-clock'", ]; clip-shadow-above = [ "window_type = 'dock'", ]; crop-shadow-to-monitor = false; ################################# # Fading # ################################# fading = true; fade-in-step = 0.03; fade-out-step = 0.03; fade-delta = 5; fade-exclude = []; no-fading-openclose = false; no-fading-destroyed-argb = false; ################################# # Transparency / Opacity # ################################# inactive-opacity = 0.75; inactive-opacity-override = true; active-opacity = 1.0; inactive-dim = 0.0; frame-opacity = 0.7; focus-exclude = [ "class_g = 'Cairo-clock'" ]; opacity-rule = [] ################################# # Corners # ################################# corner-radius = 0; rounded-corners-exclude = [ "window_type = 'dock'", "window_type = 'desktop'", "window_type = 'popup_menu'", ]; ################################# # Background-Blurring # ################################# blur-method = "dual_kawase"; blur-strength = 4; blur-background-exclude = [ "window_type = 'dock'", "window_type = 'desktop'", "window_type = 'popup_menu'", "window_type = 'utility'", ]; ################################# # General Settings # ################################# dbus = false backend = "glx"; dithered-present = false; vsync = true; mark-wmwin-focused = true; mark-ovredir-focused = true; detect-rounded-corners = true; detect-client-opacity = true; use-ewmh-active-win = true; unredir-if-possible = false; # Delay before unredirecting the window, in milliseconds. Defaults to 0. # unredir-if-possible-delay = 0 # Conditions of windows that shouldn't be considered full-screen for unredirecting screen. # unredir-if-possible-exclude = [] detect-transient = true; # Use 'WM_CLIENT_LEADER' to group windows, and consider windows in the same # group focused at the same time. This usually means windows from the same application # will be considered focused or unfocused at the same time. # 'WM_TRANSIENT_FOR' has higher priority if detect-transient is enabled, too. # # detect-client-leader = false # Resize damaged region by a specific number of pixels. # A positive value enlarges it while a negative one shrinks it. # If the value is positive, those additional pixels will not be actually painted # to screen, only used in blur calculation, and such. (Due to technical limitations, # with use-damage, those pixels will still be incorrectly painted to screen.) # Primarily used to fix the line corruption issues of blur, # in which case you should use the blur radius value here # (e.g. with a 3x3 kernel, you should use `--resize-damage 1`, # with a 5x5 one you use `--resize-damage 2`, and so on). # May or may not work with *--glx-no-stencil*. Shrinking doesn't function correctly. # # resize-damage = 1 # Specify a list of conditions of windows that should be painted with inverted color. # Resource-hogging, and is not well tested. # # invert-color-include = [] glx-no-stencil = true # GLX backend: Avoid rebinding pixmap on window damage. # Probably could improve performance on rapid window content changes, # but is known to break things on some drivers (LLVMpipe, xf86-video-intel, etc.). # Recommended if it works. # # glx-no-rebind-pixmap = false use-damage = true; # Use X Sync fence to sync clients' draw calls, to make sure all draw # calls are finished before picom starts drawing. Needed on nvidia-drivers # with GLX backend for some users. # # xrender-sync-fence = false # GLX backend: Use specified GLSL fragment shader for rendering window # contents. Read the man page for a detailed explanation of the interface. # # window-shader-fg = "default" # Use rules to set per-window shaders. Syntax is SHADER_PATH:PATTERN, similar # to opacity-rule. SHADER_PATH can be "default". This overrides window-shader-fg. # # window-shader-fg-rule = [ # "my_shader.frag:window_type != 'dock'" # ] # Force all windows to be painted with blending. Useful if you # have a glx-fshader-win that could turn opaque pixels transparent. # # force-win-blend = false # Do not use EWMH to detect fullscreen windows. # Reverts to checking if a window is fullscreen based only on its size and coordinates. # # no-ewmh-fullscreen = false # Dimming bright windows so their brightness doesn't exceed this set value. # Brightness of a window is estimated by averaging all pixels in the window, # so this could comes with a performance hit. # Setting this to 1.0 disables this behaviour. Requires --use-damage to be disabled. (default: 1.0) # # max-brightness = 1.0 # Make transparent windows clip other windows like non-transparent windows do, # instead of blending on top of them. # # transparent-clipping = false # Specify a list of conditions of windows that should never have transparent # clipping applied. Useful for screenshot tools, where you need to be able to # see through transparent parts of the window. # # transparent-clipping-exclude = [] # Set the log level. Possible values are: # "trace", "debug", "info", "warn", "error" # in increasing level of importance. Case doesn't matter. # If using the "TRACE" log level, it's better to log into a file # using *--log-file*, since it can generate a huge stream of logs. # # log-level = "debug" log-level = "warn"; # Set the log file. # If *--log-file* is never specified, logs will be written to stderr. # Otherwise, logs will to written to the given file, though some of the early # logs might still be written to the stderr. # When setting this option from the config file, it is recommended to use an absolute path. # # log-file = "/path/to/your/log/file" # Show all X errors (for debugging) # show-all-xerrors = false # Write process ID to a file. # write-pid-path = "/path/to/your/log/file" # Window type settings # # 'WINDOW_TYPE' is one of the 15 window types defined in EWMH standard: # "unknown", "desktop", "dock", "toolbar", "menu", "utility", # "splash", "dialog", "normal", "dropdown_menu", "popup_menu", # "tooltip", "notification", "combo", and "dnd". # # Following per window-type options are available: :: # # fade, shadow::: # Controls window-type-specific shadow and fade settings. # # opacity::: # Controls default opacity of the window type. # # focus::: # Controls whether the window of this type is to be always considered focused. # (By default, all window types except "normal" and "dialog" has this on.) # # full-shadow::: # Controls whether shadow is drawn under the parts of the window that you # normally won't be able to see. Useful when the window has parts of it # transparent, and you want shadows in those areas. # # clip-shadow-above::: # Controls whether shadows that would have been drawn above the window should # be clipped. Useful for dock windows that should have no shadow painted on top. # # redir-ignore::: # Controls whether this type of windows should cause screen to become # redirected again after been unredirected. If you have unredir-if-possible # set, and doesn't want certain window to cause unnecessary screen redirection, # you can set this to `true`. # wintypes = { dnd = { shadow = false; } dock = { shadow = false; clip-shadow-above = true; } dropdown_menu = { opacity = 0.8; } tooltip = { fade = true; shadow = true; opacity = 0.75; focus = true; full-shadow = false; }; popup_menu = { shadow = false; opacity = 0.9; } utility = { shadow = false; } }; ```

Steps of reproduction

Start picom with XMonad
Keep the computer on for 12+ hours

Expected behavior

Running picom for a long time should not cause any performance degradation.

Current Behavior

Quite significant performance hit. See video below.

https://github.com/yshui/picom/assets/10833976/e643a00f-8a90-4e86-b830-ce64101be4a7

Other details

As shown in the video above, once the stuttering happens, simply restarting picom does not resolve the issue. When I restart picom, the issue comes back.

I noticed that when I enable picom, the Xorg process takes up an entire core in my machine.

Before:

After:

Thanks in advance!

yshui commented 2 weeks ago

hmm, this is interesting. i've never seen this myself, so i think you might need to do some investigation yourself.

also i am curious why your first instinct is too report issue here, instead of to Xorg?

RangHo commented 2 weeks ago

so i think you might need to do some investigation yourself.

Yeah, that was my hunch as well. I'm not really sure where to begin, so if you could give me some initial pointers it would be nice.

why your first instinct is too report issue here, instead of to Xorg?

Killing picom immediately resolves the stuttering issue and other compositor (read: xcompmgr) does not cause the slowdown. Also, when I restart picom with all three backends, Xorg starts hogging the CPU again, until I do a full reboot. For this I assumed picom is somehow causing Xorg to go crazy.

I'll do some more investigations (not really sure how yet), and report to Xorg if this seems unrelated to picom!

yshui commented 2 weeks ago

there are 2 things i can think of:

find out what Xorg is doing. perf will tell you where Xorg is spending its time, but you would need to build Xorg with debug info, or at least have an unstripped Xorg binary.
find out what picom is doing. here i think the most obvious thing to find out is if picom is making a lot of requests to Xorg. you can use xtrace to see that.

RangHo commented 2 weeks ago

Thanks! I'll try some things once I get back home and report back.

evilsh3ll commented 1 week ago

I've the same problem, I can't solve it

RangHo commented 1 week ago

Alright, here's something I've found. It happened much faster this time. Current uptime is 2 hours and 36 minutes.

I've installed the debugging symbols for Xorg (via xorg-server-dbg package on Void Linux) after this happened but I don't think the system has picked it up. Maybe I need to recompile Xorg?

Anyways, I've done some rudimentary performance analysis:

perf stat -p $XSERVER_PID, without picom

Performance counter stats for process id '1091':

          1,020.83 msec task-clock                       #    0.161 CPUs utilized
            11,690      context-switches                 #   11.452 K/sec
                 6      cpu-migrations                   #    5.878 /sec
                 0      page-faults                      #    0.000 /sec
     2,570,710,565      cycles                           #    2.518 GHz
       438,018,600      stalled-cycles-frontend          #   17.04% frontend cycles idle
       526,678,467      instructions                     #    0.20  insn per cycle
                                                  #    0.83  stalled cycles per insn
       111,380,155      branches                         #  109.108 M/sec
         9,297,155      branch-misses                    #    8.35% of all branches

       6.327497612 seconds time elapsed

Same thing, with picom

Performance counter stats for process id '1091':

          6,777.46 msec task-clock                       #    0.861 CPUs utilized
             5,882      context-switches                 #  867.876 /sec
                17      cpu-migrations                   #    2.508 /sec
                 2      page-faults                      #    0.295 /sec
    29,572,271,502      cycles                           #    4.363 GHz
       361,153,529      stalled-cycles-frontend          #    1.22% frontend cycles idle
    22,856,560,430      instructions                     #    0.77  insn per cycle
                                                  #    0.02  stalled cycles per insn
     7,537,230,525      branches                         #    1.112 G/sec
         7,257,721      branch-misses                    #    0.10% of all branches

       7.869414901 seconds time elapsed

I noticed a metric tonne of instructions after picom. Running strace gives some insight (I haven't compiled xtrace yet).

When picom is running, I see walls of calls to epoll_ctl syscall and the corresponding SIGALRM. It looks something like:

epoll_ctl(3, EPOLL_CTL_MOD, 7, {events=EPOLLET, data={u32=4006362416, u64=94536236547376}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 35, {events=EPOLLET, data={u32=4006364576, u64=94536236549536}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 36, {events=EPOLLET, data={u32=4005857440, u64=94536236042400}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 37, {events=EPOLLET, data={u32=4002559472, u64=94536232744432}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 38, {events=EPOLLET, data={u32=4005550416, u64=94536235735376}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 39, {events=EPOLLET, data={u32=4001652000, u64=94536231836960}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 40, {events=EPOLLIN|EPOLLET, data={u32=4026598736, u64=94536256783696}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 41, {events=EPOLLET, data={u32=4001651856, u64=94536231836816}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 42, {events=EPOLLET, data={u32=4001608032, u64=94536231792992}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 43, {events=EPOLLET, data={u32=3991550560, u64=94536221735520}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 44, {events=EPOLLET, data={u32=3991586736, u64=94536221771696}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 47, {events=EPOLLIN|EPOLLET, data={u32=3991555536, u64=94536221740496}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 48, {events=EPOLLET, data={u32=4005882496, u64=94536236067456}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 51, {events=EPOLLET, data={u32=4004403568, u64=94536234588528}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 46, {events=EPOLLET, data={u32=4003920512, u64=94536234105472}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 50, {events=EPOLLET, data={u32=4014603744, u64=94536244788704}}) = 0
writev(40, [{iov_base="\1\1\221}\0\0\0\0\3\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", iov_len=32}], 1) = 32
recvmsg(40, {msg_namelen=0}, 0)         = -1 EAGAIN (Resource temporarily unavailable)
setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=0, tv_usec=0}}, NULL) = 0
epoll_wait(3, [{events=EPOLLIN, data={u32=4026598736, u64=94536256783696}}], 256, 118896) = 1
setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=5000}, it_value={tv_sec=0, tv_usec=5000}}, NULL) = 0
recvmsg(40, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="+\0\1\0", iov_len=16384}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 4
writev(40, [{iov_base="\1\1\222}\0\0\0\0\3\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", iov_len=32}], 1) = 32
recvmsg(40, {msg_namelen=0}, 0)         = -1 EAGAIN (Resource temporarily unavailable)
setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=0, tv_usec=0}}, NULL) = 0
epoll_wait(3, [{events=EPOLLIN, data={u32=4026598736, u64=94536256783696}}], 256, 118896) = 1
setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=5000}, it_value={tv_sec=0, tv_usec=5000}}, NULL) = 0
recvmsg(40, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="%\0\1\0+\0\1\0", iov_len=16384}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 8
epoll_ctl(3, EPOLL_CTL_MOD, 7, {events=EPOLLIN|EPOLLET, data={u32=4006362416, u64=94536236547376}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 35, {events=EPOLLIN|EPOLLET, data={u32=4006364576, u64=94536236549536}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 36, {events=EPOLLIN|EPOLLET, data={u32=4005857440, u64=94536236042400}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 37, {events=EPOLLIN|EPOLLET, data={u32=4002559472, u64=94536232744432}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 38, {events=EPOLLIN|EPOLLET, data={u32=4005550416, u64=94536235735376}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 39, {events=EPOLLIN|EPOLLET, data={u32=4001652000, u64=94536231836960}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 40, {events=EPOLLIN|EPOLLET, data={u32=4026598736, u64=94536256783696}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 41, {events=EPOLLIN|EPOLLET, data={u32=4001651856, u64=94536231836816}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 42, {events=EPOLLIN|EPOLLET, data={u32=4001608032, u64=94536231792992}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 43, {events=EPOLLIN|EPOLLET, data={u32=3991550560, u64=94536221735520}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 44, {events=EPOLLIN|EPOLLET, data={u32=3991586736, u64=94536221771696}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 47, {events=EPOLLIN|EPOLLET, data={u32=3991555536, u64=94536221740496}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 48, {events=EPOLLIN|EPOLLET, data={u32=4005882496, u64=94536236067456}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 51, {events=EPOLLIN|EPOLLET, data={u32=4004403568, u64=94536234588528}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 46, {events=EPOLLIN|EPOLLET, data={u32=4003920512, u64=94536234105472}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 50, {events=EPOLLIN|EPOLLET, data={u32=4014603744, u64=94536244788704}}) = 0
writev(40, [{iov_base="\1\1\224}\0\0\0\0\3\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", iov_len=32}], 1) = 32
recvmsg(40, {msg_namelen=0}, 0)         = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(3, [{events=EPOLLIN, data={u32=4002559472, u64=94536232744432}}], 256, 0) = 1

-- snip --

ioctl(13, DRM_IOCTL_CRTC_GET_SEQUENCE, 0x7ffd63489fe0) = 0
ioctl(13, DRM_IOCTL_MODE_ADDFB, 0x7ffd63489dd0) = 0
ioctl(13, DRM_IOCTL_MODE_PAGE_FLIP, 0x7ffd63489ec0) = 0
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536247699904
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536260975616
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536251209776
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536257091824
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536255210768
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536251730752
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536257111536
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536257956352
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536248654528
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536243286208
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536244652112
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536251736736
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536256522160
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536255844208
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536256439440
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536251633200
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536256485904
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536255930096
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536251735680
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536251826144
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536261732944
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536251207312
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536261668528
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536255914608
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536254652848
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536257684608
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536255180496
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536257913760
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536256174032
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536254793296
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536255182608
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536260896416
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536255817808
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536261247360
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536261753712
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536256206064
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536261708304
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536255998736
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536256093776
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536256606640
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 94536256194800

Not sure where this is coming from.

I'll do a quick reboot to see if this is normal behavior.

RangHo commented 1 week ago

Rebooted. Now the issue is gone, so I might need to wait unknown amount of time to reproduce :(

Either way, ran strace again. The epoll_ctl is normal, but the walls of SIGALRM is not. It should not register that many alarms, and other compositors such as xcompmgr does not.

I'm really confused. Since Xorg process is the one receiving the signals, so surely Xorg process is the one registered the alarms. Yet somehow running picom is causing this behavior?

For the full log: refer to these files. For all traces, I just focused another terminal and typed random string for several seconds, then pressed ^C to stop tracing.

nothing.log: before reboot, no compositor running.
picom.log: before reboot, picom running.
xcompmgr.log: before reboot, xcompmgr running.
reboot.log: after reboot, picom running.

yshui commented 1 week ago

perf stat -p $XSERVER_PID, without picom

You need to perf record/perf report, or perf top, which will give you function names

yshui commented 1 week ago

I don't see anything out of the ordinary so far, can you run xtrace?

RangHo commented 1 week ago

I haven't been able to reporduce the slowdown yet. I'll report back soon!

mattiasflodin commented 1 week ago

I am getting the same symptoms after upgrading to NixOS 24.05 recently (picom v11). There are two picom processes (or possibly threads) that each use 100% CPU, and also systemd-journald. Looking at the journal, picom is producing millions of log lines like this

jul 11 20:54:49 x470 picom[1647]: [ 2024-07-11 20:54:49.092 _gl_compose ERROR ] GL error at line 487: 1287
jul 11 20:54:49 x470 picom[1647]: [ 2024-07-11 20:54:49.092 _gl_compose ERROR ] GL error at line 487: 1287
jul 11 20:54:49 x470 picom[1647]: [ 2024-07-11 20:54:49.092 _gl_compose ERROR ] GL error at line 487: 1287
jul 11 20:54:49 x470 picom[1647]: [ 2024-07-11 20:54:49.092 _gl_compose ERROR ] GL error at line 487: 1287
jul 11 20:54:49 x470 picom[1647]: [ 2024-07-11 20:54:49.092 _gl_compose ERROR ] GL error at line 487: 1287
jul 11 20:54:49 x470 picom[1647]: [ 2024-07-11 20:54:49.092 _gl_compose ERROR ] GL error at line 487: 1287
jul 11 20:54:49 x470 picom[1647]: [ 2024-07-11 20:54:49.092 _gl_compose ERROR ] GL error at line 487: 1287
jul 11 20:54:49 x470 picom[1647]: [ 2024-07-11 20:54:49.092 _gl_compose ERROR ] GL error at line 487: 1287
jul 11 20:54:49 x470 picom[1647]: [ 2024-07-11 20:54:49.092 _gl_compose ERROR ] GL error at line 487: 1287
jul 11 20:54:49 x470 picom[1647]: [ 2024-07-11 20:54:49.092 _gl_compose ERROR ] GL error at line 487: 1287
jul 11 20:54:49 x470 picom[1647]: [ 2024-07-11 20:54:49.092 _gl_compose ERROR ] GL error at line 487: 1287
jul 11 20:54:49 x470 picom[1647]: [ 2024-07-11 20:54:49.092 _gl_compose ERROR ] GL error at line 487: 1287
jul 11 20:54:49 x470 picom[1647]: [ 2024-07-11 20:54:49.092 _gl_compose ERROR ] GL error at line 487: 1287

I estimate about 50-100 of these per millisecond. Like in the original post, this happens after the computer has been on for some time. Perhaps it has to do with putting the screen in sleep mode.

Unlike the OP, this is on Nvidia hardware.

$ glxinfo -B
name of display: :0
display: :0  screen: 0
direct rendering: Yes
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 10240 MB
    Total available memory: 10240 MB
    Currently available dedicated video memory: 9218 MB
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: NVIDIA GeForce RTX 3080/PCIe/SSE2
OpenGL core profile version string: 4.6.0 NVIDIA 550.78
OpenGL core profile shading language version string: 4.60 NVIDIA
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 4.6.0 NVIDIA 550.78
OpenGL shading language version string: 4.60 NVIDIA
OpenGL context flags: (none)
OpenGL profile mask: (none)

OpenGL ES profile version string: OpenGL ES 3.2 NVIDIA 550.78
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20

mattiasflodin commented 1 week ago

I can see that gl_check_err is only called at the end of _gl_compose, near line 487. So it would seem that _gl_compose gets called a lot and that something fails on each call. Not sure that logging every single one of these is very helpful as I now have over ten million such log lines in my journal - perhaps it's better to give up and crash picom. But the root cause is probably elsewhere - something in the GL context or graphics card state changes so that some call in _gl_compose starts failing.

yshui commented 1 week ago

@RangHo can you confirm what @mattiasflodin has found?

RangHo commented 1 week ago

@yshui Unfortunatly not yet. I've been trying to replicate it a couple times so far, but I haven't been able to. I have volatile syslog setup as well, so I don't have previous journals, unfortunately.

Will report once reproduced.

tx46 commented 5 days ago

happens to me every day now since I leave my system running. killing picom and restarting it fixes it (until i leave my system idle for a few hours again)

ColetteDiskette commented 5 days ago

I'm dealing with this, too. I actually made 'Kill Picom' and 'Start Picom' buttons on my Stream Deck because of how often I have to restart it.

Don't have much to add beyond that, though. I'm on Arch Linux running AwesomeWM, an AMD Ryzen 7 5700X, and an AMD Radeon RX 7600.

yshui commented 4 days ago

if you have this problem it would help if you can do the steps i listed in: https://github.com/yshui/picom/issues/1289#issuecomment-2212943014

RangHo commented 4 days ago

I was just able to reproduce the symptom. It started really early (uptime was ~53 min), and was able to gather some information. I'm having a stronger suspicion that this is a problem in the upstream, though. More about that later.

Regarding what @mattiasflodin reported, in my picom log, I do not see anything related to GL errors. There is no severe warnings at all, in fact. I assume that GL error is a different issue from what I am experiencing.

Regarding what @m8uz and @ColetteDiskette reported, I also think that is a different issue from this one, as in their case restarting picom seems to solve the issue. In my case, restarting picom just re-introduces the massive CPU lag.

This is the screenshot of perf top -p $XORG_PID:

picom

Xorg is spending most of its time in (I guess this one?) damageSetWindowPixmap function, which kinda explains why starting picom causes this. At the same time, I'm really not sure why other compositors experience the same. The "damage event" is generated by Xorg, no?

I attached xtrace to picom to see what happens, and from what I could eyeball, and I see a lot of damage requests, which is in line with what perf tells us.

The full, MASSIVE log is here: xtrace.log (it is like 24 megs).

By the way, the debug symbols were retrieved from the latest Void Linux packages (xorg-server-dbg, picom-dbg, respectively) and xtrace was built from the latest Debian package (see build template for XBPS here).

Should I bring this to upstream? Or is this a picom issue?

yshui commented 4 days ago

@RangHo Thanks! This is very helpful, I will try to investigate a little bit with information but I might need more.

@mattiasflodin looks like your issue is unrelated, can you open a new issue for that? thanks.

yshui commented 4 days ago

Something on your system is creating an enormous number of 1x1 sized windows. picom needs to monitor damage for every window, that's why you are seeing damageSetWindowPixmap in perf.

Looks like this is not a picom problem, but I suggest you look into what's creating all these windows.

yshui commented 4 days ago

Yeah, something (hopefully not picom) created 17384 1x1 windows... That's weird.

If you search in the xtrace.log for Create damage, you will see lines like:

000:<:10885: 16: DAMAGE-Request(143,1): Create damage=0x00c02c0b drawable=0x00800ad9 level=report non-empty(0x03)

drawable is the window ID, if you search for GetGeometry drawable=<window ID> (in this case, window ID is 0x00800ad9):

000:<:10883:  8: Request(3): GetWindowAttributes window=0x00800ad9
000:>:0883:44: Reply to GetWindowAttributes: backing-store=NotUseful(0x00) visual=0x00000021 class=InputOutput(0x0001) bit-gravity=Forget(0x00) win-gravity=NorthWest(0x01) backing-planes=0xffffffff backing-pixel=0x00000000 save-under=false(0x00) map-is-installed=true(0x01) map-state=Unmapped(0x00) override-redirect=false(0x00) colormap=0x00000020 all-event-masks=0 your-event-mask=0 do-not-propagate-mask=0 unused=0x0000
000:<:10884:  8: Request(14): GetGeometry drawable=0x00800ad9
000:>:0884:32: Reply to GetGeometry: depth=0x18 root=0x000004f6 x=0 y=0 width=1 height=1 border-width=1

If you see width=1 height=1, then this is one of the suspicious windows. Check if you can get anything about it with xwininfo or xprop.

RangHo commented 4 days ago

Holy cow, that is really weird. Since I already restarted the machine and no longer has the issue, I need to wait a while again to reproduce it. I'll try the options you suggested!

On the other hand... while this is an issue unrelated to picom, the procedure I've gone through seems like a good starting point into debugging performance issues in Xorg in general. So I suggest keeping this issue opened until I find the culprit for documentation purposes. How does that sound?

yshui commented 4 days ago

Yeah, sure. I am intrigued as well. Hopefully we don't find picom creating those windows because of a bug somewhere...

mattiasflodin commented 3 days ago

@yshui I'll try to see if I can reproduce it while running in xtrace and then report a separate bug.

ColetteDiskette commented 2 days ago

I think I figured out what's going on for mine, from a high level at least.

When I close a window in AwesomeWM with c:kill() (which is the recommended way in Awesome), the xtrace for picom yields [ 07/20/2024 15:13:14.598 x_log_error WARN ] X error 3 WINDOW request 2 minor 0 serial 17484381 (but with appropriate timestamp and serial, of course.)

For each one of these, the CPU usage slowly goes up, and over time I can watch my temperature climb. Over the course of a day it can go up 15 to 20 C from regular computer usage. Killing picom and starting it back up makes things good again.

Should I start a separate issue for this, or would this be considered an X or maybe even Awesome issue? I don't understand how these things work at a low enough level to know exactly where this would need to go.

RangHo commented 8 hours ago

I was able to reliably reproduce the performance degradation. It was a combination of XMonad's NamedScratchpad and WezTerm. See the video below:

https://github.com/user-attachments/assets/3c3fd5d5-b12b-4c8c-86ff-7f93b36628c8

As you can see, when the state of the WezTerm window changes (position, hidden/visible, visiting another workspace, even typing on it... anything, really), it generates SHIT TON of these 1x1 windows. The performance degradation is clearly visible. Kitty, on the other hand, does not show this behavior.

For now, the workaround is to ditch WezTerm and use something else.

It was hard to locate since these 1x1 windows doesn't show any property on xwininfo and xprop, so I had to just watch -n 1 -- xwininfo -root -tree and observe.

I'll bring this to WezTerm and see what they can do about it. I think it is safe to close this issue as well. For other performance degradation cases, I suggest opening separate issues.

Thanks @yshui for your help!

yshui / picom