yshui / picom

A lightweight compositor for X11 with animation support
https://picom.app/
Other
4.14k stars 588 forks source link

SIGABRT moving cursor between screens #1353

Open scfarley opened 2 weeks ago

scfarley commented 2 weeks ago

I found this when trying to find an occasional SIGABRT in picom. It was happening infrequently but usually when the monitor went to sleep.

It could be the same issue, but with debugging enabled (and no optimization) this happens all the time. I believe that this assertion is not needed as there is a secondary check against cursor following it that does another search for cursor, but I am not sure how the code is supposed to behave. However, removing that first assertion seems to makes things happy. https://github.com/yshui/picom/blob/v12.1/src/event.c#L281

This may or may not be related to the SIGABRT without debugging. Time will tell.

Platform

FreeBSD 14.1-STABLE

GPU, drivers, and screen setup

Multi-screen setup (from xorg.conf):

        Screen      0  "Screen0"
        Screen      1  "Screen1" LeftOf "Screen0"

Mesa version: 24.1.7

glxinfo -B output:

name of display: :0.0
display: :0  screen: 0
direct rendering: Yes
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 12282 MB
    Total available memory: 12282 MB
    Currently available dedicated video memory: 11112 MB
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: NVIDIA GeForce RTX 4070/PCIe/SSE2
OpenGL core profile version string: 4.6.0 NVIDIA 550.120
OpenGL core profile shading language version string: 4.60 NVIDIA
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 4.6.0 NVIDIA 550.120
OpenGL shading language version string: 4.60 NVIDIA
OpenGL context flags: (none)
OpenGL profile mask: (none)

OpenGL ES profile version string: OpenGL ES 3.2 NVIDIA 550.120
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20

display: :0  screen: 1
direct rendering: Yes
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 12282 MB
    Total available memory: 12282 MB
    Currently available dedicated video memory: 11109 MB
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: NVIDIA GeForce RTX 4070/PCIe/SSE2
OpenGL core profile version string: 4.6.0 NVIDIA 550.120
OpenGL core profile shading language version string: 4.60 NVIDIA
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 4.6.0 NVIDIA 550.120
OpenGL shading language version string: 4.60 NVIDIA
OpenGL context flags: (none)
OpenGL profile mask: (none)

OpenGL ES profile version string: OpenGL ES 3.2 NVIDIA 550.120
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20

Environment

Fluxbox

picom version

$ picom --version
v12.1

Steps of reproduction

  1. Compile picom with debugging:
    -pipe  -g -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing

    Instead of:

    -O2 -pipe  -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing
  2. Start picom on one screen or both screens.
  3. Move cursor from one screen to the other.

Expected behavior

No SIGABRT

Current Behavior

SIGABRT

Stack trace

(lldb) f 4
frame #4: 0x0000000000269f54 picom`recheck_focus((null)=0x00001aae498ac140, req_base=0x00001aae4984a840, reply_or_error=0x00001aae499050e0) at event.c:281:2
   278           }
   279
   280           auto cursor = wm_find(ps->wm, wid);
-> 281           assert(cursor != NULL || !wm_is_consistent(ps->wm));
   282
   283           if (cursor != NULL) {
   284                   cursor = wm_ref_toplevel_of(ps->wm, cursor);
(lldb) p cursor
(wm_ref *) NULL
(lldb) p wid
(xcb_window_t) 37748749

Note that the window ID is from the window focused on the screen the cursor is leaving.

yshui commented 2 weeks ago

multiple screen setup is not actively being supported by picom. many parts of picom assumes there is only one X screen. if removing the assertion makes it work, then i suggest you just build without the assertions (i.e. -DNDEBUG, or meson configure -Dbuildtype=release).

scfarley commented 2 weeks ago

I could try that, but what is the purpose of this assertion followed immediately by checking if cursor is NULL then setting cursor again and performing that same assertion?

What about removing the cursor != NULL from the first assertion? The second assertion will check it.

yshui commented 2 weeks ago

the assertion isn't checking if cursor is NULL:

assert(cursor != NULL || !wm_is_consistent(ps->wm));

this means: cursor shouldn't be NULL, unless wm_is_consistent is false.

scfarley commented 2 weeks ago

Obviously, I need this:

assert(!insomnia || coffee != NULL);

I wish picom did actively support multiple screens. I would prefer to keep assertions to help provide information, at least when valid. :grin: For now, I will just remove the first assertion line.

This issue can be kept open for whomever wants to work on the code handling multiple screens or not. I am good for now with my local change.

scfarley commented 2 weeks ago

After removing the assert(), I was able to find the original SIGABRT. I think this is happening when the monitor that was active when I locked the monitors in xscreensaver actually goes to sleep. This is the patch I am running that prevents the issue, but I am not sure it is the best course of action:

--- src/wm/wm.c.orig    2024-09-28 23:22:13 UTC
+++ src/wm/wm.c
@@ -404,7 +404,13 @@ void wm_destroy(struct wm *wm, xcb_window_t wid) {

 void wm_destroy(struct wm *wm, xcb_window_t wid) {
    struct wm_tree_node *node = wm_tree_find(&wm->tree, wid);
-   BUG_ON_NULL(node);
+
+   // Skipping destruction of window that cannot be found.  This occurs
+   // after the monitor goes to sleep.
+   if (node == NULL) {
+       log_debug("Destroying window %#010x ignored", wid);
+       return;
+   }

    log_debug("Destroying window %#010x", wid);
yshui commented 1 week ago

i think you can just keep running with that patch. the only real "fix" would be to systematically go through picom and fix every single case where we assumed single screen. not sure if that would be worth it.

you are probably the second person i've seen that uses a mult-screen setup for the entirety of picom's existence :sweat_smile:

scfarley commented 1 week ago

i think you can just keep running with that patch. the only real "fix" would be to systematically go through picom and fix every single case where we assumed single screen. not sure if that would be worth it.

It is not a problem for me since I build all software locally. The patch will live in the local ports tree.

To satisfy my curiosity, how would picom be seeing my multi-screen setup? Since it runs as two separate screens (:0.0 and :0.1), I would think it would be ignorant of the other screens, which is why I run two instances of picom. I would think in this case (picom's) ignorance would be bliss. The only thing that I think is confusing picom would be the "disappearing" cursor when it changes screens.

I have been trying my hand at writing a simple systray app using xcb. Except for finding the screen at the beginning, it stays intentionally ignorant about how many screens there are. This is some condensed code that I have from it:

int screenNum;
xcb_connection_t *conn;
const xcb_setup_t *xcbSetup;
xcb_screen_iterator_t sIter;

conn = xcb_connect(NULL, &screenNum);

// Gather information about this display and particular screen.
xcbSetup = xcb_get_setup(conn);
sIter = xcb_setup_roots_iterator(xcbSetup);
for (int i = 0; i < screenNum; i++)
{
    xcb_screen_next(&sIter);
}

However, https://stackoverflow.com/a/73515847 looks like it has more checks than I have.

you are probably the second person i've seen that uses a mult-screen setup for the entirety of picom's existence 😅

I never take the easy path. LOL BTW, when you say multi-screen, you do not mean multiple monitors viewed as one screen, yes? Just making sure as I confuse myself sometimes.

yshui commented 1 week ago

I never take the easy path. LOL BTW, when you say multi-screen, you do not mean multiple monitors viewed as one screen, yes? Just making sure as I confuse myself sometimes.

right, i mean multiple X screens like :0.0, :0.1, etc., not like, e.g. xrandr.

yshui commented 1 week ago

I would think it would be ignorant of the other screens, which is why I run two instances of picom.

I think in many places we just assume the screen number is 0 (i.e. the default screen). I was pretty surprised this works at all tbh :sweat_smile:

scfarley commented 1 week ago

I would think it would be ignorant of the other screens, which is why I run two instances of picom.

I think in many places we just assume the screen number is 0 (i.e. the default screen). I was pretty surprised this works at all tbh 😅

I think I understand better. Some of the confusion, for me, is there is a mix of Xlib and XCB calls such as with XOpenDisplay() vs xcb_connect(). Still, it looks like picom is not caring much about the display name since it uses XOpenDisplay(NULL) which will be whatever the value of DISPLAY is. x_connection_init() also uses DefaultScreen(), so it going with whatever screen is associated with that display. That screen value is used in a few other calls. It is almost as if picom is very close to being screen-agnostic as-is.

Would you be able to point at a line of code as an example of where picom assumes it is :0.0? That way I can see what to look for. I may way to play with cleaning up those assumptions.

One thing that may be helping it to "work" is that my monitors are nearly the same with only the refresh speeds being different.

Thank you for your work on picom and time explaining things to me.

P.S. I just noticed where you work. Since you work with Wine (a tiny bit :grin:), would you happen to know an easy way to tell Wine to set the _NET_WM_BYPASS_COMPOSITOR property? Otherwise, I am going to have to do some wicked scripting to use xdotool and xprop to set the property.