pop-os / cosmic-comp

Compositor for the COSMIC desktop environment
GNU General Public License v3.0

(Bug) Non-deterministic deadlock #632

Open hardBSDk opened 1 month ago

hardBSDk commented 1 month ago

Related to https://github.com/pop-os/cosmic-comp/issues/625 and https://github.com/pop-os/cosmic-comp/issues/628

I confirmed that this bug can happen in any situation and at any point during execution.

leviport commented 1 month ago

Is this not just a duplicate of #628 ?

hardBSDk commented 1 month ago

@leviport Almost. The other issue is more deterministic: it shows up after a long run time.

This bug, however, can happen at any time during execution.

ids1024 commented 1 month ago

Freezing "after 1-2 days" could be a non-deterministic thing that tends to happen in that time frame.

In any case, it's hard to do much with this information. I'm not seeing deadlocks like this myself, or at least only much more rarely. Not sure exactly what's different on your system or with how you're using it.

It's a little involved, but personally I'd try to use gdb and attach to the cosmic-comp process.
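A minimal sketch of that approach, assuming gdb and pgrep are installed and that the commands are run from another TTY or an SSH session (a frozen compositor cannot show a terminal). The helper name `dump_backtraces` and the output file name are hypothetical; `sudo` is an assumption for systems that restrict ptrace attachment to other processes.

```shell
# Hypothetical helper: write backtraces of every thread in a running
# process to a file. Defaults target cosmic-comp.
dump_backtraces() {
    name="${1:-cosmic-comp}"
    out="${2:-cosmic-comp-backtrace.txt}"

    # Find the oldest process whose name matches exactly.
    pid="$(pgrep -o -x "$name")" || {
        echo "no process named $name" >&2
        return 1
    }

    # Attach, print a backtrace for all threads, then detach and exit.
    # sudo is assumed to be needed where ptrace attachment is restricted.
    sudo gdb -p "$pid" -batch -ex "thread apply all bt" > "$out" 2>&1
}
```

Running `dump_backtraces` while the session is frozen should leave a text file that can be attached to the issue. Backtraces are far more readable when debug symbols for cosmic-comp are available.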

I should probably look into whether there are better tools for detecting and debugging deadlocks in Rust applications (and document some of the debugging tools that are useful with COSMIC).

hardBSDk commented 1 month ago

@ids1024 My workload before this freeze: 5-7 running programs sending notifications every few seconds or minutes, music playing, messages being sent and received in chat programs, window focus changing many times, and the machine being suspended.

curiousercreative commented 1 month ago

@hardBSDk I've encountered something similar (see thread), and I've only hit the deadlock when Chromium (flatpak, Wayland enabled) is the active window and being interacted with (mouse click or scroll).

hardBSDk commented 1 month ago

@curiousercreative Please post here if a developer finds the deadlock, and include the commit link.

ids1024 commented 1 month ago

> only encountered the deadlock when Chromium (flatpak, Wayland enabled) is the active window and being interacted with (mouse click or scroll).

Interesting. Maybe a deadlock in the pointer handling code in Smithay.

I guess if I can reproduce it with Chromium, I can get a backtrace. (With a method like what I mentioned above.)

ids1024 commented 1 month ago

Oh, and Chromium is something that (at least by default) seems to still need XWayland, so it could be XWayland related.

curiousercreative commented 1 month ago

> Oh, and Chromium is something that (at least by default) seems to still need XWayland, so it could be XWayland related.

I believe with Wayland flags, it doesn't run XWayland.

ids1024 commented 1 month ago

Ah, I didn't see you mentioned that. So flatpak run org.chromium.Chromium --enable-features=UseOzonePlatform --ozone-platform=wayland? I'll see if I can get a deadlock with that. Browsers do some kind of weird stuff, so it's not a surprise that the Chromium Wayland backend could hit some edge cases in the Wayland protocol we don't test well enough otherwise.

hardBSDk commented 1 month ago

@ids1024 It's probable, because I got the deadlock using Element Desktop, which uses Electron, which uses Chromium.

ids1024 commented 1 month ago

Ah, so there may be some trend to what's causing this. And was Element using XWayland, or native Wayland? Electron should also default to XWayland without --enable-features=UseOzonePlatform --ozone-platform=wayland, though maybe Element configures it to use the Wayland backend.
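One rough way to answer that question is to list the X11 clients connected to XWayland and look for the application. The sketch below assumes the `xlsclients` utility is installed and that XWayland is running. The helper name `uses_xwayland` is hypothetical, and the check is only a heuristic: it may miss clients that do not set the window properties `xlsclients` reads.

```shell
# Hypothetical helper: succeed if an X11 client whose listing matches the
# given name (case-insensitive) is connected to the X server (XWayland).
uses_xwayland() {
    xlsclients 2>/dev/null | grep -qi -- "$1"
}
```

For example, `uses_xwayland element && echo "X11 client found"` would report a match if Element shows up in the list. No match suggests, but does not prove, that the application is using the native Wayland backend.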

curiousercreative commented 1 month ago

@ids1024 FWIW, I don't encounter the deadlock often enough to have any reliable repro steps. For example, I've been busy in Chromium this week without a deadlock.