swaywm / sway

i3-compatible Wayland compositor
https://swaywm.org
MIT License

Sway fails to release GPU on udev remove command #8097

Open redrampage opened 6 months ago

redrampage commented 6 months ago

Hi, I'm running into problems with sway not releasing a video card's DRI device.

I'm running sway in a virtualization setup with two AMD GPUs: one is for the host system (card0), the other is dynamically plugged in and unplugged for the guest (card1). Both are configured at startup via WLR_DRM_DEVICES=/dev/dri/card0:/dev/dri/card1.

When I need to release the latter GPU for the guest VM, I run this udev command:

/usr/bin/udevadm trigger --settle --type=devices --action=remove --subsystem-match=drm --property-match="DEVNAME=/dev/dri/card1"

and then check for GPU release with:

fuser /dev/dri/card1
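
Since the release may not be instantaneous, the check above could be wrapped in a small polling loop. This is only a sketch: wait_for_release is a hypothetical helper, and it assumes that fuser exits with a non-zero status when no process has the file open (it also does so on errors, so a success here is a hint rather than proof). The optional third argument is an assumption added purely so the checker command can be swapped out.

```sh
#!/bin/sh
# wait_for_release DEVICE [TIMEOUT_SECONDS] [CHECKER]
# Polls once per second until CHECKER (default: fuser) reports that no
# process holds DEVICE, or until TIMEOUT_SECONDS (default: 10) have passed.
# Returns 0 if the device looks released, 1 on timeout.
# Hypothetical helper for illustration; not part of sway or wlroots.
wait_for_release() {
    device=$1
    timeout=${2:-10}
    checker=${3:-fuser}
    elapsed=0
    # The checker is assumed to exit 0 while at least one process
    # still has the device open.
    while "$checker" "$device" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

It could be called as `wait_for_release /dev/dri/card1 5 && echo released` right after the udevadm command. Run it as root, as in the steps below, so that fuser can see processes owned by other users.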

Some time ago this trick stopped working. Now, on the udev command, sway shuts down the second display but still keeps its device open. Log messages indicate that sway cannot release the GPU for some reason:

Mar 30 18:15:50 hostname sway[192985]: 00:00:51.614 [ERROR] [wlr] [libseat] [libseat/backend/logind.c:199] Could not close device: Device not taken
Mar 30 18:15:50 hostname sway[192985]: 00:00:51.614 [ERROR] [wlr] [backend/session/session.c:356] Failed to close device 12: Resource temporarily unavailable

TL;DR steps to reproduce:

# export WLR_DRM_DEVICES=/dev/dri/card0:/dev/dri/card1

# sway

# fuser /dev/dri/card1
/dev/dri/card1:          1  1511 206814

# /usr/bin/udevadm trigger --settle --type=devices --action=remove --subsystem-match=drm --property-match="DEVNAME=/dev/dri/card1"

# fuser /dev/dri/card1
/dev/dri/card1:      206814

emersion commented 6 months ago

That's expected: WLR_DRM_DEVICES disables hotplug and unplug handling.

redrampage commented 6 months ago

Are you sure about that? Sway still reacts to the udev command and disables the output. This worked before, and the error message is new (I've checked old logs).

I've tried running sway without WLR_DRM_DEVICES set: on the aforementioned udev command it disabled both outputs but still kept both DRI cards open.

emersion commented 6 months ago

Hm, yeah, never mind, we still listen for the udev remove event in that case.

cc @kennylevinsen for the libseat bits

kennylevinsen commented 6 months ago

Mar 30 18:15:50 hostname sway[192985]: 00:00:51.614 [ERROR] [wlr] [libseat] [libseat/backend/logind.c:199] Could not close device: Device not taken
Mar 30 18:15:50 hostname sway[192985]: 00:00:51.614 [ERROR] [wlr] [backend/session/session.c:356] Failed to close device 12: Resource temporarily unavailable

I suspect these errors are just because logind reacted to the event in parallel, removing the devices from the seat before we tried to clean them up. Neither libseat nor wlroots considers them fatal; both proceed with cleanup.

My gut feeling is that the culprit is the renderer's fd, which is a lease off the master fd. I imagine it's invalidated when the master status is lost, but that does not close it.

emersion commented 6 months ago

But a renderer is only created for the primary DRM device, not for secondary devices?
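
To see which descriptors actually stay open after the remove event, one option is to inspect the symlinks under /proc/PID/fd for the sway process. The following is a minimal sketch rather than anything taken from wlroots or libseat: list_device_fds is a hypothetical helper, and its third argument is an assumption added only so it can be pointed at a fake directory tree instead of /proc.

```sh
#!/bin/sh
# list_device_fds PID DEVICE [PROC_ROOT]
# Prints the numbers of the file descriptors of process PID whose
# /proc symlink points at DEVICE, one per line, in glob order.
# PROC_ROOT defaults to /proc. Hypothetical helper for illustration.
list_device_fds() {
    pid=$1
    device=$2
    proc_root=${3:-/proc}
    for link in "$proc_root/$pid/fd"/*; do
        # Skip anything that is not a symlink; this also covers the
        # case where the glob matched nothing.
        [ -L "$link" ] || continue
        target=$(readlink "$link")
        if [ "$target" = "$device" ]; then
            # Print only the fd number (the last path component).
            printf '%s\n' "${link##*/}"
        fi
    done
}
```

For example, `list_device_fds 206814 /dev/dri/card1`, using the PID that fuser reported in the steps above, would print the fd numbers that still refer to the card. Running it before and after the udevadm command shows how many descriptors were closed and how many were left behind.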