heftig / rtkit

Branched from git://git.0pointer.net/rtkit

Add Graceful System-Suspend Support #35

Open arthurt opened 1 year ago

arthurt commented 1 year ago

This PR addresses the issue of canary false positives caused by system suspend by adding two new public methods, "Suspend()" and "Resume()", and by optionally connecting them to logind's system-sleep handling.

The Bug

During a system suspend-resume (sleep) cycle, the canary thread often experiences a time jump which causes a starvation false-positive. rtkit takes action and demotes the realtime/high priority of all known threads.

Long-running realtime processes (Pipewire, Pulseaudio) generally request realtime/high priority only once. If a system goes to sleep, the realtime/high priority scheduling is lost until these long-running processes are next started, after logout and login. As users generally suspend their machines more often than they log in, rtkit is basically non-functional for these processes, which are arguably the most important users of rtkit.

Even non-long-running processes may have lifecycles which span system suspend-resume cycles, and so operate in a degraded way for users.

See

Why

Given that the primary bug this change seeks to address is the canary false positives, it would seem far simpler to only stop and restart the canary around a suspend. However, doing so would degrade security for a controllable window. From a security perspective, one might as well just disable the canary altogether. To safely disable the canary, we need to first demote all threads.

Suspend/Resume Operation

Two new admin operations are added to rtkit.

These temporarily demote and restore managed thread priorities, as well as stop and start the canary.

On Suspend(), all managed threads are demoted, and the canary stopped.

While suspended, new realtime/high priority requests are rejected. Managed thread states are still garbage-collected if a thread exits, but are retained otherwise.

On Resume() the canary is restarted, and all managed threads are re-promoted. Current user burst limit timeouts are restarted, and the re-promotion of threads counts toward burst limiting, but the burst limit is not enforced on the re-promotion.

Calling ResetKnown() or ResetAll() while suspended removes all managed threads which lack realtime/high-priority, leaving no threads to re-promote later.

Calling either Suspend() or Resume() multiple times in a row is fine, but only the first call has an effect.
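The semantics described in this section can be modelled in a few lines. The following is only an illustrative sketch, not the code in this PR (rtkit itself is written in C); the names `RtkitModel`, `ManagedThread`, `make_realtime` and `thread_exited` are hypothetical and exist only for this example. Burst-limit accounting and the Reset operations are left out.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedThread:
    tid: int
    granted_priority: int   # priority granted before any suspend
    promoted: bool = True   # whether the thread currently runs at that priority


@dataclass
class RtkitModel:
    """Hypothetical model of the Suspend()/Resume() behaviour described above."""
    threads: dict = field(default_factory=dict)
    suspended: bool = False
    canary_running: bool = True

    def make_realtime(self, tid, priority):
        # New realtime/high priority requests are rejected while suspended.
        if self.suspended:
            return False
        self.threads[tid] = ManagedThread(tid, priority)
        return True

    def suspend(self):
        if self.suspended:
            return  # only the first call in a row has an effect
        for thread in self.threads.values():
            thread.promoted = False   # demote every managed thread first
        self.canary_running = False   # only then stop the canary
        self.suspended = True

    def resume(self):
        if not self.suspended:
            return  # only the first call in a row has an effect
        self.canary_running = True    # restart the canary
        for thread in self.threads.values():
            thread.promoted = True    # re-promote to the retained priority
        self.suspended = False

    def thread_exited(self, tid):
        # State of exited threads is dropped, even while suspended.
        self.threads.pop(tid, None)
```

The point of keeping `granted_priority` separate from `promoted` is that a demoted thread's record survives the suspend, so Resume() has something to restore.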

Security Considerations

Suspend() and Resume() are only available to admin callers, preventing abuse. Even if a malicious user were able to call Suspend() and Resume() at will, they still could not circumvent the count or burst limits. No new thread promotions can be created while suspended. Further, while the user burst limit is not enforced on resume, it is still updated, and the burst timeout is restarted.

It may be safe to allow for new realtime/high priority grants while in suspended mode to take effect upon resume, but this is an unlikely case, so it's easier to just refuse.

logind Integration

This change also adds an optional runtime integration with logind's inhibitor locks for handling system-suspend.

If the logind D-Bus service is running and accessible, rtkit will register a "delay sleep inhibitor" and listen for signals from logind about when the system is going to sleep or has just woken up. Using the sleep inhibitor, logind will wait for rtkit to perform its Suspend() operation before letting the system suspend. On system resume, logind will again notify rtkit, which will perform Resume() and register a new inhibitor.

See https://www.freedesktop.org/wiki/Software/systemd/inhibit/
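The ordering of the inhibitor cycle can be sketched as follows. This is a simplified model under stated assumptions, not the D-Bus code in this PR: `SleepCoordinator` is a hypothetical name, and `take_inhibitor`, `suspend` and `resume` are injected callables standing in for logind's `Inhibit()` call (which returns a file descriptor that holds the lock until it is closed) and for rtkit's own operations. `on_prepare_for_sleep` stands in for a handler of logind's `PrepareForSleep` signal, whose boolean argument is true before sleep and false after wake-up.

```python
import os


class SleepCoordinator:
    """Hypothetical model of the delay-inhibitor cycle; no real D-Bus calls."""

    def __init__(self, take_inhibitor, suspend, resume):
        self._take_inhibitor = take_inhibitor  # returns an fd holding the delay lock
        self._suspend = suspend
        self._resume = resume
        # Take the first lock up front so logind waits for us on the next sleep.
        self._lock_fd = take_inhibitor()

    def on_prepare_for_sleep(self, going_to_sleep):
        if going_to_sleep:
            # Demote threads and stop the canary before letting go of the lock.
            self._suspend()
            if self._lock_fd is not None:
                os.close(self._lock_fd)  # closing the fd releases the delay lock
                self._lock_fd = None
        else:
            # Back from sleep: restore priorities, then re-arm for the next cycle.
            self._resume()
            if self._lock_fd is None:
                self._lock_fd = self._take_inhibitor()
```

Releasing the lock only after the suspend step has finished is what gives the ordering guarantee described above; a delay lock is still subject to logind's configured maximum delay.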

Alternate Integrations

No alternate automatic system-suspend integration is provided, but rtkitctl --suspend and rtkitctl --resume should make this task easy.
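As an illustration of what such an integration might look like, here is a small hook wrapper. It is a sketch under assumptions: the event names in `SUSPEND_EVENTS` and `RESUME_EVENTS` are guesses at what a given power-management hook mechanism passes as its first argument and must be adapted to the system in use; `rtkitctl_command` and `run_hook` are hypothetical helper names. Only the `rtkitctl --suspend` and `rtkitctl --resume` invocations come from this PR.

```python
import subprocess
import sys

# Assumed event names; adjust to whatever the local hook mechanism passes.
SUSPEND_EVENTS = {"pre", "suspend", "hibernate"}
RESUME_EVENTS = {"post", "resume", "thaw"}


def rtkitctl_command(event):
    """Map a sleep-hook event name to an rtkitctl invocation, or None."""
    if event in SUSPEND_EVENTS:
        return ["rtkitctl", "--suspend"]
    if event in RESUME_EVENTS:
        return ["rtkitctl", "--resume"]
    return None


def run_hook(event):
    """Run the matching rtkitctl command and return its exit status."""
    cmd = rtkitctl_command(event)
    if cmd is None:
        return 0  # unknown event: nothing to do
    try:
        return subprocess.run(cmd, check=False).returncode
    except OSError:
        return 127  # rtkitctl missing or not executable


if __name__ == "__main__" and len(sys.argv) >= 2:
    run_hook(sys.argv[1])
```

Because Suspend() and Resume() are restricted to admin callers, such a hook would need to run with sufficient privileges.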

Other Changes

aviallon commented 4 days ago

I wonder if rtkit should just be forked into an organization somewhere on Freedesktop.org's GitLab, just so we can keep on using it.