Open keszybz opened 1 month ago
While this is indeed desirable, using systemd in this way would break our current privilege model, as we pass file descriptors around to not expose privileged services on the file system. This is the main reason everything is started by cosmic-session in the first place. Afaik this is not something supported through e.g. systemd-run.
Clarification: systemd-run is generally not the right option for this kind of thing. It is a command-line tool that is a thin wrapper around the relevant APIs exposed by the manager. Those APIs are stable and documented and should be called directly.
Systemd supports storage and passing of sockets very much. I'm not sure which one would be the most relevant here, and I don't want to describe all the possibilities. Can you describe or point me to some overview of how (and why) COSMIC is passing those fds?
> Clarification: systemd-run is generally not the right option for this kind of thing. It is a command-line tool that is a thin wrapper around the relevant APIs exposed by the manager. Those APIs are stable and documented and should be called directly.
Noted. However, I was expecting the systemd-run tool to match the underlying API surface in capabilities, and I found no way of passing existing file descriptors.
> Systemd supports storage and passing of sockets very much. I'm not sure which one would be the most relevant here, and I don't want to describe all the possibilities. Can you describe or point me to some overview of how (and why) COSMIC is passing those fds?
cosmic-session is sharing a pipe with cosmic-comp.
It uses that for a bit of initial communication, e.g. to share the WAYLAND_DISPLAY and DISPLAY settings upon startup, which are then imported into the systemd user session (if available) and set on newly spawned processes.
Additionally, a specific command from cosmic-session on that particular connection will cause cosmic-comp to create a new pipe, attach one end as a wayland-client and pass the other back to cosmic-session over the previously established connection. Connections created this way have additional privileges, in the form of additionally exposed protocols or altered behaviour of existing protocols, and are meant for shell-clients. cosmic-session will pass these to the various components it starts via forking and setting WAYLAND_SOCKET.
Additional pipes are passed via forking to relay notifications between a few components, and cosmic-panel does a similar thing internally: it passes wayland-sockets of its internal compositor (as every applet is a wayland-client itself) via WAYLAND_SOCKET.
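As a rough illustration of the fork-and-WAYLAND_SOCKET hand-off described above, here is a minimal Python sketch. It is not cosmic-session's code (which is written in Rust); the helper `spawn_with_socket` and the tiny child program are hypothetical, and a plain socketpair stands in for the connection that cosmic-comp would create.

```python
import os
import socket
import subprocess
import sys

# Hypothetical child: adopts the fd named in WAYLAND_SOCKET and sends one
# message so the parent can confirm that the descriptor arrived.
CHILD = r"""
import os, socket
fd = int(os.environ["WAYLAND_SOCKET"])
sock = socket.socket(fileno=fd)
sock.sendall(b"hello from fd %d" % fd)
sock.close()
"""

def spawn_with_socket(argv):
    """Spawn argv with one end of a fresh socketpair passed via WAYLAND_SOCKET."""
    ours, theirs = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    env = dict(os.environ, WAYLAND_SOCKET=str(theirs.fileno()))
    # pass_fds keeps exactly this fd open, under the same number, in the child.
    proc = subprocess.Popen(argv, env=env, pass_fds=(theirs.fileno(),))
    theirs.close()  # the parent keeps only its own end
    return proc, ours

def demo():
    proc, ours = spawn_with_socket([sys.executable, "-c", CHILD])
    data = ours.recv(64)
    proc.wait()
    ours.close()
    return proc.returncode, data
```

Nothing appears in the file system at any point: the only way to reach the privileged end is to inherit the descriptor, which is the property the current model relies on.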
> cosmic-session is sharing a pipe with cosmic-comp.
If cosmic-session and cosmic-comp are very closely connected, maybe they should be parts of the same unit. In particular, can either be restarted without the other one going down?
> a specific command from cosmic-session on that particular connection will cause cosmic-comp to create a new pipe, attach one end as a wayland-client and pass the other back to cosmic-session over the previously established connection
That is an interesting problem.
One option is to use systemd "scopes". (In systemd parlance, a "service" is a unit that is spawned as a child of systemd, and a "scope" is a unit that is spawned by something else; the unit is then created based on a list of PIDs, and those processes are moved to a new cgroup and managed by systemd.) This would mean that cosmic-session would spawn the child as it does now, and then call org.freedesktop.systemd1.Manager.StartTransientUnit with the PIDs property set. See https://systemd.io/CONTROL_GROUP_INTERFACE/#properties.
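To make the scope approach concrete, the following Python sketch only builds the arguments for StartTransientUnit and an equivalent busctl command line; it does not talk to the bus. The helper names and the unit name used in the usage note are hypothetical. The argument layout assumes the documented signature `ssa(sv)a(sa(sv))` (unit name, job mode, properties, auxiliary units) with PIDs typed as an array of unsigned integers.

```python
def scope_request(unit_name, pids, description=None):
    """Build the four arguments of StartTransientUnit(ssa(sv)a(sa(sv))).

    Each property is a (name, D-Bus type signature, value) triple.
    """
    if not unit_name.endswith(".scope"):
        raise ValueError("a scope unit name must end in .scope")
    properties = [("PIDs", "au", [int(p) for p in pids])]
    if description is not None:
        properties.append(("Description", "s", description))
    mode = "fail"  # fail rather than replace a conflicting queued job
    aux = []       # no auxiliary units
    return unit_name, mode, properties, aux

def busctl_argv(unit_name, pids):
    """Equivalent busctl command line, handy for trying the call by hand."""
    pids = [str(int(p)) for p in pids]
    return [
        "busctl", "--user", "call",
        "org.freedesktop.systemd1", "/org/freedesktop/systemd1",
        "org.freedesktop.systemd1.Manager", "StartTransientUnit",
        "ssa(sv)a(sa(sv))",
        unit_name, "fail",
        "1", "PIDs", "au", str(len(pids)), *pids,  # one property: PIDs
        "0",                                       # zero auxiliary units
    ]
```

For example, `busctl_argv("app-demo-1234.scope", [1234])` yields a command that asks the user manager to move PID 1234 into a new scope; the session would make the same call after spawning each child.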
Another option would be to extend the systemd API to support this better. We have APIs that cover stdin/stdout/stderr, but not other fds. See https://github.com/systemd/systemd/issues/33061.
> If cosmic-session and cosmic-comp are very closely connected, maybe they should be parts of the same unit. In particular, can either be restarted without the other one going down?
The compositor can indeed be restarted, and cosmic-session will restart the shell components in the right order to restore the session. This isn't meant as a mechanism for arbitrary restarts, though; it is more of a way to handle crashes and/or aid development and debugging. (At least until something like KDE/Qt's proposed compositor-handoff becomes more widely supported.)
> One option is to use systemd "scopes". (In systemd parlance, a "service" is a unit that is spawned as a child of systemd, and a "scope" is a unit that is spawned by something else; the unit is then created based on a list of PIDs, and those processes are moved to a new cgroup and managed by systemd.) This would mean that cosmic-session would spawn the child as it does now, and then call org.freedesktop.systemd1.Manager.StartTransientUnit with the PIDs property set. See https://systemd.io/CONTROL_GROUP_INTERFACE/#properties.
This would probably be the easiest option to support right now and also seems to make it easy to keep the systemd-integration optional. As far as I understand it, if systemd is available we should just add a call to create the scope?
This gains us proper cgroups and the perks that come with that, while still allowing cosmic-session to manage the child process, right? (Most notably, watch its exit status and start it again if necessary with a new socket.)
> Another option would be to extend the systemd API to support this better. We have APIs that cover stdin/stdout/stderr, but not other fds. See systemd/systemd#33061.
I am not sure how exactly that would work, but it is definitely interesting to think about. I would like to move more responsibilities to an existing service manager, as opposed to essentially writing our own. This has only really developed out of the necessity to transfer file descriptors to our child processes.
So in that scenario we would pass file descriptors to the systemd call, I assume? And systemd would make sure to pass them on when starting the process? I guess we would need to figure out how exactly restarting would work in that case.
> This would probably be the easiest option to support right now and also seems to make it easy to keep the systemd-integration optional. As far as I understand it, if systemd is available we should just add a call to create the scope?
> This gains us proper cgroups and the perks that come with that, while still allowing cosmic-session to manage the child process, right? (Most notably, watch its exit status and start it again if necessary with a new socket.)
Yes.
> So in that scenario we would pass file descriptors to the systemd call, I assume? And systemd would make sure to pass them on when starting the process? I guess we would need to figure out how exactly restarting would work in that case.
Yes and yes.
For restarting, since a new fd needs to be passed, I think the only option is to stop the previous instance and start a new one from scratch. Normally we want restarts to pass along state, but in this case it sounds like any state becomes invalid during the restart.
Yes.
Great. Thanks for the links and for bringing this to my attention. I will see to it as soon as time permits (likely after alpha 1). In case anybody reading this, or you personally, sees this as an opportunity to contribute, PRs are very welcome.
> For restarting, since a new fd needs to be passed, I think the only option is to stop the previous instance and start a new one from scratch. Normally we want restarts to pass along state, but in this case it sounds like any state becomes invalid during the restart.
Makes sense, so in the end that doesn't give us much more than scopes?
I guess we could also rework the API to use a more long-lived anonymous socket rather than a pipe, so we could re-use the same fd?
I don't think such an API exists, however. I guess the closest would be an abstract socket plus some authentication by sharing a secret with the service.
Not sure if that is worth the effort, at least not in the short term.
> Makes sense, so in the end that doesn't give us much more than scopes?
A service gets a clean environment and the possibility to apply sandboxing or other setup. With a scope, this would need to be done by the caller.
> I don't think such an API exists, however. I guess the closest would be an abstract socket plus some authentication by sharing a secret with the service.
Authentication is not useful. All those processes are executed under the same user, so normal Linux privilege separation already gives appropriate security properties. There is no real difference between a pipe, a socket in the file system, or something more elaborate like a connection with some form of authentication. Since all the processes are running under the same user, a rogue process that is able to open the connection would in most circumstances also be able to ptrace the session process and extract any secrets.
If the API used for communication between processes is changed, I guess a UNIX socket with SOCK_SEQPACKET could be a good choice.
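A small Python sketch of why SOCK_SEQPACKET may fit here: it is connection-oriented like a stream socket, but every send is delivered as one whole message, so no extra framing is needed on top. The helper name and the message contents are made up for illustration, and the sketch assumes Linux.

```python
import socket

def seqpacket_roundtrip(messages):
    """Send each message over an AF_UNIX SOCK_SEQPACKET pair and read them back."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    try:
        for message in messages:
            a.send(message)
        # Each recv() returns exactly one message, never a merged or partial
        # one (as long as the buffer is large enough for the message).
        return [b.recv(4096) for _ in messages]
    finally:
        a.close()
        b.close()
```

With a SOCK_STREAM socket the two sends in a call such as `seqpacket_roundtrip([b"set-display", b"new-client"])` could arrive merged into a single read; here they come back as two separate messages.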
> Great. Thanks for the links and for bringing this to my attention. I will see to it as soon as time permits (likely after alpha 1). In case anybody reading this, or you personally, sees this as an opportunity to contribute, PRs are very welcome.
Cool! I guess that the design part might be more complicated than the actual code changes…
From the systemd side we'll want to add the API to start services with fds. It's a generic problem and I'm a bit surprised that it hasn't been implemented yet.
I was looking into this a bit, sorry for any dumb questions.
* I'm guessing the necessary changes (for optionally using cgroups) would go at the tail end of [`start_component`](https://github.com/pop-os/cosmic-session/blob/5613bc660649c65b4a4c3fb41605491b9765729a/src/main.rs#L332), and would involve either run_optional_command (like the other systemd-related calls), or checking for some env variable to indicate cgroups should be used?
* Is the ProcessKey returned from ProcessManager::start a PID? If not, is there an existing function in launchpad to get a PID for a process?
I can't comment on the cosmic side of the code, but just one clarification: setting up the cgroup and moving the processes into it must be done by systemd, i.e. the code in cosmic must make a D-Bus call to systemd.
In the approach with scopes, the sequence would be that the process is spawned and then a call is made to move it to a scope, so yeah, somewhere at the end of start_component sounds right.
from new control group interfaces
CPUAccounting, CPUShares, BlockIOAccounting, BlockIOWeight, BlockIOReadBandwidth, BlockIOWriteBandwidth, BlockIODeviceWeight, MemoryAccounting, MemoryLimit, DevicePolicy, DeviceAllow for services/scopes/slices.
@keszybz I'm trying to figure out what, if any, of these should be exposed or set for spawned scopes, and I was hoping you might have more information.
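- CPUAccounting seems to do nothing if cgroup v2 (unified hierarchy) is used, which I think is the default at this point, right?
- CPUShares, MemoryLimit, and all BlockIO properties are deprecated (according to same link under history)
- Would MemoryAccounting propagate to all of the spawned scopes if we are not currently (explicitly) grouping them under a slice?
- The only ones left are DeviceAllow and DevicePolicy, which I think make sense to expose, but I'm not sure

Also, when is it a good idea to group multiple processes under a single scope?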
> - CPUAccounting seems to do nothing if cgroup v2 (unified hierarchy) is used, which I think is the default at this point, right?
Yes. Anything related to cgroups v1 can be ignored at this point. V1 is on its way out, and by the time this work here goes out to users, systemd will have dropped support for it.
> - CPUShares, MemoryLimit, and all BlockIO properties are deprecated (according to same link under history)
> - Would MemoryAccounting propagate to all of the spawned scopes if we are not currently (explicitly) grouping them under a slice?
Yes, because the spawned scopes would be siblings.
> - The only ones left are DeviceAllow and DevicePolicy, which I think make sense to expose, but I'm not sure
Taking a step back, I don't think that trying to allow-list or deny-list which systemd settings can be used is useful. There are just too many settings, and new ones are added very often… Why not just allow any key=value pairs and pass them through? This will reduce your maintenance effort.
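A minimal Python sketch of such a pass-through, assuming the settings arrive as "Key=Value" strings from some configuration source (the helper name `parse_unit_properties` is hypothetical). It checks only the shape of each entry and never compares the key against a list of known settings.

```python
def parse_unit_properties(pairs):
    """Split "Key=Value" strings on the first '=' without filtering keys."""
    props = []
    for item in pairs:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected Key=Value, got {item!r}")
        # The value is kept as an opaque string; only the first '=' splits,
        # so values such as "A=B" survive intact.
        props.append((key, value))
    return props
```

The values stay strings here. Converting each one to the D-Bus type its property expects is left to the caller, and an unknown key would simply be rejected by systemd when the unit is created.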
> Also, when is it a good idea to group multiple processes under a single scope?
When those processes are part of a single "thing". E.g. a controller process and 5 workers, or a user shell with 'ls' and 'vi' spawned from it…
Initial support for this has been merged: https://github.com/pop-os/cosmic-session/pull/54
I think we can close this issue and open new ones to track progress on making better use of scopes, or on potentially moving to systemd services once new interfaces allow for file descriptor passing.
@keszybz Thanks for starting this. Do you mind getting pinged for some insights once we look into further work on this front? Is there anything else you would like to see cosmic support in this regard that I am missing?
[A continuation of the discussion in https://discussion.fedoraproject.org/t/help-wanted-for-fedora-cosmic/106769.]
Hi, I'm a systemd developer and a recent discussion in Fedora prompted me to test COSMIC (precisely ublue-os/cosmic-base:40-amd64) on a spare laptop.
COSMIC currently just puts everything in one giant systemd service, which means that everything ends up in one giant cgroup. This is not good. It’s essentially a regression to where GNOME and KDE were 2 years ago. With a bit of pain, those DEs converted to a scheme where the services and applets that are started by the DE are not spawned directly, but instead through the systemd API which puts the program in a separate service under the user instance of systemd.
Segregating apps into separate systemd user services has the following advantages:
274 can be handled by creating a symlink in the right directory
So… I think COSMIC looks very promising, and I like the flexibility, but I would also love to avoid a regression with respect to how apps are started. I'm opening the ticket here with the hope of starting a discussion.