jupyterhub / systemdspawner

Spawn JupyterHub single-user notebook servers with systemd
BSD 3-Clause "New" or "Revised" License

Add option to make the singleuser service part of the user slice #32

Closed Debilski closed 1 year ago

Debilski commented 6 years ago

Possibly something like adding this to systemd-run

cmd.extend(['--slice', 'user-{uid}.slice'.format(uid=pwnam.pw_uid)])

would do. Having the service in the user slice could help with resource accounting when users are additionally using the server through ssh. I don’t know if there are other side effects to be aware of, so it should probably only be an option.
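A minimal sketch of how such an opt-in could look, assuming the spawner assembles the `systemd-run` argument list in Python. The helper `build_run_cmd` and its parameters are hypothetical, not systemdspawner's actual code; `--unit`, `--uid`, `--gid` and `--slice` are real `systemd-run` options. Without `--slice`, a transient service started by the system manager lands in system.slice.

```python
import pwd
from typing import List, Optional


def build_run_cmd(unit_name: str, uid: int, gid: int,
                  use_user_slice: bool = False,
                  argv: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper: assemble a systemd-run command line.

    With use_user_slice=True the transient unit is placed in
    user-<uid>.slice instead of the default system.slice.
    """
    cmd = ['systemd-run', '--unit', unit_name,
           '--uid', str(uid), '--gid', str(gid)]
    if use_user_slice:
        # '{uid}' with a closing brace, so str.format substitutes it
        cmd.extend(['--slice', 'user-{uid}.slice'.format(uid=uid)])
    cmd.extend(argv or [])
    return cmd


if __name__ == '__main__':
    pwnam = pwd.getpwnam('root')  # any existing account works for a demo
    print(build_run_cmd('jupyter-root-singleuser', pwnam.pw_uid, pwnam.pw_gid,
                        use_user_slice=True, argv=['jupyterhub-singleuser']))
```

Keeping the slice behind a flag matches the suggestion that this should only be an option, since the side effects on session handling are unclear.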

Debilski commented 6 years ago

Adding the unit to the slice might not be enough. Logging in and out via ssh will close the session (through PAM?).

clhedrick commented 5 years ago

I am doing this now. I create a user session and set a memory limit per session and fair-share scheduling per user. Obviously you can adjust the policies as you like. If you're going to make this a feature of the system, you'll want to move the dbus call into Python code. There are examples of how to do this.

This assumes you're using logind, which you are if you're using systemd. Also, Ubuntu 14 uses logind without using systemd. If you're not using logind then you'd need to manipulate the cgroups directly.

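import asyncio
from jupyterhub.spawner import LocalProcessSpawner

class LocalProcessSpawnerEnv(LocalProcessSpawner):
    async def start(self):
        # LocalProcessSpawner launches the single-user server and sets self.pid
        startret = await super().start()
        # move the new process into a logind session; this needs root, which the
        # hub already has when it spawns servers for other users
        proc = await asyncio.create_subprocess_exec('/usr/libexec/setjupcgroup', str(self.pid))
        await proc.wait()  # wait without blocking the hub's event loop
        return startret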

/usr/libexec/setjupcgroup is

#!/bin/sh

export PATH="/usr/sbin:/sbin:/usr/bin:/bin:/usr/local/bin"

LOGIN=`awk '/^Uid:/{print $2}' /proc/$1/status`
MYPPID=`awk '/^PPid:/{print $2}' /proc/$1/status`

# simulate pam_systemd: start a session

mkdir -p /var/run/user/${LOGIN}
chown $LOGIN /var/run/user/${LOGIN}

# the session is actually created by logind. pam_systemd sends a DBus message to it
# requesting a new session. The return looks like 
# ('c32', objectpath '/org/freedesktop/login1/session/c32', '/run/user/1044', handle 0, uint32 1044, '', uint32 0, true)
# dbus-send is the documented interface, but it doesn't allow sending the kind of array needed for this
# operation. gdbus is somewhat more flexible, though undocumented
# for the arguments, see 
# gdbus introspect --system --dest org.freedesktop.login1 --object-path /org/freedesktop/login1
# also https://www.freedesktop.org/wiki/Software/systemd/logind/

RET=`/bin/gdbus call --system --dest=org.freedesktop.login1 --object-path=/org/freedesktop/login1 --method='org.freedesktop.login1.Manager.CreateSession' "$LOGIN" $1 jupyter unspecified user "" "" 0 "" "" false "" "" '[]'`
SESSIONID=`echo $RET| awk '{print substr($1,3, length($1)-4)}'`

# At this point we have a session. The rest of the code is to set memory limits and scheduling.
# Adjust to match your site's policies

systemctl set-property --runtime "session-${SESSIONID}.scope" CPUAccounting=yes

LIMIT=50331648K
# need to do this to create the slice and turn on memory accounting. systemd doesn't provide a way 
# to limit swap, so we follow this by setting limits directly. Setting a memory limit without a swap limit
# isn't helpful. It will just force jobs into swapping.
systemctl set-property --runtime "user-${LOGIN}.slice" MemoryLimit=$LIMIT
echo $LIMIT > /sys/fs/cgroup/memory/user.slice/user-${LOGIN}.slice/memory.memsw.limit_in_bytes   
echo $LIMIT > /sys/fs/cgroup/memory/user.slice/user-${LOGIN}.slice/memory.limit_in_bytes
# these two cats may be unneeded. At one point it looked like you had to read the values for them to take effect
cat /sys/fs/cgroup/memory/user.slice/user-${LOGIN}.slice/memory.limit_in_bytes > /dev/null
cat /sys/fs/cgroup/memory/user.slice/user-${LOGIN}.slice/memory.memsw.limit_in_bytes > /dev/null
# tell fair-share scheduler to treat users equally
systemctl set-property --runtime "user-${LOGIN}.slice" CPUShares=100
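The comment above suggests moving the D-Bus call into Python. As a first step, a sketch that keeps using `gdbus` but builds its argument vector in Python (so it can be handed to `subprocess` without shell quoting) and parses the reply with a regular expression instead of awk. The function names are hypothetical; the fourteen arguments mirror the script's `CreateSession` call one for one.

```python
import re
from typing import List


def create_session_argv(uid: int, pid: int, service: str = 'jupyter') -> List[str]:
    """Argument vector for the gdbus CreateSession call used in setjupcgroup.

    Order follows logind's Manager.CreateSession: uid, pid, service, type,
    class, desktop, seat, vtnr, tty, display, remote, remote_user,
    remote_host, properties.
    """
    return ['/bin/gdbus', 'call', '--system',
            '--dest=org.freedesktop.login1',
            '--object-path=/org/freedesktop/login1',
            '--method=org.freedesktop.login1.Manager.CreateSession',
            str(uid), str(pid), service, 'unspecified', 'user',
            '', '', '0', '', '', 'false', '', '', '[]']


def parse_session_id(gdbus_output: str) -> str:
    """Extract the session id (e.g. 'c32') from the first field of gdbus's reply."""
    match = re.match(r"\('([^']+)'", gdbus_output.strip())
    if match is None:
        raise ValueError('unexpected gdbus output: %r' % gdbus_output)
    return match.group(1)
```

With `subprocess.run(create_session_argv(uid, pid), capture_output=True, text=True)`, the `stdout` of the result can be passed straight to `parse_session_id`.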
clhedrick commented 5 years ago

On Debilski's comment: systemd-run creates a new unit, and --slice puts that unit in the user's slice. Logging out won't actually close the user slice; that happens automatically when no processes are left in the slice. With ssh, pam_systemd creates a session within the user slice. At logout, pam_systemd is called by the PAM session close and releases the session, but that is its own session, not the user slice it lives in. As far as I can see there's no explicit killing of the user slice. Rather, the slices all have "notify_on_release" set, which says that when the last process leaves the slice, the release agent should be called. Here the release agent is set up to remove empty slices, and the call is made by the kernel's cgroup mechanism.

In short, systemd-run looks reasonable. However, you'll need to supply some kind of hook so sites can set properties for the new unit, as our script does. Creating a unit for each process doesn't do much good unless you can set the properties that limit resource usage, set priorities, etc.
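To check the "no processes left" condition by hand, a sketch that walks a slice's cgroup directory and reports whether any `cgroup.procs` file at or below it is non-empty. The function name is hypothetical, and the example path assumes the cgroup v1 layout used by the scripts in this thread; it is a diagnostic aid, not how systemd itself decides to remove a slice.

```python
import os


def cgroup_has_processes(cgroup_dir: str) -> bool:
    """Return True if any cgroup at or below cgroup_dir lists a process."""
    for dirpath, _dirnames, filenames in os.walk(cgroup_dir):
        if 'cgroup.procs' not in filenames:
            continue
        try:
            with open(os.path.join(dirpath, 'cgroup.procs')) as f:
                if f.read().strip():
                    return True
        except OSError:
            # the cgroup may be removed while we are walking it
            continue
    return False
```

For example, `cgroup_has_processes('/sys/fs/cgroup/systemd/user.slice/user-1044.slice')` should stay True while a background job started from an ssh session is still running.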

Debilski commented 5 years ago

@clhedrick in my tests, it did indeed remove the sd-run unit when a user’s ssh sessions were all closed. To reproduce it, I do the following:

# start a short-lived sd-run for the user (who is not logged in yet!) and add to the user’s slice
root@remote $ systemd-run --slice 'user-6001.slice' --uid 6001 /bin/bash -c 'sleep 20 ; logger normal shutdown'

# ssh from outside as that user and close the session immediately
me@local $ ssh user@remote
^D

# 'normal shutdown' will not have been logged

I can make it work by adding either --user or -p PAMName=login to the systemd-run command, neither of which is done by systemdspawner, but I don’t understand the whole process well enough to know if this is a good thing.

clhedrick commented 5 years ago

OK, I’ll try that test and see if I can figure out what’s going on. The scripts I included do the right thing.


clhedrick commented 5 years ago

I don’t get quite the same behavior as you, but I believe this is system-specific.

Logind can be configured to end all processes in a session when the session ends. CentOS 7 doesn’t do this by default. Suppose you log in, leave a long job running in the background, then log off. You don’t want that job killed. I conjecture that you’re using a system with that option enabled.
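On systemd hosts this option is `KillUserProcesses=` in the `[Login]` section of `/etc/systemd/logind.conf`. A sketch for checking how a host is configured; the helper name is hypothetical, and it ignores drop-in files under `/etc/systemd/logind.conf.d/`, so a `None` result means the compiled-in default applies, which varies between distributions.

```python
import configparser
from typing import Optional


def kill_user_processes(conf_text: str) -> Optional[bool]:
    """Return KillUserProcesses from logind.conf text, or None if unset."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # commented-out lines such as '#KillUserProcesses=no' are skipped
    if not parser.has_option('Login', 'KillUserProcesses'):
        return None
    return parser.getboolean('Login', 'KillUserProcesses')
```

Usage: `with open('/etc/systemd/logind.conf') as f: print(kill_user_processes(f.read()))`.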

What I do see is that when the ssh session ends, there’s a log message saying that user-NN.slice stopped. It does not, however, say that it was removed; that doesn’t happen until after the background job stops. With two ssh sessions, when the first one ends there is no message saying that the user slice was either stopped or removed. So there is a difference.

It looks like at logout, logind thinks there’s no more use for that slice and logs that it has stopped, but in fact keeps it because there are still processes running in it. So I partly reproduce your behavior, except that my system keeps the slice active as long as there are processes in it and yours doesn’t.

systemd-run --user seems to register the unit with your personal instance of systemd. Personal systemd instances seem to be created by default on some systems but not others. On my system, systemd-run --user says "Failed to create bus connection: No such file or directory", because CentOS 7 doesn’t create a user systemd for ssh sessions by default.

On my system, -p PAMName=login is rejected as illegal. If it works for you, though, this documentation probably explains what it does: https://www.freedesktop.org/software/systemd/man/systemd.exec.html#PAMName=

It’s pretty evident that systemd-run is intended for services, not user sessions. You found that -p PAMName=login creates something that behaves like a user session, but that doesn’t work on CentOS 7, so I don’t think we can depend on it.

My script works on CentOS 7 but probably wouldn’t work on your system, because the session shows as “abandoned.” That’s OK for me because my system is configured not to drop the user slice until all its processes are dead. It looks like your system isn’t configured that way.

The only truly safe way to create a user session is to call pam_systemd in the right context. My script mostly emulates pam_systemd, but there’s one thing it doesn’t do: pam_systemd keeps open a FIFO to logind, which logind uses to verify that the session is still alive. When all the processes holding it end, the FIFO is closed, and logind takes that to mean the session has ended.

If you run lsof on the top-level process of an ssh session, you’ll see a line like this:

sshd 1706 root 12w FIFO 0,19 0t0 27820 /run/systemd/sessions/2.ref

You won’t see that for the top-level process started by systemd-run. Hence it seems evident that systemd-run is not capable of creating an actual login session.
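The same check can be done without lsof by reading `/proc/<pid>/fd`. A sketch, with a hypothetical function name, that lists descriptors pointing at a `.ref` file under `/run/systemd/sessions/` (the path in the lsof output above); run it as root against the sshd process and against a server started by systemd-run to compare.

```python
import os
from typing import List


def session_ref_fds(pid: int, prefix: str = '/run/systemd/sessions/') -> List[str]:
    """Return the session FIFO paths that pid holds open (Linux only)."""
    fd_dir = '/proc/{}/fd'.format(pid)
    refs = []
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            # the descriptor was closed between listdir and readlink
            continue
        if target.startswith(prefix) and target.endswith('.ref'):
            refs.append(target)
    return sorted(refs)
```

An empty list for the single-user server's top-level process would be consistent with the observation that systemd-run does not create a real login session.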

I believe that if you’re going to use systemd-run, you want to put the unit somewhere other than user.slice. It seems evident from these tests that logind thinks it owns user.slice and will kill user-NNN.slice if it thinks there are no sessions left for that user. This shouldn’t happen outside of user.slice. Alternatively, you’d need to have the top-level process create the FIFO, or simply call the actual pam_systemd.

clhedrick commented 5 years ago

I just looked at systemdspawner. I haven’t tried it, but it uses systemd-run in a way that I believe will start a service under system.slice. Because it doesn’t put it in user.slice, the problems you and I have noted won’t be there. However, loginctl won’t know about it, so things like terminate-session won’t work, though you can certainly stop it in other ways.
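To confirm where a spawned server actually landed, a sketch that extracts the systemd cgroup path from the text of `/proc/<pid>/cgroup`. The function name is hypothetical; it handles the cgroup v1 `name=systemd` line (as on CentOS 7) and the single `0::` line of a pure cgroup v2 system.

```python
from typing import Optional


def systemd_cgroup_path(proc_cgroup_text: str) -> Optional[str]:
    """Return the systemd-managed cgroup path, or None if none is listed."""
    unified = None
    for line in proc_cgroup_text.splitlines():
        if not line:
            continue
        hierarchy_id, controllers, path = line.split(':', 2)
        if controllers == 'name=systemd':  # cgroup v1 systemd hierarchy
            return path
        if hierarchy_id == '0' and controllers == '':  # cgroup v2
            unified = path
    return unified
```

A result starting with `/system.slice/` matches the behavior described above, while one under `/user.slice/user-<uid>.slice/` means the unit sits in the user's slice.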


clhedrick commented 5 years ago

I think I have a solution.

Use sudospawner, then configure the sudo PAM file to call pam_systemd only for jobs started by JupyterHub. I’m already doing this for Zeppelin, so if sudospawner works the way it seems to be documented, this should be a clean solution.

I’ll try it next week.


Debilski commented 5 years ago

Thanks @clhedrick for the hint about using sudospawner instead. I am only now catching up with the ideas in here. Could you explain a bit more how I would have to edit the PAM file?

clhedrick commented 5 years ago

I'm going to show how I did it for Zeppelin; it will require some adjustment for JupyterHub. The grep line near the end of the script checks whether this is the sudo command that Zeppelin uses. JupyterHub will use a slightly different one. To find it, look in /var/log/secure or have the script print $CMD.

/etc/pam.d/sudo
#%PAM-1.0
auth       include      system-auth
account    include      system-auth
password   include      system-auth
session    optional     pam_keyinit.so revoke
session    required     pam_limits.so
session    [default=2 success=ignore] pam_exec.so quiet /usr/libexec/sudozeppelin
-session   optional     pam_systemd.so
session    optional     pam_exec.so /usr/libexec/setlimits.sh 
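The `[default=2 success=ignore]` control on the `pam_exec.so` line is a PAM jump: if the module returns anything other than success, PAM skips the next two modules in the session stack. A small simulation of just that rule (a hypothetical helper, not part of PAM) shows which of the session lines above would run.

```python
import re
from typing import Dict, List, Tuple

# Session lines of /etc/pam.d/sudo above, as (control, module) pairs.
# The leading '-' on the pam_systemd line only suppresses logging when the
# module cannot be loaded, so it is not modelled here.
SUDO_SESSION = [
    ('optional', 'pam_keyinit.so'),
    ('required', 'pam_limits.so'),
    ('[default=2 success=ignore]', 'pam_exec.so /usr/libexec/sudozeppelin'),
    ('optional', 'pam_systemd.so'),
    ('optional', 'pam_exec.so /usr/libexec/setlimits.sh'),
]


def modules_run(stack: List[Tuple[str, str]], results: Dict[str, bool]) -> List[str]:
    """Return the modules PAM would invoke, in order.

    Only the jump form '[default=N ...]' is interpreted: when that module
    does not succeed, the next N modules are skipped. results maps a module
    to whether it succeeds; unlisted modules are treated as succeeding.
    """
    ran = []
    i = 0
    while i < len(stack):
        control, module = stack[i]
        ran.append(module)
        jump = re.search(r'default=(\d+)', control)
        if jump and not results.get(module, True):
            i += int(jump.group(1))  # skip the next N modules
        i += 1
    return ran
```

For an ordinary sudo the gate script exits 1, so `pam_systemd.so` and `setlimits.sh` are never reached; only a matching Zeppelin job gets a new session.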

Note that pam_systemd is only called for Zeppelin sessions, because the script /usr/libexec/sudozeppelin succeeds only for them. But the script has to do other things as well:

#!/bin/sh

#printenv >> /tmp/sudozep
#echo pid $$ >> /tmp/sudozep

# This is intended to be called from pam.d/sudo session.
# It checks whether this is a sudo being done by zeppelin to start a
#   user process. If so, it succeeds. Otherwise it fails.
# That lets pam call pam_systemd just for zeppelin user jobs.
# Normally sudo doesn't want to create a new session, but for
#   Zeppelin user jobs we do.
# Put the sudo process's PID into the root cgroups. That removes
#   them from the session they're currently in. pam_systemd won't
#   start a new session if a process is already in a session, so
#   this is needed for pam_systemd to do anything.

if test "$PAM_RUSER" = "zeppelin" -a "$PAM_USER" \!= "root" -a "$PAM_TYPE" = "open_session"; 
then
  MYPPID=`awk '/^PPid:/{print $2}' /proc/$$/status`
  CMD=`cat /proc/$MYPPID/cmdline`
  if echo "$CMD" | grep -q "sudo.*-H.*-u.*source /usr/hdp/current/zeppelin-server" ; then
     echo $MYPPID >/sys/fs/cgroup/systemd/cgroup.procs 
     echo $MYPPID >/sys/fs/cgroup/memory/cgroup.procs 
     echo $MYPPID >/sys/fs/cgroup/cpu,cpuacct/cgroup.procs 
     exit 0
  fi
fi

exit 1
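A Python sketch of the same parent-process test, useful when adapting it to JupyterHub. `/proc/<pid>/cmdline` separates arguments with NUL bytes; shell command substitution drops those, so in the script above the arguments run together (the `.*` in the grep pattern tolerates this). Splitting on NUL keeps them apart. `is_spawner_sudo` and its `sudospawner` marker are assumptions (sudospawner runs a helper of that name through sudo); confirm the real command line, e.g. in /var/log/secure, before relying on it.

```python
from typing import List


def read_cmdline(pid: int) -> List[str]:
    """Return the argv of pid; /proc/<pid>/cmdline is NUL-separated."""
    with open('/proc/{}/cmdline'.format(pid), 'rb') as f:
        raw = f.read()
    return [arg.decode(errors='replace') for arg in raw.split(b'\0') if arg]


def is_spawner_sudo(argv: List[str], marker: str = 'sudospawner') -> bool:
    """Hypothetical check: is this the sudo that JupyterHub's spawner runs?"""
    return bool(argv) and argv[0].endswith('sudo') and any(marker in arg for arg in argv[1:])
```

Inside a PAM hook this would be called with the parent PID, as the shell version does with `$MYPPID`.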
clhedrick commented 5 years ago

In my opinion this is not something we want sysadmins to have to do. I believe it would be better for Jupyterhub to do this itself. But that's moderately tricky.

Debilski commented 5 years ago

Thanks a lot for this. I’m going to experiment with it. But I agree, I’d prefer not having to implement this.

In the meantime, though, I found that the current version of systemdspawner has a unit_extra_properties parameter, which the version my initial tests were based on didn’t have. Now I can pass

c.SystemdSpawner.unit_extra_properties = {'PAMName': 'login'}

and it seems to do proper accounting and user slice creation on my system (Debian 9). I’ll have to play with it a little more though to figure out some more edge cases.
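Each entry in `unit_extra_properties` ends up as a property of the transient unit. A sketch of that translation, assuming the properties are handed to `systemd-run` as `--property=KEY=VALUE` arguments (a real flag; `-p` is its short form, as in the earlier `-p PAMName=login` test). `property_args` is hypothetical and is not systemdspawner's own implementation.

```python
from typing import Dict, List, Union


def property_args(props: Dict[str, Union[str, List[str]]]) -> List[str]:
    """Turn a unit-properties dict into systemd-run --property arguments.

    A list value is emitted as one --property per item, for settings
    that may be given more than once.
    """
    args = []
    for key, value in props.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            args.append('--property={}={}'.format(key, item))
    return args
```

So `{'PAMName': 'login'}` corresponds to running `systemd-run --property=PAMName=login ...`, the same change that made the manual test behave like a login session.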

clhedrick commented 5 years ago

Debian and CentOS appear to differ in how the slices are set up. I’d prefer a system-independent solution, but you’re probably best off with a simpler one that works for your system.


consideRatio commented 1 year ago