yuvipanda / jupyterhub-ssh

SSH Access to JupyterHubs
BSD 3-Clause "New" or "Revised" License

Add jupyterhub-sftp #10

Closed yuvipanda closed 3 years ago

yuvipanda commented 3 years ago

Talking to users, SFTP / SCP was cited as the biggest use case they see for ssh. We could theoretically implement it with asyncssh, but we don't have a clear way to support all the operations SFTP wants with just the Notebook Contents API: there's no read-at / write-at / open functionality. Besides, the folks who want SFTP want it for high-performance file transfer, and making that fast would be a bit of work.

Instead, for now, we add an SFTP setup with OpenSSH directly. SFTP doesn't require or allow arbitrary code execution, so we don't need to force the environment to match what the JupyterHub provides. We just need to match the home directories.

For now, we only support NFS-based home directories. This isn't as agnostic as jupyterhub-ssh, but it works when all of the following hold:

  1. All user files are owned by the same UID (currently hardcoded to 1000).
  2. The NFS share containing them can be mounted in the pod serving SFTP
  3. JupyterHub token authentication will be used to authenticate users
  4. You can map from JupyterHub user name to NFS path easily and consistently

In these cases, it works fairly well!
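
The user-name-to-path mapping in (4) can be as simple as joining the name onto the mount point. This is a minimal sketch, not the code in this PR: it assumes the share is mounted at `/mnt/home` (a hypothetical path) and that each directory is named exactly after the JupyterHub user name. The function name `home_directory` is also made up for illustration.

```python
from pathlib import PurePosixPath

# Assumption: the NFS share is mounted here inside the SFTP pod (hypothetical path).
NFS_MOUNT = PurePosixPath("/mnt/home")


def home_directory(username: str, base: PurePosixPath = NFS_MOUNT) -> PurePosixPath:
    """Map a JupyterHub user name to its directory on the share."""
    # Refuse names that could point outside the share.
    if not username or "/" in username or "\0" in username or username in (".", ".."):
        raise ValueError(f"unsafe user name: {username!r}")
    return base / username
```

If your hub escapes or transforms user names before creating directories, the same transformation would have to be applied here so the mapping stays consistent.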

OpenSSH doesn't really support pluggable auth; it mostly expects system users to exist. That expectation has two parts:

  1. The username exists and is mapped to a given id, acquired via standard Linux mechanisms (getent)
  2. The password is valid and authenticated via standard Linux mechanisms (PAM)

We do (1) with libnss-ato, and (2) with pam_exec. These help us pretend that each JupyterHub user exists (with uid 1000), and that their password is a valid JupyterHub token! This trick probably has uses beyond SFTP, but that's an exercise for another day.
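
To make the pam_exec half concrete, here is a rough sketch of what such a helper could look like. It is not necessarily the script in this PR. It assumes pam_exec is configured with `expose_authtok` (so the password arrives on stdin and the user name in `PAM_USER`), that the hub API is reachable at the URL in `HUB_API_URL` (a hypothetical default), and that asking the hub's `/user` endpoint who owns the token is an acceptable check. The function names are made up for illustration.

```python
import json
import os
import sys
import urllib.request

# Assumption: where the SFTP pod can reach the hub's REST API (hypothetical default).
HUB_API_URL = os.environ.get("JUPYTERHUB_API_URL", "http://hub:8081/hub/api")


def fetch_token_owner(token, api_url=HUB_API_URL):
    """Ask the hub which user owns `token`; return None if the lookup fails."""
    request = urllib.request.Request(
        f"{api_url}/user",
        headers={"Authorization": f"token {token}"},
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return json.load(response).get("name")
    except (OSError, ValueError):
        # Network failures, HTTP errors (e.g. 403 for a bad token), or bad JSON.
        return None


def is_valid_login(username, token, lookup=fetch_token_owner):
    """Accept only when the token belongs to the user who is logging in."""
    if not username or not token:
        return False
    return lookup(token) == username


def main():
    username = os.environ.get("PAM_USER", "")
    # Strip a trailing newline or NUL, in case one is appended to the password.
    token = sys.stdin.read().rstrip("\n\0")
    # Exit status 0 tells PAM the check passed; anything else is a failure.
    return 0 if is_valid_login(username, token) else 1
```

As an installed script this would end with `sys.exit(main())`. Comparing the token's owner against `PAM_USER` matters: without it, any valid token would unlock every user's files.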

We will re-use the key from jupyterhub-ssh, but operate on a different port. Traefik can still route TCP traffic to us, so all good!

FWIW, I also spent some time on proftpd, which has an SFTP module. I was sidetracked by the ! SFTP command (it pops a local shell, but I thought it popped a remote shell!), and stuck with OpenSSH regardless: it's more battle-tested and easier to configure.

