In conversations with users, SFTP / SCP came up as the biggest
use case they have for SSH. We could theoretically implement it
with asyncssh, but we don't have a clear way to support all the
operations SFTP wants with just the Notebook Contents API:
there's no read-at, write-at, or open functionality. Besides,
the folks who want to use SFTP want it for high-performance
file transfer, and building that would be a fair bit of work.
Instead, for now, we add an SFTP setup that uses OpenSSH directly.
SFTP doesn't require or allow arbitrary code execution, so
we don't need to force the environment to match what
JupyterHub provides. We just need to match the home
directories.
In this case, we only support NFS-based home directories.
This isn't as agnostic as jupyterhub-ssh, but it works
when all of the following hold:
- All user files are owned by the same UID (currently
  hardcoded to 1000).
- The NFS share containing them can be mounted in the
  pod serving SFTP.
- JupyterHub token authentication is used to
  authenticate users.
- You can map from a JupyterHub username to an NFS path
  easily and consistently.
In these cases, it works fairly well!
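The last requirement, a simple and consistent username-to-path mapping, might look something like the sketch below. It is purely illustrative: the mount point `/mnt/home`, the name whitelist, and the helper `home_dir_for` are all hypothetical and not taken from this PR. It lowercases names (mirroring JupyterHub's default username normalization) and refuses anything that could escape the share root.

```python
import posixpath
import re

# Hypothetical mount point of the NFS share inside the SFTP pod.
NFS_HOME_ROOT = "/mnt/home"

# Hypothetical, conservative whitelist: a single path component with no
# slashes that cannot start with "." (so "." and ".." are rejected too).
_SAFE_NAME = re.compile(r"[a-z0-9][a-z0-9._-]*")


def home_dir_for(username: str, root: str = NFS_HOME_ROOT) -> str:
    """Map a JupyterHub username to a home directory on the NFS share."""
    name = username.lower()
    # fullmatch (not match + "$") so a trailing newline is also rejected.
    if not _SAFE_NAME.fullmatch(name):
        raise ValueError(f"refusing to map unsafe username {username!r}")
    root = root.rstrip("/") or "/"
    path = posixpath.join(root, name)
    # Cheap second guard: the result must sit directly under root.
    if posixpath.dirname(path) != root:
        raise ValueError(f"{path!r} is not directly under {root!r}")
    return path
```

For example, `home_dir_for("Alice")` returns `"/mnt/home/alice"`, while `home_dir_for("../etc")` raises `ValueError`. Keeping the mapping this boring is the point: the same name must always land on the same directory.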
OpenSSH doesn't really support pluggable auth; it mostly
expects system users to exist. This comes in two
parts:
1. The username exists and is mapped to a given uid,
   looked up via standard Linux mechanisms (getent).
2. The password is valid and authenticated via
   standard Linux mechanisms (PAM).
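To make part (1) concrete: sshd resolves the login name through NSS, the same path `getent passwd <name>` goes through. A minimal sketch of that lookup using Python's standard-library `pwd` module (the helper name `lookup_uid` is just for illustration). On a stock system an unknown name returns `None`. With libnss-ato configured as described in this PR, we would expect any JupyterHub username to come back as uid 1000 instead.

```python
import pwd
from typing import Optional


def lookup_uid(username: str) -> Optional[int]:
    """Return the uid NSS reports for `username`, or None if there is none.

    This is the getpwnam(3) lookup that `getent passwd <name>` also uses;
    if it fails, sshd has no account to log the user into.
    """
    try:
        return pwd.getpwnam(username).pw_uid
    except KeyError:
        return None
```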
We do (1) with libnss-ato
and (2) with pam_exec. Together these
let us pretend that every JupyterHub user exists (with uid 1000)
and that their password is a valid JupyterHub token! This trick
probably has uses beyond SFTP, but that's an exercise for another day.
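For part (2), pam_exec runs an external program and treats exit status 0 as success. With its `expose_authtok` option the password is written to the program's stdin, and the username arrives in the `PAM_USER` environment variable. Below is a sketch of such a checker, not the script from this PR. It assumes the SFTP pod can reach the JupyterHub REST API and that `GET /user` on that API returns the model (including `"name"`) of the user who owns the token. The `JUPYTERHUB_API_URL` default, the function names, and an install path like `/usr/local/bin/check-hub-token` are all hypothetical. It might be wired up with a PAM line such as `auth required pam_exec.so expose_authtok /usr/local/bin/check-hub-token`.

```python
import json
import os
import sys
import urllib.error
import urllib.request
from typing import Callable, Optional

# Hypothetical default; point this at wherever the SFTP pod can reach
# the JupyterHub REST API.
HUB_API_URL = os.environ.get("JUPYTERHUB_API_URL", "http://hub:8081/hub/api")


def fetch_token_owner(token: str, api_url: str = HUB_API_URL) -> Optional[str]:
    """Return the name of the user owning `token`, or None if it is rejected."""
    req = urllib.request.Request(
        api_url.rstrip("/") + "/user",
        headers={"Authorization": f"token {token}"},
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return json.load(resp).get("name")
    except (urllib.error.URLError, ValueError):
        # HTTP error status (bad token), unreachable hub, or a non-JSON
        # body: fail closed and treat all of them as "not authenticated".
        return None


def token_is_valid(
    username: str,
    token: str,
    fetch_owner: Callable[[str], Optional[str]] = fetch_token_owner,
) -> bool:
    """True only if `token` is non-empty and belongs to `username`."""
    if not username or not token:
        return False
    return fetch_owner(token) == username


def main() -> int:
    # pam_exec sets PAM_TYPE; only answer for the auth stack.
    if os.environ.get("PAM_TYPE") != "auth":
        return 1
    username = os.environ.get("PAM_USER", "")
    # With expose_authtok the password comes in on stdin and may be
    # followed by a NUL byte; strip that and any stray newline.
    raw = sys.stdin.buffer.read()
    token = raw.rstrip(b"\0\n").decode("utf-8", errors="replace")
    return 0 if token_is_valid(username, token) else 1


if __name__ == "__main__" and "PAM_TYPE" in os.environ:
    # Exit status 0 tells pam_exec the password (token) was accepted.
    sys.exit(main())
```

Comparing the token owner against `PAM_USER` matters: without it, any valid token would unlock every user's home directory.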
We will re-use the key from jupyterhub-ssh, but operate on a different
port. Traefik can still route us TCP traffic, so all good!
FWIW, I also spent some time on proftpd, which has an SFTP module.
I was side-tracked by the "!" SFTP command (it pops a local shell;
I thought it popped a remote shell!), but stuck with OpenSSH
regardless: it's more battle-tested and easier to configure.