kubeflow / notebooks

Kubeflow Notebooks lets you run web-based development environments on your Kubernetes cluster by running them inside Pods.
Apache License 2.0

Allow SSH access into Notebook Pods #23

Open thesuperzapper opened 1 month ago

thesuperzapper commented 1 month ago

What's the Goal?

I am trying to figure out how to allow users to SSH into Notebook Pods from their laptop. The benefit of this is supporting tools like Remote VSCode and JetBrains Gateway (for PyCharm) with the resources (e.g. GPUs) of the Pod.

The main issue is how to expose the Notebook Pod via SSH on the Istio Ingress Gateway.

What's the Problem?

SSH uses TCP which can't do hostname/HTTP-path routing like we do for the web-based UIs of the Notebooks. The naive approach is to have the Istio Ingress Gateway listen on a unique port for each Notebook (which is obviously not scalable or secure).

In my mind there are only TWO ways to make this work:

  1. Use a "jump box" service (with a single IP/port) which listens for SSH, but routes incoming connections to specific Notebook Pods based on the SSH key used to authenticate:
    • This could be implemented by setting the command option in authorized_keys to another ssh -t username@<WORKSPACE_NAME>.<NAMESPACE_NAME>.svc.cluster.local command (see idea here)
    • Or possibly there might be a pre-made opensource ssh-routing tool for this exact use-case.
    • I am not sure what kind of hardening is required on the jump box, but we need to consider stuff like:
      • disabling ssh tunneling
      • ensuring only traffic from the Istio Gateway gets to it (not from Pods inside the mesh)
      • using fail2ban to stop brute forcing
      • regular/automatic rotation of SSH keys
  2. Use some kind of SD-WAN VPN like Tailscale (can be open source hosted), Cloudflare Tunnel, or ngrok:
    • We would run the service both on the laptop and notebook pod, giving the Notebook Pod a special IP address that the laptop can use to access it.
    • This is slightly problematic because it will not be a direct connection from the user to the Pod (and it will probably be slower because traffic might have to be relayed).
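
To make the authorized_keys idea from option 1 more concrete, here is a minimal sketch of how a jump box could generate one entry per registered key. It is not an existing Kubeflow component: the function name, the default user "jovyan", and the choice of restrictions are assumptions for illustration. The options themselves (command="...", no-port-forwarding, no-agent-forwarding, no-X11-forwarding) are standard OpenSSH authorized_keys options, and the no-* ones cover the "disabling ssh tunneling" point above.

```python
import re

# RFC 1123 DNS label, the format Kubernetes uses for namespace names.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?")

# Standard authorized_keys options that switch off tunneling-style features.
_HARDENING = ("no-port-forwarding", "no-agent-forwarding", "no-X11-forwarding")


def authorized_keys_entry(public_key, workspace, namespace, user="jovyan"):
    """Build one authorized_keys line that forces a hop to a single Notebook.

    `user` defaults to "jovyan" only as an assumption for this sketch.
    """
    # Reject anything that is not a plain DNS label, so the names cannot
    # smuggle shell syntax into the forced command.
    for label in (workspace, namespace, user):
        if not _DNS_LABEL.fullmatch(label):
            raise ValueError(f"unsafe name: {label!r}")

    # A public key must be exactly one line with at least "<type> <data>".
    if len(public_key.splitlines()) != 1 or len(public_key.split()) < 2:
        raise ValueError("public key must be a single '<type> <data>' line")

    target = f"{user}@{workspace}.{namespace}.svc.cluster.local"
    options = ",".join(_HARDENING) + f',command="ssh -t {target}"'
    return f"{options} {public_key.strip()}"


if __name__ == "__main__":
    print(authorized_keys_entry(
        "ssh-ed25519 AAAAC3Nza user@laptop", "my-notebook", "team-a"))
```

With an entry like this, whatever command the client asks for is ignored and the session is always forwarded to the one Notebook tied to that key. How the jump box itself authenticates to the Notebook Pod is left open here, as it is in the discussion above.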

Other Notes

While it is technically possible to use kubectl port-forward on the laptop to expose any port that the Notebook Pod is listening on (e.g. the SSH port), I am not sure this is desirable at scale because it requires all users to have the pods/portforward RBAC permission on the profile namespace, which is very privileged.

Final Thoughts

There are lots of security considerations to allowing remote SSH access, especially for the people who put Kubeflow on the public internet (NOT advised).

I am interested to hear people's ideas for how we can do this safely.

thesuperzapper commented 1 month ago

@kimwnasptd @jiridanek @ederign @juliusvonkohout I am interested to know your thoughts on this, as allowing SSH into Notebook Pods is a long standing request, but is complex to do safely.

juliusvonkohout commented 1 month ago

In practice, most people I know use a local VSCode and connect it to the workbench/workspace via VSCode extensions and a kubeconfig. So it works at the Kubernetes layer, not the Kubeflow layer.

thesuperzapper commented 1 month ago

"In practice, most people I know use a local VSCode and connect it to the workbench/workspace via VSCode extensions and a kubeconfig. So it works at the Kubernetes layer, not the Kubeflow layer."

@juliusvonkohout I assume you are talking about the "attach to container" feature?

Or are you talking about using the code tunnel CLI, which relays through Microsoft servers?


If you are talking about the first option, it still has a few problems:

  1. It only supports VSCode
  2. It requires users to have lots of permissions on the cluster (I would need to check exactly what kubectl permissions, but I imagine they at least need pods/exec)
  3. The licence of VSCode Remote is proprietary (this is less of a problem, but I am just raising it)

Hence why I want to figure out a generic solution for SSH into the Notebook Pods without compromising the security of the cluster.

juliusvonkohout commented 1 month ago

"Hence why I want to figure out a generic solution for SSH into the Notebook Pods without compromising the security of the cluster." Yes, code-server/VSCode is just a workaround.

ederign commented 1 month ago

@thesuperzapper This is indeed an interesting feature that can open up a bunch of new use cases and I agree that we should be careful on security considerations.

I would also start exploring option 1 (jump-box).

One thing we need to figure out is how users will securely add their own SSH keys. The first approach that comes to my mind is to allow them to assign a public key to a given notebook in the spawner UI. Another approach would be a "key per namespace", which would allow a user to SSH into any notebook in a given namespace.
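
As a rough illustration of the "public key per notebook" approach, the sketch below shows how a backend might validate a pasted OpenSSH public key and record it against a notebook. Everything named here (parse_public_key, register_key, the in-memory registry) is hypothetical and not part of the spawner UI today. It only assumes the standard one-line public key format ("<type> <base64> [comment]"), where the decoded data starts with a length-prefixed copy of the key type, and the SHA256 fingerprint format that ssh-keygen prints.

```python
import base64
import binascii
import hashlib
import struct


def parse_public_key(line):
    """Return (key_type, fingerprint) for a one-line OpenSSH public key.

    Raises ValueError if the input does not look like a public key.
    """
    fields = line.strip().split()
    if len(line.splitlines()) != 1 or len(fields) < 2:
        raise ValueError("expected a single line: '<type> <base64> [comment]'")
    key_type, encoded = fields[0], fields[1]

    try:
        blob = base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError("key data is not valid base64") from exc

    # The blob begins with a 4-byte big-endian length followed by the key
    # type name, which must match the type written in front of it.
    if len(blob) < 4:
        raise ValueError("key data is too short")
    (name_len,) = struct.unpack(">I", blob[:4])
    if blob[4:4 + name_len] != key_type.encode("ascii"):
        raise ValueError("key type does not match key data")

    # Same form as `ssh-keygen -l`: unpadded base64 of the SHA-256 digest.
    digest = hashlib.sha256(blob).digest()
    fingerprint = "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
    return key_type, fingerprint


# Hypothetical in-memory registry: fingerprint -> (namespace, notebook).
registry = {}


def register_key(line, namespace, notebook):
    """Validate a key and tie it to exactly one notebook."""
    _, fingerprint = parse_public_key(line)
    registry[fingerprint] = (namespace, notebook)
    return fingerprint
```

Keying the registry by fingerprint means a jump box could look up the target notebook from the key alone. With a "key per namespace" the key would only identify the namespace, so the user would presumably have to name the notebook some other way.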

Kallepan commented 3 weeks ago

Is relying on the Kubernetes API a scalable and reliable solution for managing workloads? I'm concerned that the kube-apiserver could become a bottleneck if multiple users simultaneously access numerous pods, particularly given that tools like VSCode may generate a high volume of small requests.