leolb-aphp closed 4 years ago
A good document to keep close when discussing security of operating a JupyterHub is https://jupyterhub.readthedocs.io/en/stable/reference/websecurity.html. It also lays out what kind of trust is placed in users (or not). I find the document useful because it lays out some of the constraints/assumptions about the use cases for which JupyterHub is designed.
Have you used gVisor? I've only looked at the docs and have it on the list of things to look into for mybinder.org. It sounds like the change to using gVisor would be in how a user sets up their infrastructure, not in code in JupyterHub. Is that understanding correct?
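As an illustration of the infrastructure-side setup mentioned above, here is a minimal sketch of the Kubernetes RuntimeClass object that a cluster operator would typically create for a sandboxed runtime. The class name `gvisor` and the handler name `runsc` are assumptions: the handler has to match whatever the node's container runtime is actually configured with, and `node.k8s.io/v1` assumes a reasonably recent cluster (older ones used a beta API version). The helper function name is hypothetical.

```python
import json


def runtime_class_manifest(name: str, handler: str) -> dict:
    """Build a Kubernetes RuntimeClass manifest as a plain dict.

    `name` is what pods reference via `runtimeClassName`;
    `handler` must match a runtime handler configured on the nodes
    (assumed here, not something JupyterHub controls).
    """
    return {
        "apiVersion": "node.k8s.io/v1",  # assumption: recent cluster
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        "handler": handler,
    }


if __name__ == "__main__":
    # "gvisor" and "runsc" are illustrative values, not verified defaults.
    manifest = runtime_class_manifest("gvisor", "runsc")
    # kubectl accepts JSON as well as YAML, so this output could be
    # saved to a file and applied by a cluster administrator.
    print(json.dumps(manifest, indent=2))
```

Nothing in this sketch touches JupyterHub itself, which matches the understanding that the runtime is set up at the cluster level.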
I recently came across https://github.com/elotl/kip:
Kip is a Virtual Kubelet provider that allows a Kubernetes cluster to transparently launch pods onto their own cloud instances.
Z2JH has support for node selectors, so based on the kip readme it sounds like it might work without any Z2JH changes.
@manics Kip seems to have severe limitations that make it fit only a limited set of use cases. https://github.com/elotl/kip#limitations
@betatim Not yet, trying out Kata Containers right now (setting up the infrastructure).
For Kata Containers to work best, JupyterHub's Helm chart would need to support setting the Runtime Class of user pods. Otherwise, annotations on user pods plus a node selector could do it. However, this Helm chart does not seem to support setting annotations on user pods yet.
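To make the Runtime Class idea above concrete, here is a minimal sketch of the pod-spec fields involved. `runtimeClassName` and `nodeSelector` are standard Kubernetes PodSpec fields; the class name `kata`, the node label, and both helper functions are hypothetical and only illustrate the shape of the override.

```python
from typing import Dict, Optional


def sandboxed_pod_overrides(
    runtime_class: str,
    node_selector: Optional[Dict[str, str]] = None,
) -> dict:
    """Return the PodSpec fields that select a sandboxed runtime."""
    overrides: dict = {"runtimeClassName": runtime_class}
    if node_selector:
        # Copy so the caller's dict is not shared with the result.
        overrides["nodeSelector"] = dict(node_selector)
    return overrides


def apply_overrides(pod_spec: dict, overrides: dict) -> dict:
    """Merge overrides into a pod spec without mutating the input.

    Dict-valued fields (such as nodeSelector) are merged one level deep;
    everything else is replaced.
    """
    merged = dict(pod_spec)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = {**merged[key], **value}
        else:
            merged[key] = value
    return merged


if __name__ == "__main__":
    # Illustrative values only: "kata" and the label are assumptions.
    base = {"nodeSelector": {"pool": "user"}, "containers": []}
    extra = sandboxed_pod_overrides("kata", {"sandbox": "true"})
    print(apply_overrides(base, extra))
```

In a JupyterHub deployment, a dict like the one returned by `sandboxed_pod_overrides` could plausibly be handed to KubeSpawner's `extra_pod_config` option, assuming the installed KubeSpawner version provides it; whether the Helm chart exposes this setting is exactly the open question in this comment.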
Also looking at things such as: https://github.com/rootless-containers/usernetes to more broadly mitigate security risks of running arbitrary untrusted code from random users and their possibly compromised personal environments that contain their credentials to JupyterHub.
It seems that Kata Containers does not support network namespaces shared between containers. I'm not sure whether that has an impact here; I could not test it yet.
Using Virtualization seems too far off the classical container model to remain broadly compatible. I think gVisor is a better bet than Kata Containers, at least right now. Otherwise, there's movement towards using the CPU's MMU to isolate kernel data structures used for different containers but big challenges remain in terms of refactoring the Linux kernel to handle data structures differently.
See: https://fosdem.org/2020/schedule/event/kernel_address_space_isolation/
@leolb-aphp What do you think about moving this discussion to the community discourse forum? This is a very interesting topic and I'm following along, but arguably also goes beyond the scope of this project. If it's on the forum more people would see the topic, so there's a chance people will come up with more suggestions.
@manics Sure, though I think it is particularly relevant for JupyterHub, because it is one of the cases where Kubernetes is used to run untrusted code from third parties, which is unusual.
We want to use the issues on GitHub repos for technical discussions on how to change the contents of the repo. More general discussions should go to Discourse so that they are easier to find, better indexed by Google and generally more accessible than being hidden in the bowels of a GitHub repository. Once the discussion becomes more concrete in terms of which technology to use, we can start an issue to discuss the work needed to make it happen. Right now this issue is more about evaluating options, seeing what is out there, etc.
Thanks!
@choldgraf @betatim Hello, is it possible to be whitelisted? "Sorry, new users can only put 2 links in a post." I have the same username on the forum.
This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:
https://discourse.jupyter.org/t/better-security-isolation-mechanisms-for-untrusted-users/4245/1
You can introduce yourself in https://discourse.jupyter.org/t/introduce-yourself/17/168. I think the restriction is relaxed after two posts.
@betatim I posted twice but the restriction is still in place. I think it might be better to enforce, say, captchas for each new post at first, and lift that after a little while. Restricting the number of links does little to prevent spam, since a spammer doesn't need more than 2 links in a post. And my GitHub account, with which I registered on the forum, testifies that I'm no spammer.
hey @leolb-aphp - I've upped your trust level so you should be able to post now, let me know if that doesn't work
@choldgraf It worked, thanks a lot!
Kip seems to have severe limitations that make it fit only a limited set of use cases.
Hi @leolb-aphp, I am the founder of Elotl (makers of kip). I am curious to learn whether:
a) you tried Z2JH with kip and found the current kip limitations a blocker for your JupyterHub use case. If so, could you please share which limitations are blockers? We at kip would love to fix any blocking issues, hence the question.
or
b) you did not try Z2JH with kip but expect the current limitations of kip to be a blocker. In this case, could you please share which limitations you think are blockers?
We at kip would love to support Z2JH for Kubernetes. I would love to give it a try to find any potential issues, but want to learn from your experience before heading down that route, hence the question.
Thanks a lot! madhuri.
Hello!
JupyterHub uses regular Kubernetes containers or pods to run code from third parties that are most certainly untrusted. Even if they can be trusted not to be malicious themselves, which is not true in all cases, they certainly cannot be trusted to never make mistakes or be compromised.
For this reason, and looking at the use case JupyterHub is making of Kubernetes, I suggest that compatibility with more secure container runtimes such as gVisor (experimental) or Kata Containers (used in production, but still maturing) be studied.
I have already been studying both solutions and am still doing so, to figure out how well they can integrate with this Helm chart.
I figured using VMs for running JupyterHub's user workloads is more appropriate security-wise. This is not the repository for this but I think making JupyterHub talk to an OpenStack API or similar for provisioning VMs is a good idea. The latter also would integrate better with NVIDIA GPUs and their vGPU technology which works reliably only with VMs right now.
Thanks