jupyterhub / binderhub

Run your code in the cloud, with technology so advanced, it feels like magic!
https://binderhub.readthedocs.io
BSD 3-Clause "New" or "Revised" License
2.53k stars 386 forks source link

Group ID set to 0 when running on mybinder.org despite container wanting 100 #1088

Open JoshKarpel opened 4 years ago

JoshKarpel commented 4 years ago

Bug description

Inside the notebook environment on mybinder.org, the user's primary group id is set to 0 regardless of what is in the container metadata.

I'm not sure if this is a BinderHub problem or a mybinder.org problem (I imagine the real problem may be even lower in the stack - the KubeSpawner or something).

I believe this broke sometime in the last month - before then, the result of id was as-expected.

Expected behaviour

The user's primary group id is set by the Docker container metadata. For example, in a Jupyter Docker Stack image running on my desktop, the result of running id in the terminal is

jovyan@521a9e575d14:~$ id
uid=1000(jovyan) gid=100(users) groups=100(users)

Actual behaviour

The user's primary group id is always set to 0.

jovyan@jupyter-binder-2dexamples-2djupyter-2dstacks-2dshd85tff:~$ id
uid=1000(jovyan) gid=0(root) groups=0(root),100(users)

How to reproduce

  1. Run something like https://github.com/binder-examples/requirements (not based on Jupyter Docker Stacks) or https://github.com/binder-examples/jupyter-stacks (where I originally had the issue) on mybinder.org
  2. Open the terminal
  3. Run id

Your personal set up

I'm using mybinder.org with no special modifications. I have not tested on any other BinderHubs.

betatim commented 4 years ago

@manics quite a while ago we made a change to the adduser command in repo2docker. I don't remember why but do you think that is a place to investigate?

@consideRatio can you think of a change in kubespawner or k8s itself that would have this effect?

I think I did upgrade the kubernetes master version on GKE prod a while ago (maybe consistent with "a month ago"), but I think it was just a minor version upgrade.

manics commented 4 years ago

The main post says it also occurs with https://github.com/binder-examples/jupyter-stacks which bypasses repo2docker.

I don't recall any relevant changes to KubeSpawner but I'll defer to @consideRatio.

Is it possible this is an intentional override on mybinder.org?

manics commented 4 years ago

This old issue could be relevant: https://github.com/jupyterhub/kubespawner/issues/290

betatim commented 4 years ago

I ran `` to see what we can see in terms of which feature flags are enabled on GKE prod. This is one of the lines found:

"exec kube-proxy --master=https://35.202.156.97 --kubeconfig=/var/lib/kube-proxy/kubeconfig --cluster-cidr=10.12.0.0/14 --resource-container=\"\" --oom-score-adj=-998 --v=2 --feature-gates=DynamicKubeletConfig=false,TaintBasedEvictions=false,RotateKubeletServerCertificate=true,ExperimentalCriticalPodAnnotation=true --iptables-sync-period=1m --iptables-min-sync-period=10s --ipvs-sync-period=1m --ipvs-min-sync-period=10s 1\u003e\u003e/var/log/kube-proxy.log 2\u003e\u00261"

I think this means the RunAsGroup feature gate is set to its default value for v1.14.10-gke.17. Looking through https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.14.md shows that RunAsGroup got enabled by default for 1.14. I couldn't find a mention of the feature gate in the changelog for GKE :-/

I think this means the analysis in https://github.com/jupyterhub/kubespawner/issues/290 applies and we should start setting runAsUser and runAsGgroup in the BinderHub chart.

JoshKarpel commented 4 years ago

I think this means the analysis in jupyterhub/kubespawner#290 applies and we should start setting runAsUser and runAsGgroup in the BinderHub chart.

@betatim Great find! If this is indeed the solution, any estimate on when it could be implemented?

betatim commented 4 years ago

any estimate on when it could be implemented?

How quickly can you send a PR? ;)

JoshKarpel commented 4 years ago

;) I'm not familiar with the BinderHub chart, nor Kubernetes in general... is this https://github.com/jupyterhub/binderhub/blob/089702b32dc0689ba432504724d82bb42ad2a94f/helm-chart/binderhub/values.yaml#L78 the right place to make config changes, like https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/advanced.html#hub-extraconfig ?

betatim commented 4 years ago

Without looking into it, I think implementing it in https://github.com/jupyterhub/kubespawner/ would be the place to implement this as this will effect not only BinderHub uses cases but all JupyterHub-on-kubernetes

JoshKarpel commented 4 years ago

Take the following with a grain of salt! I am not a k8s expert, and I'd be happy to be proven wrong :)

The situation described in https://github.com/jupyterhub/kubespawner/issues/290 does indeed appear to be exactly what we're hitting: when you run a pod without runAsGroup set in the security context but with the RunAsGroup feature gate set to true, the pod is started with group ID 0 no matter what is in the container. This is the default since Kubernetes v1.14.

I took a look at the security context documentation. As far as I can tell, if feature gate RunAsGroup=true, there is no way to tell Kubernetes to run the container as the group in the container metadata. If the feature gate is false and you don't pass runAsGroup in the security context, you get the old default behavior, where the group is taken from the image metadata. This is only way to get that behavior.

That means the problem can't be fixed in the KubeSpawner, because the KubeSpawner is being broken by the default behavior of Kubernetes changing under it. There's no way for the spawner to uniformly know what uid/gid to start the container as; it either has to be done in config or deferred to the image (unless the KubeSpawner pulls the image itself and inspects it, which I doubt they want to do). Basically, passing run_as_group = None to the Kubernetes API changed meaning. The only other option is to explicitly pass an integer.

Even worse, I'm not sure that the feature flag can be changed from the Helm chart. The docs indicate that you need to set feature gates on the command line - I'm not sure if the Helm chart gets to modify that, because Helm configuration happens after the Kubernetes cluster is already up. BinderHub admins would need to adjust this setting for their specific use case; it seems like most BinderHubs would want to turn the feature gate off.

For the moment, I think I have a fix that will let my use case work even when the group id is set to 0. Let me know if you think this issue is worth escalating, given the above. I'd be happy to at least draft some documentation describing the problem and possible solutions.