Improve git credential handling in Kubernetes

sgaist commented 2 years ago

The git credentials are now stored in a secret following recommended best practice.

The secret will have the same lifespan as the builder pod.

sgaist commented 2 years ago

One note: we might also want to consider making the pod owner of the secret which would automatically trigger its deletion along side the pod.

However, if for some reason the pod fails to be created the secret would remain so there's a need to clean it up manually in that case.

sgaist commented 2 years ago

@consideRatio Thanks for your suggestions. To answer your questions/remarks:

Currently the credentials are static by provider but my implementation was done so on purpose: it takes into account #1169 (Allow dynamic repository credentials for authenticated Binderhub instances) that I am interested in as well.

A chart managed secret / a dynamically created secret?

If we take the static credential use case, having one single chart managed secret is a good idea. The git_credentials property could be nuked in favor of setting the key matching the repo provider.

If we take the #1169 use case into account, we would need to ensure that we cleanup the content of the secret to avoid keeping tokens (and a growing secret) for more time than necessary (i.e. user session length + builder lifetime).

From the top of my head, this could be done in the following way:

Start build
Update secret with token using pod name as key
Run build
Update secret and remove the token

The thing we need to determine now is how to differentiate a static credential VS a user provided one.

Single / Multiple k8s Secrets

In the static case, I think one single secret would make sense.

For the #1169 use case, it's a different story. We would likely need to benchmark things a bit. Kubernetes secrets cannot be larger than 1MIB but one should also not create too many little Secrets in a given namespace as it could also exhausts memory. The documentation also suggest to use resource quotas in order to limit their number.

consideRatio commented 2 years ago

Ah okay!

This fix becomes more complicated if planning for #1169 at the same time, to support runtime decided git credentials, but we can go for it if you want. I'd appreciate if another maintainer could opinie about the questions below though, as I'm not comfortable approve/merging this without further agreement otherwise.

Question 1: Is there agreement to add support for runtime decided git credentials?

Doing it imples managing k8s Secrets via the binderhub software itself, as compared to the BinderHub Helm chart.

Question 2: Should we support only runtime decided git credentials, or a mix?

If using binderhub software managed k8s Secrets by default, it means that we take some performance hit as pod must wait for the k8s Secret creation, which follows the pod creation in sequence, and burden the k8s api-server a bit by additional requests. It may be miniscule though. So, if we switch to this where its not used, this is a compromise even though it may possibly be very small. At the same time, if we start supporting both options with a single Helm chart managed k8s Secret and runtime created k8s Secrets, we increase the complexity of configuration/documentation and the complexity of the code base as a whole.

Implementation criterias for runtime decided git credentials

Use ownerReferences This seems like the only robust way to ensure that we cleanup secondary resources without needing to track another type of k8s resource. There is an implementation for this in kubespawner already, there may be room for improvements in how its implemented but a key implementation detail is that the k8s Secret should be created after the Pod.
Use of typical labels for created resources Any resources created should have a few labels to indicate who created it. Helm charts creating resources should have labels like these, where its not obvious what we should set when its resources created by software, which in turn was installed by a Helm chart. I'm not 100% on what values we provide, or if we should set the helm.sh/chart label as its indirectly created only, but we should at least provide some labels indicating this origin I think.

consideRatio commented 2 years ago

My own response to the questions above:

Q1: I think allowing for runtime decided git credentials is acceptable to support, that the complexity added is sufficiently motivated, but I'm not fully confident about this.
Q2: I think I'd rather see a single flexible approach that then must be found acceptable for everyone, even for those not using this, as that would reduce the complexity of the system as a whole. I'm worried otherwise that this strategy is used by very few, and we need to ask "oh are you using ... or not?" and giving special treatments to this when testing and trying to come to the botton of a reported issue etc.

@minrk you worked to provide the kubespawner implementation dynamically creating k8s Secret resources for internal_ssl, what do you think?

sgaist commented 2 years ago

On the implementation side, the secret can be created before the pod and the ownership applied through patching. I did a small test locally. It can be done as soon as the pod enters the pending state.

I haven't benchmarked it but it would likely reduce the performance hit of having the pod waiting on the secret presence to be properly started.

The downside is that if the pod creation fails the secret needs to be manually cleaned up.

consideRatio commented 2 years ago

The downside is that if the pod creation fails the secret needs to be manually cleaned up.

Yepp, a quite significant downside i think. I'd rather make sure we dont emit credentials we fail to clean up than optimize performance to this degree - based on guessed impact scale.

sgaist commented 1 year ago

@consideRatio I was wondering one thing: should the secret creation code be put in its own function ?

manics commented 1 year ago

Note the git credential is also needed by BinderHub before the build pod is created, since BinderHub checks the registry for an existing image so the Git SHA is required: https://github.com/jupyterhub/binderhub/blob/8351995380a48ab1da096e02b2a1b93aab4bbc77/binderhub/builder.py#L269-L297

This may complicated things if implementing dynamic per-user/repo Git credentials. I've hit a similar issue whilst trying to disentangle another property: https://github.com/jupyterhub/binderhub/issues/1594

If we go down the route of dynamic docker config it would make sense to keep these two aligned, so if you have any thoughts on that issue please share them!

manics commented 1 year ago

I've been thinking about this because I want to get temporary registry credentials into the build image to push to ECR (push tokens are time-limited so a static secret won't work https://github.com/jupyterhub/binderhub/issues/1623).

Using secrets is definitely best practice, but on balance I think managing dynamic secrets overcomplicates things for BinderHub due to the clean-up issue- it's a shame you can't atomically create a secret+pod. ownerReferences may help, but there's a circular dependency between the pod and secret and therefore a race condition.

In the worst case the implementation in https://github.com/jupyterhub/kubespawner/blob/0f22abf096459b5424325fa93ff4b30bef568391/kubespawner/spawner.py#L2714-L2732 will lead to N orphaned secrets where N = number of users, whereas in Binderhub the secrets could be per-user-per-repo so there's an unlimited number.

If we want to pursue this I think a compromise of using a Helm chart managed secret for global credentials, and plain env-vars for dynamic credentials, is reasonable. It requires additional logic in the build class to choose between mounting the global secret vs ignoring it and setting an environment variable, but we'll need additional logic to support dynamic credentials anyway. The small risk of using env vars can be mitigated by ensuring any dynamic credentials are time limited- they most likely will be anyway, e.g. OAuth tokens expire, as do ECR registry credentials. You could probably make the case that a dynamic expiring env var is more secure than a one year old static secret :smiley:

Finally if designed correctly this will be an implementation detail so there's nothing stopping us from switching in future.

sgaist commented 1 year ago

Related to your post from past December (that I was sure I answered to so sorry for the delay): Where you writing in the context of pushing the image back to the same repository as the code comes from ? Otherwise, it's one unique token that could be stored as a static secret managed by helm.

For your new setup, it sounds a bit like the forge use case. In between, one related thing I have discovered that could be of interest: the Secrets Store CSI driver. Note that I haven't tested it yet. The goal of this project is to move the secrets into a dedicated strong storage and the pods use a dedicated provider to retrieve them. There's an AWS provider however I currently don't know if it would be usable for your use case. While interesting, it adds a new layer of indirection which needs to be evaluated first.

As for the orphaned secrets, I am wondering whether having a cleanup CronJob would help in that case ? Note that the current implementation should properly clean everything in case of failure.

jupyterhub / binderhub