jupyterhub / binderhub

Run your code in the cloud, with technology so advanced, it feels like magic!
https://binderhub.readthedocs.io
BSD 3-Clause "New" or "Revised" License
2.58k stars 390 forks source link

`buildDockerConfig` is broken, `binder-build-docker-config` has multiple uses #1594

Open manics opened 1 year ago

manics commented 1 year ago

This is a follow-up from https://github.com/jupyterhub/binderhub/pull/1592

c.BinderHub.build_docker_config isn't used by the BinderHub Python app, only it's presence is checked:

$ git grep build_docker_config origin/main

origin/main:binderhub/app.py:    build_docker_config = Dict(
origin/main:binderhub/app.py:                "build_docker_config": self.build_docker_config,
origin/main:binderhub/builder.py:        if self.settings["use_registry"] or self.settings["build_docker_config"]:

Instead it's used in the Helm Chart secret: https://github.com/jupyterhub/binderhub/blob/2ba938bf20201ff864b3a6c473f66f30fbef92bc/helm-chart/binderhub/templates/secret.yaml#L34-L42 but the secret refers to config.BinderHub.buildDockerConfig instead of config.BinderHub.build_docker_config.

Furthermore from @consideRatio https://github.com/jupyterhub/binderhub/blob/2ba938bf20201ff864b3a6c473f66f30fbef92bc/helm-chart/binderhub/templates/_helpers.tpl#L42-L45

This segment seems like a bug, or it is at least a notable issue because its simply separate from providing config.BinderHub.build_docker_config in values. Whats under .config should be pure passthrough - so buildDockerConfig won't map to build_docker_config.

This was originally added in https://github.com/jupyterhub/binderhub/pull/1255

Aside from it being broken, this is now a problem in https://github.com/jupyterhub/binderhub/pull/1521 where the Kubernetes config is being decoupled from the main BinderHub application, and there's a two-way dependency between BinderHub and the Helm Chart:

A few potential options

Move buildDockerConfig to the top level of Values (.Values.buildDockerConfig), the Helm Chart has full ownership

This is the simplest option, and it also means BinderHub can use the same secret to query the registry. However if we want to have a variable Docker build config on a per-user or per-repo basis (see also https://github.com/jupyterhub/binderhub/pull/1169, https://github.com/jupyterhub/binderhub/pull/1577) this is not a good option.

Move buildDockerConfig to KubernetesExecutor.build_docker_config, and configure it with config.KubernetesExecutor.build_docker_config, The Helm chart creates a secret from this config

This is close to the current implementation, except that the property is owned by KubernetesExecutor instead of BinderHub. The secret is still created by the Helm Chart, and it also suffers the same limitations if we want per-user/per-repo config. It also means the BinderHub registry check depends on a KubernetesExecutor property.

Move buildDockerConfig to KubernetesExecutor.build_docker_config, KubernetesExecutor has full ownership including creating the secret

This is the most flexible option, and would allow us to have a variable Docker build config on a per-user or per-build basis in future. This is similar to what's being done in https://github.com/jupyterhub/binderhub/pull/1577 . However it means we need another way for BinderHub to check the registry. Either a second, possibly duplicate secret, though this could be a read-only registry token if you wanted to be really strict about security. Or move all registry/docker configuration to the Registry class, and have KubernetesExecutor depend on the Registry.

The third option is the most flexible, but we could also go with the first option and make a new breaking change when we're ready to introduce per-user/per-repo config.

sgaist commented 1 year ago

While I like option number three best. I'd say we should evaluate the time frame for having dynamic credential support merged and what it requires for them to be used "everywhere".

AFAIK, there are three different use cases: 1) Use of credentials for Git related work 2) Use of credentials for registry access 3) In the case of forges like GitLab push back to the same project as the code cloned

This also requires to decide how to configure whether static or dynamic credentials are going to be used when deploying and for which part of BinderHub it applies.

On the point of handling credentials, I think it's also important to document somewhere how to retrieve them. If memory serves well, It's not just a question of using e.g. GitLab for the login part. There's also configuration to be done in order for it to provide actual information in the auth_state.

I remember starting doing some work on number three last year however I can't put my hands on it now. From memory, it was not overly complicated. I think the main issue I had was to configure BinderHub's authentication to retrieve the token once logged in.


Found something for the documentation part, it's in oauthenticator: https://github.com/jupyterhub/oauthenticator/issues/372