jupyterhub / binderhub

Run your code in the cloud, with technology so advanced, it feels like magic!
https://binderhub.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Annotations on binderhub pods #1063

Open eexwhyzee opened 4 years ago

eexwhyzee commented 4 years ago

Is there a way to add a Kubernetes annotation to a BinderHub pod that contains just the repo name? So far I've been able to add an annotation using extraConfig and KubeSpawner:

hub:
  extraConfig:
    myConfig: |
      c.KubeSpawner.extra_annotations = {
          "repo": "{username}",
      }

but this sets the annotation to the escaped pod name with the unique ID appended (I just want the repo name itself). I also tried to manipulate the {username} string in extraConfig (stripping the escape characters and the unique ID), but it doesn't seem to work. I think that's because the Python code in extraConfig executes before the pod actually gets spawned?
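The suspicion about timing can be illustrated with a minimal sketch. It assumes the placeholder is filled in per spawn with something equivalent to str.format; the expand function below is a stand-in for that step and does not reproduce KubeSpawner's actual code, and the example username is hypothetical.

```python
# Stand-in for the per-spawn template expansion (assumption: it behaves
# like str.format; this is not KubeSpawner's real implementation).
def expand(template: str, username: str) -> str:
    return template.format(username=username)


# What the config file holds when extraConfig runs: the literal placeholder.
template = "{username}"

# An attempt to drop the unique ID at config time. The literal text
# "{username}" contains no hyphen, so the split leaves it untouched.
trimmed_template = template.split("-")[0]

# Hypothetical escaped name with a unique suffix, known only at spawn time.
spawn_time_username = "my-repo-abc123"

# The expanded value is the same as if no manipulation had been attempted.
result = expand(trimmed_template, spawn_time_username)
print(trimmed_template, result)
```

Any string operation written in extraConfig acts on the placeholder text, so the trimming would have to happen in code that runs at spawn time instead.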

betatim commented 4 years ago

Off the top of my head this is a bit tricky.

BinderHub asks its JupyterHub to launch the pod. It uses the JupyterHub API to do that in binderhub/launcher.py. JupyterHub in turn uses kubespawner to actually create the pod and talk to kubernetes.

This is where my speculation starts: I'd look at what information kubespawner expects in the request to launch a pod. Maybe it can be extended so that BinderHub can add extra information to the request, which then gets converted to labels. To dig into this, I'd read the kubespawner and launcher.py code.

betatim commented 4 years ago

It would make debugging easier for mybinder.org as well if we had a label that tells you the source repository and revision that is in the pod. This means a PR to add this would be welcome.


As a workaround, I kubectl exec -it <podname> into the pod and run git remote when I really need to know the source repo and can't deduce it from the name.

eexwhyzee commented 4 years ago

Cool, thanks for the reply and pointing me in the right direction!

So here's where repo_url gets passed to the spawner: https://github.com/jupyterhub/binderhub/blob/master/binderhub/launcher.py#L200

Would we have to figure out a way to extend it here? https://github.com/jupyterhub/kubespawner/blob/master/kubespawner/spawner.py#L1353

For a little more context on what I'm trying to do: I want to add an "elastic.co/target_index" annotation with the repo name as the value, in order to create a separate Elasticsearch index for each repo.
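A rough sketch of deriving such a value from a repo URL, assuming repo_url is a full URL whose last path segment is the repo name. The helper names are hypothetical. The lowercasing and character filtering reflect Elasticsearch's index naming restrictions (lowercase only, no slashes or spaces, no leading "-", "_" or "+"); the filter here is deliberately stricter than required.

```python
import re
from urllib.parse import urlparse

# Anything outside this conservative set is replaced with a hyphen.
_INVALID_CHARS = re.compile(r"[^a-z0-9._-]+")


def index_name_from_repo_url(repo_url: str) -> str:
    """Return a lowercase, index-safe name taken from the last URL path segment."""
    path = urlparse(repo_url).path.rstrip("/")
    name = path.rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    name = _INVALID_CHARS.sub("-", name.lower())
    # Index names may not start with "-", "_" or "+"; leading dots are dropped too.
    name = name.lstrip("-_+.")
    # Index names are limited to 255 bytes; the name is ASCII at this point.
    return name[:255] or "unknown"


def target_index_annotation(repo_url: str) -> dict:
    """Build the annotation dict using the key mentioned in this thread."""
    return {"elastic.co/target_index": index_name_from_repo_url(repo_url)}


print(target_index_annotation("https://github.com/jupyterhub/BinderHub.git"))
```

Two repos with the same name under different owners would map to the same index here; including the owner segment in the name would avoid that if it matters.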

betatim commented 4 years ago

I didn't realise we already passed the repo URL over the API. Nice. Then I think the place to make changes is https://github.com/jupyterhub/binderhub/blob/207ae704752cde82f33fc771a3b58aa1ecc04b6d/helm-chart/binderhub/values.yaml#L78 which is the spawner used by BinderHub, maybe together with hooking into (or adding to) the _common_annotations method you linked.
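One way this could look, as a sketch only: a mixin that copies repo_url from the spawner's user_options into extra_annotations just before the pod is started. It assumes the launch request's repo_url ends up in user_options and that extra_annotations is read when the pod manifest is built. The mixin name and the "binder/repo-url" key are hypothetical, and _FakeSpawner is a stand-in so the example runs without kubespawner installed.

```python
class RepoAnnotationMixin:
    """Adds the source repo URL as a pod annotation at spawn time."""

    # Hypothetical annotation key; choose whatever your tooling expects.
    repo_annotation_key = "binder/repo-url"

    def start(self):
        repo_url = self.user_options.get("repo_url")
        if repo_url:
            # Build a new dict rather than mutating a possibly shared default.
            # Assumption: values may still go through template expansion, so
            # a URL containing "{" or "}" could need escaping.
            self.extra_annotations = {
                **self.extra_annotations,
                self.repo_annotation_key: repo_url,
            }
        return super().start()


class _FakeSpawner:
    """Stand-in for the real spawner, for demonstration only."""

    def __init__(self, user_options):
        self.user_options = user_options
        self.extra_annotations = {}

    def start(self):
        # The real spawner would create the pod; here we return what it would see.
        return dict(self.extra_annotations)


class DemoSpawner(RepoAnnotationMixin, _FakeSpawner):
    pass


annotations = DemoSpawner({"repo_url": "https://github.com/jupyterhub/binderhub"}).start()
print(annotations)
```

In a real deployment the mixin would be combined with the spawner class defined in values.yaml instead of _FakeSpawner. An annotation is used rather than a label because Kubernetes label values are limited to 63 characters from a restricted character set, which a full URL would not satisfy.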


Will you use elastic for logging?

eexwhyzee commented 4 years ago

Yeah, currently we have ELK set up to send all BinderHub notebook stdout logs to a single index (and schema), but I'm exploring whether we can create a separate index per repo, which would allow separate schemas.