Include a reference to the source repository inside the image

jupyterhub / repo2docker-action

A GitHub action to build data science environment images with repo2docker and push them to registries.

MIT License

Include a reference to the source repository inside the image #86

Open choldgraf opened 2 years ago

choldgraf commented 2 years ago

Context on the use-case

Currently, there's no "quick" way to ask "where is the repository that defined the current user image" for images built from this action. This would be very useful for debugging or understanding the user environment for a JupyterHub.

Describe the solution you'd like

To be able to list the location of the image's source files from within the user image. For example, either:

A menu-dropdown in JupyterLab or an entry in the JupyterHub admin page that would have a link like: Link to user image repository.
An environment variable I could grab like os.environ["USER_IMAGE_SOURCE"]

This would make it much quicker to figure out where source image files are located, which would make it faster / simpler to debug.
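
As a minimal sketch of the environment-variable option, code running inside the user image could look up the source location as follows. The variable name `USER_IMAGE_SOURCE` is the hypothetical one suggested above and is not something the action sets today; the helper name `user_image_source` is likewise made up for illustration.

```python
import os


def user_image_source(environ=os.environ):
    """Return the source repository of the user image, or None if unset.

    USER_IMAGE_SOURCE is a hypothetical variable name taken from this issue;
    nothing currently guarantees that it is defined in the image.
    """
    return environ.get("USER_IMAGE_SOURCE")


source = user_image_source()
if source is None:
    print("No image source recorded in this environment")
else:
    print(f"User image was built from: {source}")
```

Using `.get` rather than indexing `os.environ` directly avoids a `KeyError` in images that were built without the variable.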

Implementation ideas

Binder already does something like this, and it is very useful. For example, in the classic notebook interface: [screenshot omitted]

and in lab: [screenshot omitted]

(I'm not sure where this is implemented though)

manics commented 2 years ago

> (I'm not sure where this is implemented though)

It's https://github.com/manics/jupyter-offlinenotebook. I've been meaning to transfer it to the JupyterHub GitHub org. I was hoping to clean up some dependabot things first, but maybe I should worry about that later.

manics commented 2 years ago

Can we add this metadata in a more structured way, for example by using container labels (some standard ones are in https://github.com/opencontainers/image-spec/blob/main/annotations.md but we can add more), and then perhaps expose them as environment variables when running the image? This means repo2docker or BinderHub would have to:

Query the built image for all labels.
Add them to the environment (all, or perhaps just a subset, e.g. rees.*), so that LABEL rees.image.source.repo=https://github.com/jupyterhub/example gets mapped to the environment variable REES_IMAGE_SOURCE_REPO when the container is run.
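
The label-to-variable mapping described in these two steps could be sketched roughly as below. This is only an illustration under stated assumptions: the function name `labels_to_env` and the default `rees.` prefix filter are hypothetical, and the `labels` dictionary stands in for whatever the image query returns (for example the labels reported by `docker inspect`). Neither repo2docker nor BinderHub is known to implement this.

```python
def labels_to_env(labels, prefix="rees."):
    """Map container labels under `prefix` to environment variable names.

    Hypothetical helper: upper-cases the label key and replaces '.' and '-'
    with '_', so 'rees.image.source.repo' becomes 'REES_IMAGE_SOURCE_REPO'.
    Labels outside the prefix are ignored.
    """
    env = {}
    for key, value in labels.items():
        if not key.startswith(prefix):
            continue
        name = key.upper().replace(".", "_").replace("-", "_")
        env[name] = value
    return env


# Stand-in for the labels read from a built image.
labels = {
    "rees.image.source.repo": "https://github.com/jupyterhub/example",
    "org.opencontainers.image.licenses": "MIT",
}

print(labels_to_env(labels))
```

Filtering by prefix keeps unrelated labels out of the user environment; passing `prefix=""` would map every label instead.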

The key difference from https://github.com/jupyterhub/repo2docker/pull/1144 is that we'd make this part of repo2docker/binderhub and always set the required environment variables using the standard container run API instead of relying on Appendix to be configured by a user.
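
To make "set the environment variables at run time" concrete, here is a rough sketch that builds a `docker run` command line with one `--env` flag per variable. It uses the Docker CLI purely for illustration; repo2docker and BinderHub would go through their own container APIs, and the helper name `docker_run_args` and the image name are assumptions.

```python
def docker_run_args(image, env):
    """Build an argument list for `docker run` that sets each variable in `env`.

    Hypothetical helper for illustration only; it constructs the command
    without executing it.
    """
    args = ["docker", "run", "--rm"]
    for name in sorted(env):
        args += ["--env", f"{name}={env[name]}"]
    args.append(image)
    return args


# The image name below is a made-up example.
env = {"REES_IMAGE_SOURCE_REPO": "https://github.com/jupyterhub/example"}
print(" ".join(docker_run_args("example/user-image:latest", env)))
```

Because the variables are injected by whatever launches the container, the image itself needs no user-configured appendix for them to be present.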

choldgraf commented 2 years ago

@manics definitely +1 on doing this in a more structured way! Environment variables seem reasonable to me. Then it would be up to extensions, UIs, etc. to use those variables and expose them to users in whatever way they wanted.