jupyter-on-openshift / jupyterhub-quickstart

OpenShift compatible version of the JupyterHub application.
Apache License 2.0

Call Program before Jupyter Hub Launch #29

Open AndiH opened 4 years ago

AndiH commented 4 years ago

Disclaimer: I'm an absolute OpenShift newbie, but I want to use the JupyterHub Quickstart for an HPC tutorial soon.

Is it possible to execute a program (or Bash script) before JupyterHub is launched? I need to set up environment variables and move things around before the notebook is started.

jupyterhub_config.sh seems to be intended for shell commands, but I don't know how to use the file. There's also a corresponding entry in the configmap, but I don't know how to use it either (and I'm not sure whether it is really intended for this kind of thing).

GrahamDumpleton commented 4 years ago

Sorry for the delay on this. It slipped out of the top of the inbox very quickly, as it's been a busy week.

Can you clarify whether you want these steps to run inside the pod for JupyterHub, or in the pod for each user's Jupyter notebook instance?

AndiH commented 4 years ago

My use case is to target users to certain directories (via environment variables) and run some setup procedures.

I think the pod for each Jupyter notebook instance would be the correct place! (If there were a similar hook for the JupyterHub pod, it would be a good addition as well, I think.)

GrahamDumpleton commented 4 years ago

Are the directories for storage?

One way is to mount a sub-directory for each user from a shared persistent volume, rather than mounting the whole persistent volume and then placing users in a specific directory. If you did the latter, they could see and modify other people's files.
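A rough sketch of how that per-user sub-directory mount could look if the deployment uses KubeSpawner (an assumption on my part; the claim name, mount path, and sub-path layout below are all placeholders, not taken from this repository):

```python
# jupyterhub_config.py -- hypothetical fragment, assuming KubeSpawner.
# 'shared-notebooks-pvc' and the paths are placeholder names.
c.KubeSpawner.volumes = [{
    'name': 'shared-notebooks',
    'persistentVolumeClaim': {'claimName': 'shared-notebooks-pvc'},
}]
c.KubeSpawner.volume_mounts = [{
    'name': 'shared-notebooks',
    'mountPath': '/opt/app-root/src',
    # KubeSpawner expands {username} in templated fields, so each user
    # only gets their own sub-directory of the shared volume mounted,
    # and cannot browse other users' files through the file browser.
    'subPath': 'users/{username}',
}]
```

The key point is the `subPath`: the pod only ever sees its own slice of the volume, which is what prevents users from seeing each other's files.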

For an example of this scheme, if you only want to use a single persistent volume for all users (as opposed to a persistent volume per user), see:

In particular the JupyterHub config at:

AndiH commented 4 years ago

Thanks for the hints!

Mounting user-specific sub-directories is a good idea! I'll look into it!

In general our setup is even a bit more complicated: we are running HPC jobs through a batch submission system launched from notebooks. The shared filesystem is mounted into the pod, and only this filesystem can be accessed from the submitted job. So users would be able to escape their sub-directory (via the backend) if they really wanted to. Still, I consider mounting sub-directories a good idea.

GrahamDumpleton commented 4 years ago

The Jupyter notebook images in this GitHub org also support an environment variable JUPYTER_WORKSPACE_NAME which, if set, will cause the file browser to start in a sub-directory. It only works for the classic notebook interface though, not the JupyterLab interface.
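If the hub spawns the notebook pods with KubeSpawner, that variable could be injected from the hub side; a minimal sketch (the directory name 'workspace' is a placeholder, and whether your deployment uses KubeSpawner at all is an assumption here):

```python
# jupyterhub_config.py -- hypothetical fragment.
# JUPYTER_WORKSPACE_NAME is honoured by the jupyter-on-openshift
# notebook images and makes the classic file browser start in the
# named sub-directory of the user's home directory.
c.KubeSpawner.environment = {
    'JUPYTER_WORKSPACE_NAME': 'workspace',
}
```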

The changes I made a few hours back related to https://github.com/jupyter-on-openshift/jupyter-notebooks/issues/16 would allow you to supply a shell script which is run during the start-up sequence. Theoretically it could read an environment variable and change the working directory before starting the notebook. That shell script needs to be stored at .jupyter/jupyter_notebook_config.sh in any custom notebook image. You need to be using version 2.4.1 or later of the notebook images as the base.
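A minimal sketch of what such a .jupyter/jupyter_notebook_config.sh might contain, assuming the image sources the script (so that exports and `cd` take effect in the server's process); the PROJECT_ROOT variable is purely illustrative, not something the images define:

```shell
#!/bin/bash
# Hypothetical start-up hook, sourced before the notebook server
# starts (assumes version 2.4.1+ of the notebook images as the base).

# Set up any extra environment the notebooks need.
# PROJECT_ROOT is a placeholder name for illustration only.
export PROJECT_ROOT="${HOME}/project"

# If a workspace sub-directory was requested via JUPYTER_WORKSPACE_NAME,
# create it and move there so the server starts with that directory
# as its working directory.
if [ -n "${JUPYTER_WORKSPACE_NAME}" ]; then
    mkdir -p "${HOME}/${JUPYTER_WORKSPACE_NAME}"
    cd "${HOME}/${JUPYTER_WORKSPACE_NAME}" || exit 1
fi
```

Because the script is sourced rather than executed, the `cd` and `export` lines change the environment of the notebook server itself, which is what makes this hook useful for steering users into particular directories.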

AndiH commented 4 years ago

Thank you!

I'll need some time to digest this and try it out!

GrahamDumpleton commented 4 years ago

Keep me in the loop on what you are trying to do. There are all sorts of ways you can adapt JupyterHub, and I am working on some new built-in configuration options. One will provide play pens for users, where you have authentication and cluster access from the notebook to deploy extra stuff. Another will be a test lab environment where, when a user requests a selected notebook, additional workloads can be deployed on demand into a linked project for whatever the notebook may require. So you could, for example, deploy a Dask or Spark cluster automatically on startup of a session the first time.

AndiH commented 4 years ago

Both things sound really good. But especially the sample project from Singapore NTU looks very interesting.

We set up our notebooks as follows: log in to the HPC system via SSH (with a forwarded port), load the environment you need, start JupyterLab, and connect to the forwarded port. That is what we wanted to re-create with OpenShift, so that users don't come into contact with SSH and port-forwarding (error-prone…).
Unfortunately our tutorial is next Monday, so I fear we won't have everything in place by then.

GrahamDumpleton commented 4 years ago

If you want to hop on a video chat session to discuss options to try and speed things up, let me know. I've been doing various Jupyter stuff this last week, so I am in the right frame of mind to help out if I can.

AndiH commented 4 years ago

I'd love to! Can I contact you somewhere privately? I've just added you on Twitter.