jupyter-on-openshift / jupyterhub-quickstart

OpenShift compatible version of the JupyterHub application.
Apache License 2.0

persist notebooks #3

Closed goern closed 6 years ago

goern commented 6 years ago

As a single user, I want to spawn a notebook using persistent storage, so that my work is not lost if the jupyterhub pod is redeployed.

GrahamDumpleton commented 6 years ago

To use persistent storage with JupyterHub, you first want to set it up with a proper authenticator. You shouldn't use persistent storage with the tmpnb-style authenticator, because the number of storage volumes required will explode: any number of users could come in, and a single user could even come in more than once.

In the example below, I have previously enabled GitHub as an OAuth provider. To use it, and also have persistent storage, I am going to use the JupyterHub template which deploys from a pre-existing notebook image. In the web console, when the template fields come up, I am going to paste the following into the JUPYTERHUB_CONFIG field. You can select the expansion icon at the right end of the field to make it display multiple lines.

from oauthenticator.github import GitHubOAuthenticator
c.JupyterHub.authenticator_class = GitHubOAuthenticator

c.GitHubOAuthenticator.oauth_callback_url = 'https://myhubname-myprojectname.b9ad.pro-us-east-1.openshiftapps.com/hub/oauth_callback'
c.GitHubOAuthenticator.client_id = 'my-client-key-from-github'
c.GitHubOAuthenticator.client_secret = 'my-client-secret-from-github'

c.KubeSpawner.user_storage_pvc_ensure = True
c.KubeSpawner.user_storage_capacity = '1Gi'
c.KubeSpawner.pvc_name_template = '%s-nb-{username}-pvc' % c.KubeSpawner.hub_connect_ip
c.KubeSpawner.volumes = [dict(name='data', persistentVolumeClaim=dict(claimName=c.KubeSpawner.pvc_name_template))]
c.KubeSpawner.volume_mounts = [dict(name='data', mountPath='/opt/app-root/src')]

c.Authenticator.admin_users = { 'grahamdumpleton' }

The first part sets up the authenticator. The c.GitHubOAuthenticator.oauth_callback_url should be a URL based on the hostname the JupyterHub instance is deployed under. Note that you need to rebuild the jupyterhub image against the latest version, as I tweaked it so that the oauthenticator package is installed by default. If you need some separate package for the authenticator, then it is necessary to use the jupyterhub image as an S2I builder, i.e., the JupyterHub Builder template, to create a custom image from a Git repo, where the repo contains a requirements.txt with the extra packages which need to be installed. You can also have the jupyterhub_config.py in that repo if doing that, but in this case I am pasting it in through the template when deploying.
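As a rough illustration of how the callback URL relates to the deployment, the sketch below rebuilds the example URL from its parts. It assumes the route uses the default OpenShift hostname pattern of application name, project name and router domain, which is what the example URL above looks like. The function name and its parameters are made up for this example, and a custom route hostname would not follow this pattern.

```python
def oauth_callback_url(hub_name: str, project: str, router_domain: str) -> str:
    """Build the GitHub OAuth callback URL for a JupyterHub route.

    Assumes the default '<name>-<project>.<router domain>' route hostname;
    check the actual route in your project before relying on this.
    """
    host = f'{hub_name}-{project}.{router_domain}'
    # JupyterHub serves the OAuth callback handler under /hub/oauth_callback.
    return f'https://{host}/hub/oauth_callback'


url = oauth_callback_url('myhubname', 'myprojectname',
                         'b9ad.pro-us-east-1.openshiftapps.com')
print(url)
```

The same URL has to be registered as the authorization callback URL of the OAuth application on the GitHub side.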

The second part configures KubeSpawner to claim a PV and mount it. In this case the image I am using has a working directory of /opt/app-root/src, so I am mounting it there. This will hide anything in that directory in the notebook image. If you had files in the image that you wanted to pre-populate the PV with, it is necessary to add config to tell KubeSpawner to run an init container, with the same image, which mounts the PV at a temporary location so it can copy files from the image to the PV the first time. The main container then mounts the PV on top, hiding the transient version.
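The init container idea described above could look roughly like the following. This is only a sketch under assumptions, not a tested recipe: the helper names, the /mnt/pv staging path, the .populated marker file and the image name are all invented for illustration. The first helper also shows how the claim name resolves for one user, with plain str.format standing in for KubeSpawner's own template expansion.

```python
def pvc_name(hub_name: str, username: str) -> str:
    """Illustrate how the claim name template resolves for one user."""
    template = '%s-nb-{username}-pvc' % hub_name   # '%s' is filled in when the config loads
    return template.format(username=username)      # '{username}' is filled in per user


def populate_init_container(image: str,
                            volume_name: str = 'data',
                            image_dir: str = '/opt/app-root/src',
                            staging_dir: str = '/mnt/pv') -> dict:
    """Return a container spec that copies image files into the PV once.

    The PV is mounted at a staging path (hypothetical) so that the files
    baked into the image under image_dir are still visible to copy from.
    """
    marker = f'{staging_dir}/.populated'           # hypothetical marker file
    script = (
        f'if [ ! -f {marker} ]; then '
        f'cp -a {image_dir}/. {staging_dir}/ && touch {marker}; '
        f'fi'
    )
    return {
        'name': 'populate-volume',
        'image': image,
        'command': ['/bin/sh', '-c', script],
        'volumeMounts': [{'name': volume_name, 'mountPath': staging_dir}],
    }


spec = populate_init_container('my-notebook-image:latest')
print(pvc_name('myhubname', 'alice'))
print(spec['command'][-1])

# In jupyterhub_config.py this would be wired in along the lines of:
#   c.KubeSpawner.init_containers = [spec]
# Assumption: recent KubeSpawner releases name the option 'init_containers';
# older releases used 'singleuser_init_containers'.
```

The marker file is one way to make sure the copy runs only the first time, so that later edits made by the user in the PV are not overwritten on the next spawn.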

goern commented 6 years ago

Thx a lot, we will have a look at it!

GrahamDumpleton commented 6 years ago

Closing this. Have added some basic notes about how to configure persistent storage in:

How to handle populating the persistent volume is a trickier issue which depends on specific requirements. I will try to document some guidelines about that later.