jupyterhub / kubespawner

Kubernetes spawner for JupyterHub
https://jupyterhub-kubespawner.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Some high-level questions around usage of kubespawner #18

Closed Analect closed 4 years ago

Analect commented 7 years ago

@yuvipanda Thanks for all your work on kubespawner. I've started experimenting with running jupyterhub on kubernetes, largely thanks to this spawner, but I wanted to get some guidance around my use-cases / workflow from someone a bit more seasoned in this technology. I'm structuring these as a series of high-level questions, where your input would be much appreciated. For ease of explanation, I may refer to the rough sketch below.

image

My efforts so far, for context: I was working through the data-8/jupyterhub-k8s implementation, which I think bases itself off your work, since its structure in chart form (for helm) is the easiest to work with, compared to some of the other implementations I've found out there.

I modified that set-up slightly to handle gitlab authentication (rather than google), which worked OK, but I wasn't able to get the spawning of their large user image (>5GB), based on this Dockerfile and their hub image, to work. It was constantly stuck in a Waiting: ContainerCreating state and would then try to re-spawn itself. I haven't figured out what the problem is, but there appears to be plenty of space on the cluster. I'm using v1.5.1 of kubernetes on GCE.

Anyway, I ended up getting things working using instead the hub image (dockerfile below), a variation of the data-8 one, in conjunction with your yuvipanda/simple-singleuser:v1 user image.

FROM jupyterhub/jupyterhub-onbuild:0.7.1
# Install kubespawner and its dependencies
RUN /opt/conda/bin/pip install \
    oauthenticator==0.5.* \
    git+https://github.com/derrickmar/kubespawner \
    git+https://github.com/yuvipanda/jupyterhub-nginx-chp.git
ADD jupyterhub_config.py /srv/jupyterhub_config.py
ADD userlist /srv/userlist
WORKDIR /srv/jupyterhub
EXPOSE 8081
CMD jupyterhub --config /srv/jupyterhub_config.py --no-ssl

This was able to spawn new user persistent volumes, bind them to PVCs and obviously spawn user jupyter notebook servers, which could be stopped/started and re-use the same PV. My initial tests as to whether new files/notebooks were getting persisted on the PV were failing, since I wasn't saving them under /home, which is where the binding to the volume is happening.
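
To make the persistence path explicit, the mount point can be pinned in the spawner config. A minimal sketch, assuming the stock jovyan user and illustrative volume/claim names (not taken from the data-8 config):

```python
# jupyterhub_config.py (fragment): mount each user's PVC at their home dir,
# so anything saved under /home/jovyan lands on the persistent volume.
c.KubeSpawner.volumes = [
    {
        'name': 'volume-{username}',
        'persistentVolumeClaim': {'claimName': 'claim-{username}'},
    }
]
c.KubeSpawner.volume_mounts = [
    # Files written outside this path are lost when the pod is rescheduled
    {'name': 'volume-{username}', 'mountPath': '/home/jovyan'},
]
```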

i. user management / userid - After various aborted attempts to get the larger data-8 user image working (during which user PVs weren't deleted), I noticed that the userid appended to the username for naming the PV kept incrementing, but it wasn't clear where this numbering logic was coming from, as it wasn't an env variable in any of the manifests. Is this some kind of fail-safe?

Currently, I'm using a whitelist userlist for users (see code from jupyterhub_config.py below), and these correspond with my users' gitlab logins that I'm authenticating against. However, it's probably not a clean solution. I see you are working on another approach using fsgroup and just wanted to get a better understanding of the context of that solution.

# Whitelist users and admins
import os

c.Authenticator.whitelist = whitelist = set()
c.Authenticator.admin_users = admin = set()
c.JupyterHub.admin_access = True
pwd = os.path.dirname(__file__)
with open(os.path.join(pwd, 'userlist')) as f:
    for line in f:
        line = line.strip()  # drop trailing newline so blank lines are skipped
        if not line:
            continue
        parts = line.split()
        name = parts[0]
        whitelist.add(name)
        if len(parts) > 1 and parts[1] == 'admin':
            admin.add(name)

ii. possibility for interchangeable images - I find the current default set-up with Jupyterhub allowing for spawning a single image very limiting. I can see from #14 that you are considering extending functionality in the kubespawner to allow for an image to be selected. @minrk was able to confirm over here that it could be possible to pass this image selection programmatically via the jupyterhub API, although I'm not sure, as per this issue, as to whether the hub API will work in a kubernetes context.

You pointed to an implementation by Google here. It's not clear to me where they are deriving their list of available images. How do you think something like this should work?

As per the sketch up top, I'm looking to handle a set-up where users have various private/shared repos (marked 1 in the sketch), from which docker images are generated and stored in a registry (2). Then my users (3) would be able to spawn a compute environment for their chosen repo and have it spawned in kubernetes (4), with the possibility, from 5, to have the repo cloned (maybe leveraging gitRepo) and for any incremental work performed on it, while on the notebook server, to be persisted (6).

iii. multiple simultaneous servers per user based on different images - As far as I understand, it's not presently possible with jupyterhub for a user to have multiple instances of a notebook server, each running a different image? Do the tools exist within kubernetes to potentially facilitate this? Thinking out loud, could this be facilitated by having multiple smaller persistent volumes per user, based on the repo from which the server image is derived? Or maybe this could be achieved within a single PV, by using the subPath functionality?

c.KubeSpawner.volumes = [
    {
        'name': 'volume-{username}-{repo-namespace}-{repo-name}',
        'persistentVolumeClaim': {
            'claimName': 'claim-{username}-{repo-namespace}-{repo-name}'
        }
    }
]
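
The subPath variant might be sketched like this (names are illustrative; note that kubespawner's template expansion at this point only covered fields like {username}, so the repo-specific parts would have to be injected some other way):

```python
# One PVC per user; each repo's work lives in its own subPath on that PV.
c.KubeSpawner.volumes = [
    {
        'name': 'volume-{username}',
        'persistentVolumeClaim': {'claimName': 'claim-{username}'},
    }
]
c.KubeSpawner.volume_mounts = [
    {
        'name': 'volume-{username}',
        'mountPath': '/home/jovyan',
        # subPath mounts only this subdirectory of the PV into the pod;
        # 'repos/my-repo' is a hypothetical, fixed placeholder.
        'subPath': 'repos/my-repo',
    }
]
```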

iv. ideas around version-control - Given the various advantages derived from using kubernetes to host jupyter, I would be curious if you had some thoughts around whether kubernetes also potentially makes it easier to manage version control for notebooks and other files created while a user works in a notebook server environment. Perhaps something like preStop hooks could be used to commit and push changes prior to a container shutting down.

Even facilitating a user to be able to run git commands from a notebook server terminal .. and have SSH keys back to the version-control system handled via the kubernetes secrets/config maps might be a start. Have you seen any implementations solving this?
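
For illustration, the preStop idea could be sketched in kubespawner config roughly as below. The option name `singleuser_lifecycle_hooks` and the paths are assumptions for the kubespawner version in use, and this presumes git plus credentials already exist inside the user image:

```python
# Attempt a best-effort commit & push before the container is stopped.
c.KubeSpawner.singleuser_lifecycle_hooks = {
    'preStop': {
        'exec': {
            'command': [
                'sh', '-c',
                'cd /home/jovyan && git add -A '
                '&& git commit -m "autosave on shutdown" '
                '&& git push || true',
            ]
        }
    }
}
```

One caveat: preStop hooks only get the pod's termination grace period to run, so a large push could be cut off mid-way.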

Thanks for your patience in reading through this!

yuvipanda commented 7 years ago

\o/ Thank you for your well thought out questions! I want to acknowledge I've seen them, but am travelling presently - will respond in bits and pieces!


Analect commented 7 years ago

@yuvipanda ... just wondering if you've had any time to think about some of the items raised above. Much appreciated.

yuvipanda commented 7 years ago

Yes! I have drafted a response! Will hopefully complete it in a few hours. Thanks for your patience!


yuvipanda commented 7 years ago

On Thu, Jan 5, 2017 at 4:26 PM, Analect notifications@github.com wrote:

@yuvipanda https://github.com/yuvipanda Thanks for all your work on kubespawner. I've started experimenting with running jupyterhub on kubernetes, largely thanks to this spawner, but I wanted to get some guidance around my use-cases / workflow from someone a bit more seasoned in this technology. I'm structuring these as a series of high-level questions, where your input would be be much appreciated. For ease of explanation, I may refer to the rough sketch below lower down.

[image: image] https://cloud.githubusercontent.com/assets/4063815/21677128/9bc79b9c-d330-11e6-85a5-f8602b0bbff1.png

This is an awesome sketch! May I ask how you created it?


My efforts so far, for context: I was working through the data-8/jupyterhub-k8s https://github.com/data-8/jupyterhub-k8s implementation, which I think bases itself off your work, since it's structure in a chart form (fro helm) is the easiest to work with, compared to some of the other implementations I've found out there.

I modified that set-up slightly to handle gitlab authentication (rather than google), which worked OK, but I wasn't able to get the spawning of their large user image (>5GB), based on this Dockerfile https://github.com/data-8/jupyterhub-k8s/blob/master/user/Dockerfile and their hub image https://github.com/data-8/jupyterhub-k8s/blob/master/hub/Dockerfile to work. It was constantly stuck in a Waiting: ContainerCreating state and would then try to re-spawn itself. I haven't figured out what the problem is, but there appears to be plenty of space on the cluster. I'm using v1.51 of kubernetes on GCE.

Anyway, I ended up getting things working using instead the hub image (dockerfile below), a variation of the data-8 one, in conjunction with your yuvipanda/simple-singleuser:v1 https://github.com/yuvipanda/jupyterhub-simplest-k8s/blob/master/singleuser/Dockerfile user image.

FROM jupyterhub/jupyterhub-onbuild:0.7.1
# Install kubespawner and its dependencies
RUN /opt/conda/bin/pip install \
    oauthenticator==0.5.* \
    git+https://github.com/derrickmar/kubespawner \
    git+https://github.com/yuvipanda/jupyterhub-nginx-chp.git
ADD jupyterhub_config.py /srv/jupyterhub_config.py
ADD userlist /srv/userlist
WORKDIR /srv/jupyterhub
EXPOSE 8081
CMD jupyterhub --config /srv/jupyterhub_config.py --no-ssl

This was able to spawn new user persistent volumes, bind them to PVCs and obviously spawn user jupyter notebook servers, which could be stopped/started and re-use the same PV. My initial tests as to whether new files/notebooks were getting persisted on the PV were failing, since I wasn't saving them under /home, which is where the binding to the volume https://github.com/data-8/jupyterhub-k8s/blob/master/hub/jupyterhub_config.py#L33-L47 is happening.

Awesome! In the last week or so, I've spent a lot of time generalizing the helm configuration a lot more, and it should be more widely usable (with multiple authenticators support) soon. We're deploying it for UC Berkeley's class starting Monday, so will have more time to actually write documentation after that. I intend to get it included in github.com/kubernetes/charts eventually, to make it an officially supported way of installing JupyterHub.

i. user management / userid - After various aborted attempts to get the larger data-8 user image working, and where user PVs weren't deleted. I noticed that the userid appended to username for naming the PV incremented up, but it wasn't clear where this numbering logic was coming from, as it wasn't a env variable in any of the manifests. Is this some fail-safe of some sort?

Currently, I'm using a whitelist userlist for users (see code from jupyterhub_config.py) below, and these correspond with my users' gitlab logins that I'm authenticating against. However, it's probably not a clean solution. I see you are working on another approach on the fsgroup https://github.com/jupyterhub/kubespawner/commit/13edc761448f21b23f13d5b26b705b41c83b8c15 and just wanted to get a better understanding around the context of this solution?

# Whitelist users and admins
c.Authenticator.whitelist = whitelist = set()
c.Authenticator.admin_users = admin = set()
c.JupyterHub.admin_access = True
pwd = os.path.dirname(__file__)
with open(os.path.join(pwd, 'userlist')) as f:
    for line in f:
        if not line:
            continue
        parts = line.split()
        name = parts[0]
        whitelist.add(name)
        if len(parts) > 1 and parts[1] == 'admin':
            admin.add(name)

There are multiple types of users / userids, which is confusing!

  1. The JupyterHub user id - this is simply the id of the entry for the user in the sqlite table. This is pretty useless for everything other than as unique identifiers. This is used in the pod name to make sure no two users' pods have the same name - since we 'normalize' the username to a subset of ascii, there are plenty of cases where two pods can have the same names if only username is used. Hence we append ID to it. There is pretty much no other external use of the id anywhere.
  2. The unix user as which the notebook process runs. This is completely separate from and unrelated to (1). This is specified in the Dockerfile (as USER) and overrideable as c.KubeSpawner.singleuser_uid. These users are what is used for permission checks (writing things to persistent storage for example - this is what was causing permission errors when writing to the mounted persistent volume). fsgroup is related to this as well - it should be set to a group that this unix user is part of so that singleuser servers can mount and write to persistent volumes properly. In Kubernetes, this should ideally just always be one unix user that's the same for all users - they're all contained in containers, so this is ok.
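
To make that concrete, both knobs can be pinned in config. A sketch with illustrative values (the option names follow kubespawner of this era; later releases drop the singleuser_ prefix):

```python
# Run every user's notebook process as the same unix user, and set the
# fs gid to a group that user belongs to, so dynamically provisioned
# volumes are writable once mounted.
c.KubeSpawner.singleuser_uid = 1000
c.KubeSpawner.singleuser_fs_gid = 1000
```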

As for deleting PVs - if you delete a PV you lose the data in it (since dynamically provisioned PVs always have reclaimPolicy: Delete). Hence it is a manual operation that is not automated at all - you have to delete the linked PVC manually, which will delete the PV (and lose your data).

ii. possibility for interchangeable images - I find the current default set-up with Jupyterhub allowing for spawning a single image very limiting. I can see from #14 https://github.com/jupyterhub/kubespawner/issues/14 that you are considering extending functionality in the kubespawner to allow for an image to be selected. @minrk https://github.com/minrk was able to confirm over here https://github.com/jupyterhub/jupyterhub-deploy-docker/issues/25#issuecomment-260932976 that it could be possible to pass this image selection programmatically via the jupyterhub API, although I'm not sure, as per this https://github.com/jupyterhub/jupyterhub/issues/891 issue, as to whether the hub API will work in a kubernetes context.

You pointed to an implementation by Google here https://github.com/sveesible/jupyterhub-kubernetes-spawner/blob/master/kubernetespawner/spawner.py#L174-L214. It's not clear to me where they are deriving their list of available images. How do you think something like this should work?

As per the sketch up top, I'm looking to handle a set-up where users have various private/shared repos (marked 1 above in sketch), from which docker images are generated and stored in a registry (2 above). Then my users (3 above) would be able to spawn a compute environment for their chosen repo and have it spawned in kubernetes (4 above), with the possibility, from 5 above, to have the repo cloned (maybe leveraging gitRepo http://kubernetes.io/docs/user-guide/volumes/#gitrepo) and for any incrimental work performed on it, while on the notebook server, persisted (6).

This can be done currently with https://jupyterhub.readthedocs.io/en/latest/spawners.html#spawner-options-form. Are you thinking of the list of images as being static (ie specified by administrator) or dynamic? If dynamic it might be a little more difficult, but not impossible. I see you've already dug into this on Gitter - would love to see your solution so we can make it easier in KubeSpawner :)
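
A sketch of the static-list approach with an options form (the image list and helper names here are illustrative, not part of kubespawner):

```python
# Hypothetical static list of images an admin offers at spawn time.
IMAGE_CHOICES = [
    ('yuvipanda/simple-singleuser:v1', 'Minimal notebook'),
    ('my-private-registry/datasci:v2', 'Data science stack'),
]

def build_options_form():
    """Render a <select> of the allowed images for the hub's spawn page."""
    opts = ''.join(
        '<option value="{}">{}</option>'.format(spec, label)
        for spec, label in IMAGE_CHOICES
    )
    return '<select name="image">{}</select>'.format(opts)

def options_from_form(formdata):
    """Form values arrive as lists of strings; keep just the chosen image."""
    return {'image': formdata['image'][0]}
```

In jupyterhub_config.py you would then set `c.KubeSpawner.options_form = build_options_form()` and `c.KubeSpawner.options_from_form = options_from_form`, and have the spawner read `self.user_options['image']`.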

iii. multiple simultaneous servers per user based on different images - As far as I understand, it's not possible with jupyterhub to presently allow a user to have multiples instances of a notebook server, each running a different image? Do the tools exist within kubernetes to potentially facilitate this? Thinking out loud, could this be facilitated by having multiple smaller persistent volumes for a user, based on the repo from which the server image is derived? Or maybe this could be achieved within a single PV, by using the subPath http://kubernetes.io/docs/user-guide/volumes/#using-subpath functionality?

c.KubeSpawner.volumes = [
    {
        'name': 'volume-{username}-{repo-namespace}-{repo-name}',
        'persistentVolumeClaim': {
            'claimName': 'claim-{username}-{repo-namespace}-{repo-name}'
        }
    }
]

This is a little more difficult from JupyterHub but active work is being done on this right now - follow https://github.com/jupyterhub/jupyterhub/issues/766 for more details!

iv. ideas around version-control - Given the various advantages derived from using kubernetes to host jupyter, I would be curious if you had some thoughts around whether kubernetes also potentially makes it easier to manage version control for notebooks and other files created while in a user works in a notebook server environment. Perhaps something like preStop http://kubernetes.io/docs/user-guide/container-environment/#container-hooks hooks could be used to commit and push changes prior to a container shutting down.

Even facilitating a user to be able to run git commands from a notebook server terminal .. and have SSH keys back to the version-control system handled via the kubernetes secrets/config maps might be a start. Have you seen any implementations solving this?

Thanks for your patience in reading through this!

If you are using GitHub for authentication, then we could possibly do something like generate a personal access token when the user logs in and then put it in an appropriate place on the notebook container, thus allowing users to pull / push natively. I think that's far better than wrapping git with some magic, which in my experience always ends badly. In https://github.com/yuvipanda/paws/blob/master/hub/jupyterhub_config.py#L41 I pass extra generated parameters into the single-user notebook from the hub, and we could do something similar here.
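
That token-passing idea could be sketched roughly as below. The API names here (pre_spawn_start, auth_state, enable_auth_state) match later JupyterHub/oauthenticator releases, and the env var name is an assumption; treat this as a direction rather than a drop-in:

```python
from oauthenticator.github import GitHubOAuthenticator

class TokenPassingAuthenticator(GitHubOAuthenticator):
    """Copy the user's OAuth access token into the spawned notebook's env."""

    async def pre_spawn_start(self, user, spawner):
        auth_state = await user.get_auth_state()
        if auth_state and 'access_token' in auth_state:
            spawner.environment['GITHUB_TOKEN'] = auth_state['access_token']

c.JupyterHub.authenticator_class = TokenPassingAuthenticator
c.Authenticator.enable_auth_state = True  # requires JUPYTERHUB_CRYPT_KEY to be set
```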

Action items from here are:

  1. Play with getting GitHub personal access token into environment variables / proper locations on disk so people can push / pull from repos
  2. Expand documentation on what 'users' are and how the various kinds of 'users' are used
  3. See if you need any follow up help on the docker image selection with options form thing
  4. Continue making the helm config configurable enough for general use.

Feel free to ask follow up questions here or on gitter! Looking forward to seeing what cool things you are doing!

-- Yuvi Panda T http://yuvi.in/blog

Analect commented 7 years ago

@yuvipanda . Thanks for your responses.

This is an awesome sketch! May I ask how you created it?

I think you're going to be disappointed when I tell you powerpoint!

image

Yes, I've seen a flurry of activity cleaning up the data-8 implementation, which looks great. It would be nice to get an implementation under github.com/kubernetes/charts

Ref {user}-{user-id} ... thanks for the explanation. In my jupyterhub_config.py I have a whitelist of 3 or 4 users for testing ... and I'm at the same time authenticating these users against a gitlab authenticator ... and I noticed, as I was bringing the helm chart up and down, that it sometimes incremented a different id against my user ... see the case for my username below, where 1, 2, 3 and 4 got appended ... and so there wasn't really any consistency in terms of which PV got attached to a container. Perhaps my jupyterhub.sqlite was somehow getting corrupted for this to have happened.

image

Ref. passing image to get spawned.

If dynamic it might be a little more difficult, but not impossible.

OK, based on heavy prompting from @minrk ... I was able to modify jupyterhub_config.py to include this ... which was able to pick up new 'image' payloads passed to the JupyterHub API.

from traitlets import observe
from kubespawner.spawner import KubeSpawner

class MySpawner(KubeSpawner):
    @observe('user_options')
    def _update_options(self, change):
        # user_options is populated from the JSON body of the hub API request
        options = change.new
        if 'image' in options:
            self.singleuser_image_spec = options['image']

c.JupyterHub.spawner_class = MySpawner

So all the other c.KubeSpawner entries required in the jupyterhub_config.py then got changed to c.MySpawner.

I then pass this API call to jupyterhub ... and it appears to work. I have obviously pushed that image to my private docker registry first.

curl -v -X POST -H "Authorization: token my-testuser-token"  \
"http://jupyterhub.myserver.com/hub/api/users/testuser/server" \
-d '{"image": "my-private-registry/my-simple-singleuser:v1.1"}'
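
The same call sketched in Python (hub URL and token are placeholders; splitting out a pure request-builder keeps the payload easy to inspect):

```python
import json

def build_spawn_request(hub_url, user, token, image):
    """Build (url, headers, body) for JupyterHub's start-server API call."""
    url = '{}/hub/api/users/{}/server'.format(hub_url.rstrip('/'), user)
    headers = {'Authorization': 'token {}'.format(token)}
    body = json.dumps({'image': image})
    return url, headers, body

# To actually send it (assuming the requests library is installed):
#   import requests
#   url, headers, body = build_spawn_request(
#       'http://jupyterhub.myserver.com', 'testuser',
#       'my-testuser-token', 'my-private-registry/my-simple-singleuser:v1.1')
#   requests.post(url, headers=headers, data=body)
```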

However, it's not bullet-proof. For instance, for larger images (2GB+), I noticed kubernetes is sometimes slow to pull the image ... and so you end up in the situation below (see table), where it eventually aborts ... which isn't ideal. However, I found that deleting the pod and then retrying the above seemed to resolve it. Maybe there's a better approach for pulling these images down to kubernetes ahead of time ... or maybe there's better performance if the images are pushed to a google registry (on the assumption one is using their kubernetes implementation, of course).

NAME                 READY     STATUS              RESTARTS   AGE
jupyter-testuser-4   0/1       ContainerCreating   0          6m
jupyter-testuser-4   0/1       ImagePullBackOff    0          8m
jupyter-testuser-4   0/1       ErrImagePull        0          12m

Obviously once the image is pulled to the kubernetes cluster, then spawning from the hub is a matter of seconds.

Ref multi-servers per user ... yes, I've been keeping an eye on this and this.

Ref. version-control ... I'm using a self-hosted gitlab rather than github. They have a similar user-token concept, so maybe, as you said, passing that as a 'secret' or 'config map' variable per user, might work.

Given that I'm experimenting with spawning into 'lab' environments, rather than the classic notebook 'tree', I've been looking for ways to pass a template ... a bit like the notebooks.azure.com implementation below does (although they are still working against the classic notebook).

image

It seems doing the same for jupyterlab is a bit more involved (see this issue), requiring a plugin on the jupyterlab end, but it appears some of the required tooling is in place with jupyterhub-labextension. I'm not sure this is ready for usage yet though.

If it were, then maybe one could potentially give a rudimentary way of pushing/pulling to a repo, by exposing, in my case, the gitlab API via some buttons on that template. I would be interested in whether you thought that viable or not.

Anyway, thanks for the dialogue on these matters.

consideRatio commented 4 years ago

@Analect I love how you thoroughly documented your thoughts in this issue! :heart:

I'm closing it now as it is stale and doesn't seem to have a specific action point related to it.