Open choldgraf opened 6 years ago
You could use this to provide access to data or longer-term storage. Providing a mechanism/example config for people to set up read-only storage that gets mounted to, say, ~/data and read-write storage mounted to ~/personal would be useful, I think. One thing that is a bit tricky is that a repo could contain a directory with the same name as personal/ or data/. Maybe mount the repo at ~/repo instead of directly at ~ (https://github.com/jupyter/repo2docker/pull/134).
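A rough sketch of what such an example config might look like on a Kubernetes-based hub. The helper `storage_config` and the claim names `shared-data` and `personal-{username}` are made up for illustration; `volumes` and `volume_mounts` are KubeSpawner options, and a per-user claim only makes sense once there is a username to key it on.

```python
# Hypothetical helper: builds volume definitions for a read-only ~/data
# and a read-write ~/personal. The claim names are placeholders.
HOME = "/home/jovyan"


def storage_config(home=HOME):
    volumes = [
        {
            "name": "shared-data",
            "persistentVolumeClaim": {"claimName": "shared-data", "readOnly": True},
        },
        {
            "name": "personal",
            # Assumes the spawner fills in {username}; requires authenticated users.
            "persistentVolumeClaim": {"claimName": "personal-{username}"},
        },
    ]
    volume_mounts = [
        {"name": "shared-data", "mountPath": f"{home}/data", "readOnly": True},
        {"name": "personal", "mountPath": f"{home}/personal"},
    ]
    return volumes, volume_mounts


volumes, volume_mounts = storage_config()

# In jupyterhub_config.py, where `c` is provided by JupyterHub:
# c.KubeSpawner.volumes = volumes
# c.KubeSpawner.volume_mounts = volume_mounts
```

Note that a repo shipping its own data/ or personal/ directory would still be hidden by these mounts, which is the collision described above.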
The primary thing preventing us from using persistent storage with binder is Authentication. We have no idea which disk should be mounted for which user. #323 should help fix that.
@betatim Mounting the repo in a path under $HOME won't help, since mounting persistent volumes in $HOME will just overwrite everything underneath. What you need is the ability to mount the persistent volume somewhere else, which is already supported. We could also consider a postStart hook that copies files over. We aren't dependent on that repo2docker PR for this particular feature, IMO.
You could mount the repository at ~/repo and a read-only volume at ~/data and they'd happily coexist, no?
You could also mount the shared/persistent volumes somewhere like /data but then you can't navigate there with the jupyter tree view because that is rooted in ~/.
I agree! It's not necessary for supporting persistent volumes, but very nice to have! Authentication is a blocker though.
ok cool - I've updated the top-level comment w/ the current state of this issue so we can keep track of what needs to be done
Curious - this is handled by JupyterHub, right? The difference is JupHub doesn't build new images/use multiple images...?
@ctb yep, JupyterHub can serve a pre-existing docker image that's in a registry, but it doesn't have the machinery to automatically build/register images from git repositories. Hopefully as @yuvipanda says, we can eventually merge these so they don't have to be two totally separate things, then Binder is more of a service, rather than a service and a specific piece of technology that's custom built for it.
The primary thing preventing us from using persistent storage with binder (https://github.com/jupyterhub/binderhub/pull/666) is now in place and we would like to go ahead with persistent storage. However,
"in a vanilla jupyterhub we mount the persistent disk to /home/jovyan now we combine them and ... mount both to /home/jovyan?" (@betatim, at gitter)
A number of proposals have been discussed which seem to fall roughly into these categories:
1. repository at ~/repo and persistent storage at ~/data
2. repository at ~ (as is the case now) and a persistent storage at ~/data
3. persistent storage at ~ and repository at ~/<repo> or ~/<repo-name>
Each user would then have their own persistent storage that is shared across sessions.
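To make the trade-offs between the three proposals a bit more concrete, here is a small sketch that models each layout as mount points relative to ~ and reports which top-level repository entries baked into the image would be hidden by the storage mount. The names `LAYOUTS` and `hidden_by_mount` are invented for this illustration, and the model is deliberately simplistic.

```python
# Mount points relative to ~; "" means "mounted at ~ itself".
LAYOUTS = {
    1: {"repo": "repo", "storage": "data"},
    2: {"repo": "", "storage": "data"},
    3: {"repo": "<repo-name>", "storage": ""},
}


def hidden_by_mount(layout, repo_entries):
    """Return the top-level repo entries hidden once storage is mounted."""
    repo, storage = layout["repo"], layout["storage"]
    if storage == "":
        # Storage covers all of ~, so anything the image placed under ~
        # (including the repo) is hidden unless it is copied in at launch.
        return sorted(repo_entries)
    if repo == "":
        # Repo contents sit directly in ~; a same-named entry is hidden.
        return sorted(e for e in repo_entries if e == storage)
    # Repo and storage live in separate subdirectories of ~.
    return []


entries = ["data", "index.ipynb"]
for number, layout in LAYOUTS.items():
    print(number, hidden_by_mount(layout, entries))
```

Under these assumptions the first layout has no collisions, the second hides a repository's own data/ directory, and the third needs a copy or pull step at launch time.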
I think it would be desirable to have persistency at ~ (as in jupyterhub) and having it at ~/data seems already to be possible with something like jupyterlab-google-drive. This makes option 3 look currently the most useful to me. What are your thoughts?
We would like to implement persistency and while there have already been numerous discussions in different directions (@yuvipanda here or @nthiery here) it would be good to have some more understanding about the consequences of the different options (e.g. are there repositories that assume they are mounted at ~, what could be the role of nbgitpuller).
I am working on setting up a hub with auth and persistent storage (exploring options for @nthiery) over the next few weeks -> we should coordinate.
My short term plan to get something working and used by people is to mount the repo to ~/repo and the persistent volume to ~/home.
The next iteration would be to explore having /home/jovyan be a persistent volume, with repo2docker copying the contents of the repository to /repo and using nbgitpuller to copy/pull stuff over to /home/jovyan/<repo-name> when the container launches. This means you'd get the semantics of nbgitpuller for keeping changes to the repo (or not).
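As a sketch of that launch-time step, the snippet below builds the command that would pull the repository into /home/jovyan/<repo-name>. It assumes nbgitpuller's `gitpuller` command line takes the source, branch and target directory in that order; the helpers `repo_name` and `gitpuller_command`, the example URL and the branch name are all hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def repo_name(git_url):
    """Derive a directory name from the last path segment of a git URL."""
    path = urlparse(git_url).path.rstrip("/")
    name = posixpath.basename(path)
    return name[:-4] if name.endswith(".git") else name


def gitpuller_command(source, branch, name, home="/home/jovyan"):
    """Build the argv for pulling `source` into <home>/<name>."""
    return ["gitpuller", source, branch, posixpath.join(home, name)]


url = "https://example.org/someone/some-repo.git"  # placeholder URL
# Pull from the copy that repo2docker would have left at /repo (assumption).
command = gitpuller_command("/repo", "main", repo_name(url))
print(command)
```

Whether the source should be the local copy at /repo or the upstream URL is one of the open questions here; the sketch only shows where the files would end up.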
Both require some work on repo2docker, BinderHub and how the hub is deployed. What do you think of this kind of two stage approach? I'm not sure I am 100% convinced of the second phase yet as the "perfect" solution (and it will require a bit of work in repo2docker) hence going for something simpler first to gain some more ideas and experience.
Some things I am pondering:
~
pip install -e . which breaks if we move the repo via nbgitpuller
I am working on setting up a hub with auth and persistent storage (exploring options for @nthiery) over the next few weeks -> we should coordinate.
Sure
My short term plan to get something working and used by people is to mount the repo to ~/repo and the persistent volume to ~/home.
I think we had already something like this running. @bitnik is that correct?
The next iteration would be to explore how having /home/jovyan be a persistent volume. With repo2docker copying the contents of the repository to /repo and using nbgitpuller to copy/pull stuff over to /home/jovyan/ when the container launches.
This is what would imo allow the users to keep their expectations on how JHub behaves and is what we are currently aiming at. Yet, we are likewise not 100% convinced that this is the final "perfect" solution.
I am working on setting up a hub with auth and persistent storage (exploring options for @nthiery) over the next few weeks -> we should coordinate.
Sure
so how to coordinate best?
I think we had already something like this running. @bitnik is that correct?
Yes, we once tried it by using KubeSpawner.lifecycle_hooks.postStart, which does some cp, rm and ln, but I am not sure if it was a good implementation.
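For reference, a hook of that kind could look roughly like the sketch below. This is not the implementation mentioned above: it only does the copy step, and the helper `post_start_hook`, the source path /repo and the target name are assumptions. The dict shape follows the Kubernetes container lifecycle format that `KubeSpawner.lifecycle_hooks` accepts.

```python
import shlex


def post_start_hook(repo_src="/repo", home="/home/jovyan", name="repo"):
    """Build a postStart hook that copies the repo into the persistent home."""
    target = f"{home}/{name}"
    # Copy only on first launch so a user's later changes are not overwritten.
    script = (
        f"[ -e {shlex.quote(target)} ] || "
        f"cp -r {shlex.quote(repo_src)} {shlex.quote(target)}"
    )
    return {"postStart": {"exec": {"command": ["/bin/sh", "-c", script]}}}


lifecycle_hooks = post_start_hook()

# In jupyterhub_config.py, where `c` is provided by JupyterHub:
# c.KubeSpawner.lifecycle_hooks = lifecycle_hooks
```

Skipping the copy when the target exists is one possible policy; it keeps user edits but never picks up upstream changes, which is where nbgitpuller-style merging would come in.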
I've had a number of people (especially at universities) ask me if it'd be possible to enable persistent storage with BinderHub. The most common use case seems to be to use a BinderHub server to let teachers create repositories/environments for their classes / bootcamps / etc, but they'd like students to have their own "space" where things will persist over time.
I anticipate these requests to just increase with time, but it's also a bit unclear to me exactly how this functionality would be combined w/ BinderHub. Maybe this could behave in the same way that nbgitpuller does.
Just opening this issue since I noticed we don't have another place in this repo where we discuss the topic. I'm curious if people have thoughts on a path forward for something like this.
Current status
https://github.com/jupyterhub/binderhub/issues/377#issuecomment-353501247