Closed bitnik closed 4 years ago
This is pretty cool!
This is kind of blowing my mind. It's exactly what we need for Pangeo.
I deleted that deployment yesterday and did a new one today without any GESIS related parts in helm config and templates. I hope it helps people who had interest.
It works exactly the same as the previous one; only the base URL changed from /jupyter/ to /: https://notebooks.gesis.org/. But I don't know how long this one will stay alive, because these days we are trying out many different things.
Config files:
Dockerfile for JupyterHub with custom templates (home template with binder form): https://github.com/gesiscss/example-binderhub-deployments/tree/master/persistent_storage/jupyterhub
One thing that I think would be nice is to have a symbolic link in each repository called "data" or "home" or something that points to a directory that is a sibling to the various repositories.
A directory structure like
/home/jovyan/repositoryA
/home/jovyan/repositoryA/data -> ../data
/home/jovyan/repositoryB
/home/jovyan/repositoryB/data -> ../data
/home/jovyan/data
to make it easier for people to navigate from the Jupyter file browser view to a place outside the current repository. Maybe "data" isn't unique enough a name, so MyData or PermanentStorage or $DeploymentData (so GesisData in this case) or something which is weird enough that it will hardly ever shadow something from the repository?
One idea floating around --target-repo-dir was to use /somewhere/else as the place to clone the directory to in repo2docker, and then use nbgitpuller (or a tool with similar semantics) to "move" the contents of the repo into /home/jovyan/repoA from there on launch. That would go some way towards solving the question of "the source repo has been updated, what should we do with the user's directory now?"
@betatim thanks! I did your 1st suggestion. For now I named it persistent_storage (if a repo contains a file/folder with the same name, the symbolic link is not created).
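The symbolic-link behaviour described here could look roughly like the following. This is only a minimal sketch, not the code used in the GESIS deployment: the function name, the arguments and the choice of a relative link target are assumptions made for illustration.

```python
import os
from pathlib import Path


def link_persistent_storage(repo_dir, storage_dir, link_name="persistent_storage"):
    """Create repo_dir/link_name -> storage_dir unless that name is already taken.

    Returns True if a link was created, False if it was skipped.
    (Hypothetical helper; names and defaults are illustrative.)
    """
    repo_dir = Path(repo_dir)
    storage_dir = Path(storage_dir)
    link = repo_dir / link_name

    # lexists() is also True for dangling symlinks, so an existing file,
    # folder or link from the repository is never shadowed or replaced.
    if os.path.lexists(link):
        return False

    storage_dir.mkdir(parents=True, exist_ok=True)
    # Use a relative target (e.g. "../data") so the link stays valid
    # wherever the home directory happens to be mounted.
    target = os.path.relpath(storage_dir, start=repo_dir)
    os.symlink(target, link, target_is_directory=True)
    return True
```

For the layout proposed above, calling `link_persistent_storage("/home/jovyan/repositoryA", "/home/jovyan/data")` would create `/home/jovyan/repositoryA/persistent_storage` pointing at `../data`.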
But your 2nd suggestion is not clear to me. The persistent volume is already mounted on /home/jovyan, so it overwrites the repo folder (cloned by repo2docker), and we already use nbgitpuller to clone and update the repo.
But your 2nd suggestion is not clear to me.
Didn't realise that you were already doing this.
This work is really interesting !
We are trying to do the same, the user can upload data from the web https://github.com/SIMEXP/Repo2Data (discussed here jupyter/repo2docker#460) into our server.
@bitnik We have a binder running on our server and were wondering how to "mount" the data in the user's notebook, and how to launch repo2data every time a user uploads a new repository. Could you explain in more detail how we could do that (for now we are not concerned with authentication), using https://github.com/gesiscss/example-binderhub-deployments/blob/master/persistent_storage/config.yaml? Here is our config file:
jupyterhub:
  ingress:
    enabled: true
    hosts:
      - conp7.calculquebec.cloud
    annotations:
      ingress.kubernetes.io/proxy-body-size: 64m
      kubernetes.io/ingress.class: nginx
      kubernetes.io/tls-acme: 'true'
  hub:
    baseUrl: /jupyter/
  proxy:
    service:
      type: NodePort
  singleuser:
    memory:
      guarantee: 4G
    cpu:
      guarantee: 2
# BinderHub config
config:
  BinderHub:
    hub_url: https://conp7.calculquebec.cloud/jupyter
    use_registry: true
    image_prefix: cmdntrf/conp7.calculquebec.cloud-
service:
  type: NodePort
storage:
  capacity: 2G
ingress:
  enabled: true
  hosts:
    - conp7.calculquebec.cloud
  annotations:
    kubernetes.io/ingress.class: nginx
  https:
    enabled: true
    type: kube-lego
  config:
    # Allow POSTs of upto 64MB, for large notebook support.
    proxy-body-size: 64m
@betatim we are also thinking of showing the /data folder, but in our case we don't really want to show all the details (medical data), so using just headers (with datalad for example) could be a solution.
Thanks to you both,
@ltetrel as I understand, you don't want to have authentication but you want to mount a data volume into each anonymous user's pod. Then I assume this volume is already filled with some data and it is to be shared with all user pods as readonly, am I right? (I didn't try to do that so far)
And you also want to run repo2data and download the requested data (data_requirement.json) every time a user launches a new repo, if the launched repo contains data_requirement.json. But where do you want to download that data? If into the user's home directory, why don't you let the user do that with postBuild (as it is done here https://github.com/bitnik/binder_repor2data)?
I would like to help :) but I am a bit confused. Could you elaborate your goal and maybe we can continue discussing this in another issue?
Hi @bitnik and thanks for your help :) We can continue the discussion here: https://discourse.jupyter.org/t/mounting-server-data-on-each-users-pod/641
Complementary issue #1003 with some nice ideas
Thanks @arnim
But in our case we want persistent storage. We got it working by using these ideas here : https://discourse.jupyter.org/t/mounting-server-data-on-each-users-pod/641/4
We have a nfs storage mounted on each node to centralize the data administration and avoid duplication : https://github.com/neurolibre/neurolibre-binderhub/issues/18
We were also thinking of using an initContainer instead of putting repo2data into the config file. This has the advantage of making the process of downloading the data (if needed) more independent (running in a separate container instead).
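As a rough illustration of the initContainer idea, the sketch below builds a Kubernetes container spec that could be handed to KubeSpawner's `init_containers` option in `jupyterhub_config.py`. It is an assumption-laden sketch, not the NeuroLibre configuration: the helper name, the container name, the image, the download command and the volume name are all placeholders.

```python
def data_init_container(image, command, volume_name, mount_path):
    """Build a Kubernetes container spec for a data-download init container.

    All arguments are deployment specific; this helper is hypothetical.
    """
    return {
        "name": "fetch-data",  # illustrative container name
        "image": image,
        "command": list(command),
        # Mount the same volume the notebook container will see, so the
        # downloaded data is in place before the user server starts.
        "volumeMounts": [{"name": volume_name, "mountPath": mount_path}],
    }


# In jupyterhub_config.py, where JupyterHub provides the `c` object,
# this could be wired up along these lines (placeholder values):
#
# c.KubeSpawner.init_containers = [
#     data_init_container(
#         image="example/data-fetcher:latest",       # hypothetical image
#         command=["/usr/local/bin/fetch-data.sh"],  # hypothetical script
#         volume_name="shared-data",                 # must match a pod volume
#         mount_path="/data",
#     )
# ]
```

Because the init container runs to completion before the notebook container starts, the download step stays separate from the spawner configuration itself.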
I am closing this issue. We can continue discussing this on https://discourse.jupyter.org/t/a-persistent-binderhub-deployment/2865.
Currently we are working on a binder deployment with authentication and persistent storage enabled and with a user interface in JupyterHub home page, where users can manage their repositories/projects.
For this purpose we have now a deployment running on https://notebooks-test.gesis.org/jupyter/. When you first login, you will see the JupyterHub home page (https://notebooks-test.gesis.org/jupyter/hub/home) with 2 parts: "Your projects" table and the classical binder form with some parts hidden:
Binder is running under https://notebooks-test.gesis.org/jupyter/services/binder/ and you can also use it but in this deployment the idea is that you don't need to use it directly.
How it works
Firstly some preliminary information:
/home/jovyan
Binder form
It is the classical form with the 'share url' and 'badge url' parts hidden. It has 1 limitation: the branch/tag/commit field is readonly and always "master". When a user launches a repo via the form:
nbgitpuller is used to pull the code under a sub directory /home/jovyan/{repo_dir}. repo_dir is generated by using provider name, user/org name and repo name. The server is started in that sub directory (you can start a new terminal and there you can list all directories of projects).
nbgitpuller is not executed for the default repo (gesiscss/data_science_image).
state field of Spawners table and only last 10 launched repos are saved.
In short, the binder form is used to create a new project and update it from remote.
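The naming and bookkeeping described in this section could be sketched as follows. The exact repo_dir format, the sanitising rule and the helper names are assumptions for illustration; only the `gitpuller <git_url> <branch> <repo_dir>` command line comes from nbgitpuller itself.

```python
import re

MAX_SAVED_REPOS = 10  # "only last 10 launched repos are saved"


def make_repo_dir(provider, owner, repo):
    """Combine provider, user/org and repo name into one directory name.

    The separator and sanitising rule are assumptions, not the real scheme.
    """
    parts = (provider, owner, repo)
    # Replace anything that is awkward in a directory name with "-".
    safe = [re.sub(r"[^A-Za-z0-9._-]+", "-", part).strip("-") for part in parts]
    return "_".join(safe)


def remember_launch(projects, repo_dir, limit=MAX_SAVED_REPOS):
    """Return a new list with repo_dir first, without duplicates, capped at limit."""
    updated = [repo_dir] + [p for p in projects if p != repo_dir]
    return updated[:limit]


def gitpuller_command(repo_url, repo_dir, branch="master", home="/home/jovyan"):
    """Build the nbgitpuller command that pulls the repo into its sub directory."""
    return ["gitpuller", repo_url, branch, f"{home}/{repo_dir}"]
```

With these helpers, `make_repo_dir("gh", "gesiscss", "data_science_image")` yields `gh_gesiscss_data_science_image`, and the list returned by `remember_launch` is the kind of value that could be persisted in the spawner state.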
Your Projects
On first login, the user has only the default repo (gesiscss/data_science_image) there. Each repo which is built and launched via the binder form is added to this table, and the user can re-start that repository by using the start button on each row. When the user clicks on a start button in the table:
nbgitpuller command execution on server start when server is started from projects table, so that user can continue working on where they left. We can do this by passing an option to spawner (I think this is very related to https://github.com/jupyterhub/binderhub/issues/712)
delete button in the actions of table which removes the repository from the table and deletes the folder of the repo in user's persistent volume. Right now we have the button in the actions column but it doesn't do anything.
In short, the "Your Projects" table is used to continue working on a repo (when you don't want to update the image or code base from remote).
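Since the delete button is not implemented yet, the following is only a sketch of what its backend action might do: drop the project from the saved list and remove its folder from the persistent volume. The function name and arguments are hypothetical, and the path check is an added safety assumption rather than something described in this issue.

```python
import shutil
from pathlib import Path


def delete_project(projects, repo_dir, home="/home/jovyan"):
    """Remove repo_dir from the project list and delete its folder under home.

    Returns the updated project list. (Hypothetical helper.)
    """
    home_path = Path(home).resolve()
    target = (home_path / repo_dir).resolve()

    # Refuse anything that is not a strict sub directory of the home volume,
    # e.g. "" or "../other", so a bad name cannot wipe unrelated data.
    if home_path not in target.parents:
        raise ValueError(f"refusing to delete outside of {home_path}: {repo_dir!r}")

    if target.is_dir():
        shutil.rmtree(target)
    return [p for p in projects if p != repo_dir]
```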
Limitations and missing parts summary
nbgitpuller
must be installed in user images, right now we use appendix to ensure its installation (maybe it can be added intorepo2docker
defaults)Where to find helm config and custom templates
KubeSpawner
here: https://github.com/gesiscss/orc/blob/binderinjhubgh/jupyterhub/config_test.yaml#L170-L227home.html
is jupyterhub home page): https://github.com/gesiscss/orc/tree/binderinjhubgh/jupyterhub/docker/k8s_hub/templateshttps://notebooks-test.gesis.org/jupyter/ uses github authenticator and everybody is welcome to login and try it out (it is just a test instance and will be deleted again). We really would like to get your feedback about what we have done so far. Probably most important question is if we are on the right track to accomplish what we want. And finally we are aware that there are a lot to improve for user interface.