Multi-user hub which spawns, manages, and proxies multiple workspace instances.
MLHub is based on JupyterHub with a complete focus on Docker and Kubernetes. MLHub lets you create and manage multiple workspaces, for example to distribute them to a group of people or within a team. The standard configuration allows a setup within seconds.
Most parts will be identical to the configuration of JupyterHub 1.0.0. One difference is that SSL is not activated at the proxy or hub level, but on our nginx proxy.
The following command starts the hub with the default config:
docker run \
-p 8080 \
-v /var/run/docker.sock:/var/run/docker.sock \
-v jupyterhub_data:/data \
mltooling/ml-hub:latest
To persist the hub data, such as started workspaces and created users, mount a directory to /data. Any given name (--name) will be overridden by the environment variable HUB_NAME.
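Since the hub renames its container based on HUB_NAME, it can help to check a candidate name before starting the container. The following is a minimal sketch, assuming that "DNS compliant" means lowercase letters, digits, and hyphens, starting and ending with a letter or digit; the 1 to 5 character limit comes from the variable table in this document. The function name `is_valid_hub_name` is hypothetical and not part of MLHub.

```python
import re

# Assumption: "DNS compliant" is read as a DNS label made of lowercase
# letters, digits and hyphens that starts and ends with a letter or digit.
# The length limit (1 to 5 characters) is taken from the HUB_NAME description.
_HUB_NAME_RE = re.compile(r"[a-z0-9]([a-z0-9-]{0,3}[a-z0-9])?")


def is_valid_hub_name(name: str) -> bool:
    """Return True if `name` satisfies the assumed HUB_NAME rules."""
    return _HUB_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("mlhub", "hub-1", "ml-hub", "-hub", "MLHub"):
        print(candidate, is_valid_hub_name(candidate))
```

The actual validation performed by the hub may be stricter or looser than this sketch.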
Via Helm:
RELEASE=mlhub # change if needed
NAMESPACE=$RELEASE # change if needed
helm upgrade --install $RELEASE mlhub-chart-2.0.0.tgz --namespace $NAMESPACE
# In case you just want to use the templating mechanism of Helm without deploying tiller on your cluster
# 1. Use the "helm template ..." command. The template command also accepts flags such as --config and --set-file as described in the respective Sections in this documentation.
# 2. kubectl apply -f templates/hub && kubectl apply -f templates/proxy
You can find the chart file attached to the release.
When using the default config, that is, leaving the JupyterHub config c.Authenticator.admin_users as it is, a user named admin can access the hub with admin rights. If you use the default NativeAuthenticator as authenticator, you must first register the user admin with a password of your choice before logging in.
If you use a different authenticator, you might want to set a different user as the initial admin user as well. For example, when using OAuth, set c.Authenticator.admin_users to a username returned by the OAuth login.
MLHub is based on SSH Proxy. Check out SSH Proxy for SSH-related configurations. Check the Configuration Section for details on how to pass them, especially in the Kubernetes setup. Here are the additional environment variables for the hub:
Variable | Description | Default |
---|---|---|
HUB_NAME | In Docker-local mode, the container will be (re-)named based on the value of this environment variable. All resources created by the hub will take this name into account. Hence, you can have multiple hub instances running without any naming conflicts. Further, we let the workspace containers connect to the hub not via its docker id but its docker name. This way, the workspaces can still connect to the hub in case it was deleted and re-created (for example when the hub was updated). The value must be DNS compliant and must be between 1 and 5 characters long. | mlhub |
SSL_ENABLED | Enable SSL. If you don't provide an ssl certificate as described in Section "Enable SSL/HTTPS", certificates will be generated automatically. As this auto-generated certificate is not signed, you have to trust it in the browser. Without ssl enabled, ssh access won't work as the container uses a single port and has to tell https and ssh traffic apart. | false |
EXECUTION_MODE | Defines the execution mode the hub is running in. The value is either local or k8s. | local (if you use the helm chart, the value is already set to k8s) |
DYNAMIC_WHITELIST_ENABLED | Enables each Authenticator to use a file as a whitelist of usernames. The file must contain one whitelisted username per line and must be mounted to /resources/users/dynamic_whitelist.txt. The file can be dynamically modified. The c.Authenticator.whitelist configuration is not considered! If set to true but the file does not exist, the normal whitelist behavior of JupyterHub is used. Keep in mind that already logged-in users stay authenticated even if removed from the list; they just cannot log in again. | false |
CLEANUP_INTERVAL_SECONDS | Interval in which expired and not-used resources are deleted. Set to -1 to disable the automatic cleanup. For more information, see Section Cleanup Service. | 3600 (currently disabled in Kubernetes) |
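The behavior described for DYNAMIC_WHITELIST_ENABLED can be illustrated with a small sketch. It only mirrors the documented rules (one username per line, fallback when the file is missing) and is not MLHub's actual implementation; the function name `load_dynamic_whitelist` and the handling of blank lines and surrounding whitespace are assumptions.

```python
from pathlib import Path
from typing import Optional, Set

# Path taken from the DYNAMIC_WHITELIST_ENABLED description.
WHITELIST_PATH = Path("/resources/users/dynamic_whitelist.txt")


def load_dynamic_whitelist(path: Path = WHITELIST_PATH) -> Optional[Set[str]]:
    """Return the whitelisted usernames, or None if the file does not exist.

    None signals that the caller should fall back to the normal JupyterHub
    whitelist behavior. Skipping blank lines and stripping whitespace are
    assumptions made for this sketch.
    """
    if not path.is_file():
        return None
    lines = path.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}
```

Reading the file on every check, rather than caching it, is one simple way to pick up modifications made while the hub is running.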
JupyterHub and the used Spawner are configured via a config.py file as stated in the official documentation. In the case of MLHub, a default config file is stored under /resources/jupyterhub_config.py. If you want to override settings or set extra ones, you can put another config file under /resources/jupyterhub_user_config.py.
Important: When setting properties for the Spawner, please use the general prefix c.Spawner. instead of c.DockerSpawner., c.KubeSpawner., etc. so that they are merged with the default values accordingly.
Our custom Spawners support the following additional configuration:
- c.Spawner.workspace_images: set the images that appear in the dropdown menu when a new named server is created, e.g. c.Spawner.workspace_images = [c.Spawner.image, "mltooling/ml-workspace-gpu:0.8.7", "mltooling/ml-workspace-r:0.8.7"]
The following settings should probably not be overridden:
- c.Spawner.prefix and c.Spawner.name_template: if you change those, check whether your SSH environment variables permit those names as a target. Also, think about setting c.Authenticator.username_pattern to prevent a user from having a username that is also a valid container name.
An exemplary custom config file could look like this:
# jupyterhub_user_config.py
c.Spawner.environment = {"FOO": "BAR"}
c.Spawner.workspace_images = ["mltooling/ml-workspace-r:0.8.7"]
Docker-local
In Docker, mount a custom config like -v /jupyterhub_user_config:/resources/jupyterhub_user_config.py. Have a look at the DockerSpawner properties to see what can be configured.
Kubernetes
When using Helm, you can pass the configuration to the installation command via --set-file userConfig=./jupyterhub_user_config.py. So the complete command could look like helm upgrade --install mlhub mlhub-chart-1.0.1.tgz --namespace mlhub --set-file userConfig=./jupyterhub_user_config.py. Have a look at the KubeSpawner properties to see what can be configured for the Spawner.
In addition to the jupyterhub_user_config.py, which can be used to configure JupyterHub or the KubeSpawner, you can provide a config.yaml for Kubernetes-deployment-specific configuration. Check out the helmchart/ directory for more information.
You can think of it like this: everything that has to be configured for the deployment itself, such as environment variables or volumes for the hub or proxy, goes into the config.yaml. Everything related to how JupyterHub works, such as how to authenticate or what the spawned user pods will mount, goes into the jupyterhub_user_config.py.
ℹ️ _Some JupyterHub configurations cannot be set in the jupyterhub_user_config.py as they have to be shared between services and, thus, have to be known during deployment. Instead, if you want to specify them, you have to do it in the config.yaml (see below)._
You can pass the file via --values config.yaml. The complete command would look like helm upgrade --install mlhub mlhub-chart-1.0.1.tgz --namespace mlhub --values config.yaml. The --set-file userConfig=./jupyterhub_user_config.py flag can additionally be set.
You can find the Helm chart resources, including the values file that contains the default values, in the directory helmchart.
MLHub will start in HTTP mode by default. Note that in HTTP mode, the ssh tunnel feature does not work.
You can activate SSL via the environment variable SSL_ENABLED. If you don't provide a certificate, the hub generates one during startup. This makes routing SSH connections possible, as we use nginx to handle HTTPS and SSH on the same port.
We override DockerSpawner and KubeSpawner for Docker and Kubernetes, respectively. We do so to add convenient labels and environment variables. Further, we return a custom option form to configure the resources of the workspaces. The overridden Spawners can be configured the same way as the base Spawners, as stated in the Configuration Section.
All resources created by our custom spawners are labeled (Docker / Kubernetes labels) with mlhub.origin set to the Hub name $ENV_HUB_NAME, mlhub.user set to the JupyterHub user the resources belong to, and mlhub.server_name set to the name of the named server. For example, if the hub name is "mlhub" and a user named "foo" has a named server "bar", the labels would be mlhub.origin=mlhub, mlhub.user=foo, mlhub.server_name=bar.
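These labels make it possible to look up the resources of a given workspace with standard tooling. The sketch below builds the label set from the example above and turns it into a Kubernetes label selector (as accepted by `kubectl get pods -l ...`) and into `--filter label=...` arguments for `docker ps`. The helper names are hypothetical and not part of MLHub.

```python
from typing import Dict, List


def workspace_labels(hub_name: str, user: str, server_name: str) -> Dict[str, str]:
    """Build the label set described above for one named server."""
    return {
        "mlhub.origin": hub_name,
        "mlhub.user": user,
        "mlhub.server_name": server_name,
    }


def kubernetes_label_selector(labels: Dict[str, str]) -> str:
    """Join labels into an equality-based selector, e.g. for `kubectl get pods -l`."""
    return ",".join(f"{key}={value}" for key, value in labels.items())


def docker_label_filters(labels: Dict[str, str]) -> List[str]:
    """Build `--filter label=key=value` arguments, e.g. for `docker ps`."""
    args: List[str] = []
    for key, value in labels.items():
        args += ["--filter", f"label={key}={value}"]
    return args


if __name__ == "__main__":
    labels = workspace_labels("mlhub", "foo", "bar")
    print(kubernetes_label_selector(labels))
    print(" ".join(["docker", "ps"] + docker_label_filters(labels)))
```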
The ML Hub project is maintained by @raethlein and @LukasMasuch. Please understand that we won't be able to provide individual support via email. We also believe that help is much more valuable if it's shared publicly so that more people can benefit from it.
Type | Channel |
---|---|
🚨 Bug Reports | |
🎁 Feature Requests | |
👩💻 Usage Questions | |
🗯 General Discussion |
We have the following three scenarios in mind for the hub and want to point them out as a guideline. They are meant as inspiration and are based on the default configuration, which uses native-authenticator as the hub authenticator. If you start the hub with a different authenticator or change other settings, you might want to or have to do things differently.
In this scenario, the idea is that just the admin user exists and can access the hub. The admin user then creates workspaces and distributes them to users.
Go to the admin panel (1) and create a new user (2).
You can then start the standard workspace for that user or create a new workspace (see second image).
Via the SSH access button (3), you can send the user a command to connect to the started workspace via SSH. For more information about the SSH feature in the workspace, check out this documentation section. If you created a workspace for another user, it might be necessary to click access on the workspace and authorize once per user to be able to use the SSH access button.
A user can also access the UI by ssh-ing into the workspace, printing the API token via echo $JUPYTERHUB_API_TOKEN, and then accessing the URL of the hub in the browser under /user/<username>/<workspace-name>/tree?token=<jupyterhub-api-token>. The JUPYTERHUB_API_TOKEN gives access to all named servers of a user, so use different users for different persons in this scenario.
ℹ️ _Do not create different workspaces for the same Hub user and then give access to them to different persons. Via the $JUPYTERHUB_API_TOKEN you get access to all workspaces of a user. In other words, if you create multiple named workspaces for the user 'admin' and distribute them to different persons, they can access all named workspaces of the 'admin' user._
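The token-based URL described above can be assembled programmatically. The following is a minimal sketch that fills in the documented path pattern /user/<username>/<workspace-name>/tree?token=<jupyterhub-api-token>; the function name `workspace_url` and the example host are hypothetical.

```python
from urllib.parse import quote, urlencode


def workspace_url(hub_base_url: str, username: str, workspace_name: str, api_token: str) -> str:
    """Build the browser URL for a workspace, authenticated via the API token."""
    # Percent-encode the path segments so unusual characters cannot break the path.
    path = "/user/{}/{}/tree".format(quote(username, safe=""), quote(workspace_name, safe=""))
    return hub_base_url.rstrip("/") + path + "?" + urlencode({"token": api_token})


if __name__ == "__main__":
    # Hypothetical host; the token would be the value of $JUPYTERHUB_API_TOKEN.
    print(workspace_url("https://hub.example.com", "foo", "bar", "<jupyterhub-api-token>"))
```

Because the token grants access to all named servers of that user, such a URL should be treated like a password.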
Also give non-admin users the permission to create named workspaces.
To give users access, the admin just has to authorize registered users.
Users can log in and get a default workspace. No additional workspaces can be created.
To let users log in and get a default workspace but not let them create new servers, set the config option c.JupyterHub.allow_named_servers to False when starting the hub. Note that this also prevents the admin from starting named servers. Currently, the workaround is to run a second hub container just for the admin.
When named servers are allowed and the hub is started with the default config, you can create named servers. When doing so, you can set some configurations for the new workspace, such as resource limitations or mounting GPUs. Mounting GPUs is currently not possible in Kubernetes mode. The "Days to live" flag is currently purely informational and can be seen in the admin view; it should help admins keep an overview of workspaces.
JupyterHub was originally not created with Docker or Kubernetes in mind, which can result in unfavorable situations, such as containers that are stopped but not deleted on the host. Furthermore, our custom spawners might create some artifacts that should be cleaned up as well. MLHub contains a cleanup service that is started as a JupyterHub service inside the hub container, both in the Docker and the Kubernetes setup. It can be accessed as a REST API by an admin, but it is also triggered automatically at a fixed interval unless disabled (see the config for CLEANUP_INTERVAL_SECONDS). The service extends the JupyterHub functionality for the Docker and Kubernetes world. Here, "containers" refers interchangeably to Docker containers and Kubernetes pods.
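The interval semantics can be summarized in a few lines. This sketch only mirrors the documented values (default 3600, -1 disables the automatic cleanup) and is not the hub's actual scheduling code; the function name `cleanup_interval` is hypothetical.

```python
import os
from typing import Mapping, Optional

DEFAULT_CLEANUP_INTERVAL_SECONDS = 3600  # default from the variable table


def cleanup_interval(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return the cleanup interval in seconds, or None if cleanup is disabled.

    Raises ValueError if the variable is set to something that is not an integer.
    """
    raw = env.get("CLEANUP_INTERVAL_SECONDS", str(DEFAULT_CLEANUP_INTERVAL_SECONDS))
    seconds = int(raw)
    return None if seconds == -1 else seconds
```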
The service has two endpoints, which can be reached under the Hub service URL /services/cleanup-service/* with admin permissions.
- GET /services/cleanup-service/users: This endpoint currently only has an effect in Docker-local mode. There, it checks for resources of deleted users, that is, users who are no longer in the JupyterHub database, and deletes them. This includes containers, networks, and volumes. It does so by looking for labeled Docker resources that point to containers started by the hub and belonging to the specific users.
- GET /services/cleanup-service/expired: When starting a named workspace, an expiration date can be assigned to it. This endpoint deletes all containers that are expired. The respective named server is deleted from the JupyterHub database, and the Docker/Kubernetes resource is deleted as well.
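An admin could trigger these endpoints from a script. The sketch below only builds the request object and does not send it. It assumes that the service accepts an admin's JupyterHub API token in an `Authorization: token ...` header, which is how the JupyterHub REST API is commonly called but is not confirmed here for the cleanup service. The function name and host are hypothetical.

```python
import urllib.request

CLEANUP_ENDPOINTS = ("users", "expired")  # the two endpoints described above


def build_cleanup_request(hub_url: str, endpoint: str, api_token: str) -> urllib.request.Request:
    """Build a GET request for one of the cleanup-service endpoints."""
    if endpoint not in CLEANUP_ENDPOINTS:
        raise ValueError(f"unknown cleanup endpoint: {endpoint!r}")
    url = f"{hub_url.rstrip('/')}/services/cleanup-service/{endpoint}"
    # Assumption: token-based auth via the Authorization header.
    headers = {"Authorization": f"token {api_token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    req = build_cleanup_request("https://hub.example.com", "expired", "<admin-api-token>")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```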
Variable | Description | Default |
---|---|---|
START_SSH | Start the sshd process which is used to tunnel ssh to the workspaces. | true |
START_NGINX | Whether or not to start the nginx proxy. If the Hub should be used without additional tool routing to workspaces, this could be disabled. SSH port 22 would need to be published separately then. This option is built-in to work with our Kubernetes Helm chart. | true |
START_JHUB | Start the JupyterHub hub. | true |
START_CHP | Start the JupyterHub proxy process separately. In that case, the hub should not start the proxy itself, which can be configured via the JupyterHub config file. This option is built-in to work with our Kubernetes Helm chart, where the image is also used as the Configurable-Http-Proxy (CHP) image. Additional arguments to the chp-start command can be passed to the container via the environment variable ADDITIONAL_ARGS, e.g. --env ADDITIONAL_ARGS="--ip=0.0.0.0 --api-ip=0.0.0.0". | false |
See CONTRIBUTING.md and check out help-wanted issues.
Licensed Apache 2.0. Created and maintained with ❤️ by developers from SAP in Berlin.