rapidsai / deployment

RAPIDS Deployment Documentation
https://docs.rapids.ai/deployment/stable/

Document using RAPIDS on JupyterHub #176

Open jacobtomlinson opened 1 year ago

jacobtomlinson commented 1 year ago

A question came up on the GoAI Slack about how to use RAPIDS with JupyterHub. The user was following the environment customisation guide and running into issues with the RAPIDS container image.

We should run through the steps of using the RAPIDS container on JupyterHub ourselves and figure out what is required to get things working. I expect we will need to configure the DISABLE_JUPYTER=true and EXTRA_PIP_PACKAGES=jupyterhub-singleuser environment variables.

Hopefully this is all we need to do but we should verify this and then create a documentation page on how to do it.
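
A minimal, untested sketch of what that could look like in a Zero to JupyterHub config.yaml. The two environment variables are the ones proposed above and are not yet verified to be sufficient; the image tag is the one tried later in this thread and is only an example.

```yaml
# Sketch only: values for the Zero to JupyterHub Helm chart.
singleuser:
  image:
    name: rapidsai/rapidsai
    # Example tag (assumption); use whichever RAPIDS release you need.
    tag: 23.02-cuda11.8-runtime-ubuntu22.04-py3.10
  extraEnv:
    # Proposed above: stop the RAPIDS container from starting its own Jupyter
    # server so that JupyterHub's single-user server can run instead.
    DISABLE_JUPYTER: "true"
    # Proposed above: install the single-user server when the container starts.
    EXTRA_PIP_PACKAGES: "jupyterhub-singleuser"
```

Environment variable values are quoted because the chart expects them to be strings.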

mroeschke commented 1 year ago

I tried using our rapidsai/rapidsai image for this JupyterHub Kubernetes workflow, but I don't think it's working because I think the guide assumes the user is extending one of their existing jupyterhub/docker-stack images: https://z2jh.jupyter.org/en/stable/jupyterhub/customizing/user-environment.html#customize-an-existing-docker-image

Would it be sufficient to recommend that users extend one of these images and conda install rapids instead?

jacobtomlinson commented 1 year ago

> I don't think it's working because I think the guide assumes the user is extending one of their existing

You can use your own images. You just need to ensure the image meets the guide's assumptions (people should codify and test these assumptions with container canary). AFAIK you should just need jupyterhub, jupyterlab and jupyterhub-singleuser installed in the container.

> Would it be sufficient to recommend that users extend one of these images and conda install rapids instead?

If it's non-trivial to get things working with our container then this is probably an acceptable workaround.

mroeschke commented 1 year ago

So I think I'm getting close:

I created a cluster on GKE and installed the Zero to JupyterHub Helm chart:

gcloud container clusters create rapids-jupyterlab-test --accelerator type=nvidia-tesla-t4 --machine-type=n1-standard-4 --num-nodes 2 --zone us-west2-b --cluster-version latest
kubectl create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin --user=<MY_EMAIL>
helm upgrade --cleanup-on-fail --install 2.0.0 jupyterhub/jupyterhub --namespace default --create-namespace --version=2.0.0 --values config.yaml

with this config.yaml:

singleuser:
  extraEnv:
    EXTRA_CONDA_PACKAGES: "jupyterhub-singleuser, jupyterhub, jupyterlab"
  image:
    name: rapidsai/rapidsai
    tag: 23.02-cuda11.8-runtime-ubuntu22.04-py3.10

When the user pod starts up on login, it appears that conda tries to create a cache under the notices directory and fails with a permission error. I'm not sure if that's expected:

$ kubectl logs jupyter-test-5fname
Defaulted container "notebook" out of: notebook, block-cloud-metadata (init)
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.download.nvidia.com/licenses/NVIDIA_Deep_Learning_Container_License.pdf

EXTRA_CONDA_PACKAGES environment variable found. Installing packages.

CondaError: Error encountered while attempting to create cache directory.
  Directory: /.cache/conda/notices
  Exception: [Errno 13] Permission denied: '/.cache'

EDIT:

So I got a bit farther with this config.yaml:

hub:
  extraConfig:
    myConfig.py: |
      c.DummyAuthenticator.password = "test"
      c.KubeSpawner.http_timeout = 500
singleuser:
  uid: 0
  #startTimeout: 300
  extraEnv:
    EXTRA_CONDA_PACKAGES: "jupyterhub-singleuser, jsonschema-with-format-nongpl, webcolors"
  image:
    name: rapidsai/rapidsai
    tag: 23.02-cuda11.8-runtime-ubuntu22.04-py3.10

(extra packages needed due to https://stackoverflow.com/questions/75511394/jupyter-contrib-nbextension-install-user-pkg-resources-distributionnotfound-we)

This leads to a JupyterHub login screen, but I am having trouble logging in with a dummy account.

betatim commented 1 year ago

To help with the login screen problem: do you see any error messages in the logs of the hub pod? With a brand new JupyterHub on a new Kubernetes cluster there will be a bunch of pods, but the name of one of them should start with hub-. This is the pod running JupyterHub, which should (hopefully) print a useful error message when you try to sign in.

I don't know what kind of use-case/user we are targeting with this guide, so my following comments are maybe useless:

In general it makes sense to pre-build a "single user image" for a JupyterHub. In our case that would mean building a container image based on rapidsai/rapidsai:23.02-cuda11.8-runtime-ubuntu22.04-py3.10 from a custom Dockerfile. That way we can install the extra packages required and make any other adjustments that are needed. We also wouldn't need to run the user pod as root (singleuser.uid: 0), and startup times would be quicker because nothing extra needs installing each time a user signs in. A downside is that you need a public docker registry you can push the built image to, which makes the guide a bit more complex.
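
On the Helm side, switching to a pre-built image would mostly shrink config.yaml. A rough, untested sketch, assuming a hypothetical image at registry.example.com/my-org/rapids-jupyterhub that was built from the RAPIDS image with jupyterhub-singleuser and the other extra packages already installed:

```yaml
# Sketch only: the image name and tag below are hypothetical placeholders.
singleuser:
  image:
    name: registry.example.com/my-org/rapids-jupyterhub
    tag: "23.02"
  # No singleuser.uid: 0 and no EXTRA_CONDA_PACKAGES here, on the assumption
  # that everything needed was installed when the image was built.
```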

jacobtomlinson commented 1 year ago

> A downside is that you need a public docker registry you can push the built image to, which makes the guide a bit more complex.

If we can avoid this it would be good. Any instructions which start with "Build a container and push it somewhere" add a lot of friction.

It would be interesting to see how much size the dependencies add. If they are small, maybe we can just add them to the RAPIDS container. Also, if they are small, installing them at runtime isn't insane and gives the most flexibility.