janpfeifer / gonb

GoNB, a Go Notebook Kernel for Jupyter
https://github.com/janpfeifer/gonb
MIT License

jupyterhub support #94

Closed maltegrosse closed 3 months ago

maltegrosse commented 3 months ago

Hello Jan,

Great work! I am trying to get it running on JupyterHub, but it seems unable to start the prebuilt container.

jupyter/scipy-notebook runs without issues.

Is it possible that the entrypoint for hub based images is different?

https://github.com/jupyter/docker-stacks/tree/main/images/base-notebook

jupyterhub | [I 2024-03-12 04:34:54.074 JupyterHub app:3160] JupyterHub is now running at http://:8000
jupyterhub | [I 2024-03-12 04:35:03.546 JupyterHub log:186] 200 GET /hub/home (malte@157.xxx.xxx.xxx) 51.33ms
jupyterhub | [I 2024-03-12 04:35:12.859 JupyterHub log:186] 200 GET /hub/spawn/malte (malte@157.xxx.xxx.xxx)) 7.48ms
jupyterhub | [W 2024-03-12 04:35:17.659 JupyterHub dockerspawner:92] DemoFormSpawner.container_image is deprecated in DockerSpawner 0.9.*, use DemoFormSpawner.image instead
jupyterhub | [I 2024-03-12 04:35:17.695 JupyterHub provider:651] Creating oauth client jupyterhub-user-malte
jupyterhub | [I 2024-03-12 04:35:17.730 JupyterHub dockerspawner:988] Container 'jupyter-malte' is gone
jupyterhub | [I 2024-03-12 04:35:17.747 JupyterHub dockerspawner:1272] Created container jupyter-malte (id: d5fed29) from image janpfeifer/gonb_jupyterlab:v0.9.6
jupyterhub | [I 2024-03-12 04:35:17.747 JupyterHub dockerspawner:1296] Starting container jupyter-malte (id: d5fed29)
jupyterhub | [I 2024-03-12 04:35:18.662 JupyterHub log:186] 302 POST /hub/spawn/malte -> /hub/spawn-pending/malte (malte@157.110.40.166) 1006.74ms
jupyterhub | [I 2024-03-12 04:35:18.695 JupyterHub pages:394] malte is pending spawn
jupyterhub | [I 2024-03-12 04:35:18.696 JupyterHub log:186] 200 GET /hub/spawn-pending/malte (malte@157.xxx.xxx.xxx)) 3.58ms
jupyterhub | [I 2024-03-12 04:35:27.666 JupyterHub dockerspawner:988] Container 'jupyter-malte' is gone
jupyterhub | [W 2024-03-12 04:35:27.666 JupyterHub dockerspawner:963] Container not found: jupyter-malte
jupyterhub | Task exception was never retrieved
jupyterhub | future: <Task finished name='Task-48' coro=<BaseHandler.spawn_single_user() done, defined at /usr/local/lib/python3.10/dist-packages/jupyterhub/handlers/base.py:844> exception=HTTPError()>
jupyterhub | Traceback (most recent call last):
jupyterhub |   File "/usr/local/lib/python3.10/dist-packages/jupyterhub/handlers/base.py", line 1051, in spawn_single_user
jupyterhub |     await gen.with_timeout(
jupyterhub | asyncio.exceptions.TimeoutError: Timeout
jupyterhub | 
jupyterhub | During handling of the above exception, another exception occurred:
jupyterhub | 
jupyterhub | Traceback (most recent call last):
jupyterhub |   File "/usr/local/lib/python3.10/dist-packages/jupyterhub/handlers/base.py", line 1085, in spawn_single_user
jupyterhub |     raise web.HTTPError(
jupyterhub | tornado.web.HTTPError: HTTP 500: Internal Server Error (Spawner failed to start [status=0]. The logs for malte may contain details.)
jupyterhub | [I 2024-03-12 04:35:48.084 JupyterHub dockerspawner:988] Container 'jupyter-malte' is gone
jupyterhub | [W 2024-03-12 04:35:48.085 JupyterHub dockerspawner:963] Container not found: jupyter-malte
jupyterhub | [W 2024-03-12 04:36:01.731 JupyterHub user:881] malte's server never showed up at http://172.23.0.3:8888/user/malte/ after 30 seconds. Giving up.
jupyterhub |     
jupyterhub |     Common causes of this timeout, and debugging tips:
jupyterhub |     
jupyterhub |     1. The server didn't finish starting,
jupyterhub |        or it crashed due to a configuration issue.
jupyterhub |        Check the single-user server's logs for hints at what needs fixing.
jupyterhub |     2. The server started, but is not accessible at the specified URL.
jupyterhub |        This may be a configuration issue specific to your chosen Spawner.
jupyterhub |        Check the single-user server logs and resource to make sure the URL
jupyterhub |        is correct and accessible from the Hub.
jupyterhub |     3. (unlikely) Everything is working, but the server took too long to respond.
jupyterhub |        To fix: increase `Spawner.http_timeout` configuration
jupyterhub |        to a number of seconds that is enough for servers to become responsive.
jupyterhub |     
jupyterhub | [I 2024-03-12 04:36:01.737 JupyterHub dockerspawner:988] Container 'jupyter-malte' is gone
jupyterhub | [W 2024-03-12 04:36:01.737 JupyterHub dockerspawner:963] Container not found: jupyter-malte
jupyterhub | [E 2024-03-12 04:36:01.799 JupyterHub gen:630] Exception in Future <Task finished name='Task-49' coro=<BaseHandler.spawn_single_user.<locals>.finish_user_spawn() done, defined at /usr/local/lib/python3.10/dist-packages/jupyterhub/handlers/base.py:954> exception=TimeoutError("Server at http://172.23.0.3:8888/user/malte/ didn't respond in 30 seconds")> after timeout
jupyterhub |     Traceback (most recent call last):
jupyterhub |       File "/usr/local/lib/python3.10/dist-packages/tornado/gen.py", line 625, in error_callback
jupyterhub |         future.result()
jupyterhub |       File "/usr/local/lib/python3.10/dist-packages/jupyterhub/handlers/base.py", line 961, in finish_user_spawn
jupyterhub |         await spawn_future
jupyterhub |       File "/usr/local/lib/python3.10/dist-packages/jupyterhub/user.py", line 862, in spawn
jupyterhub |         await self._wait_up(spawner)
jupyterhub |       File "/usr/local/lib/python3.10/dist-packages/jupyterhub/user.py", line 906, in _wait_up
jupyterhub |         raise e
jupyterhub |       File "/usr/local/lib/python3.10/dist-packages/jupyterhub/user.py", line 876, in _wait_up
jupyterhub |         resp = await server.wait_up(
jupyterhub |       File "/usr/local/lib/python3.10/dist-packages/jupyterhub/utils.py", line 288, in wait_for_http_server
jupyterhub |         re = await exponential_backoff(
jupyterhub |       File "/usr/local/lib/python3.10/dist-packages/jupyterhub/utils.py", line 236, in exponential_backoff
jupyterhub |         raise asyncio.TimeoutError(fail_message)
jupyterhub |     asyncio.exceptions.TimeoutError: Server at http://172.23.0.3:8888/user/malte/ didn't respond in 30 seconds
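
The Hub's hint above is to check the single-user server's logs, but the repeated "Container 'jupyter-malte' is gone" lines suggest the container no longer exists by the time anyone looks. A minimal sketch for keeping it around while debugging, assuming a DockerSpawner-based jupyterhub_config.py: the helper name `apply_spawn_debug_settings` and the 120-second value are made up for illustration, while `remove`, `debug` and `http_timeout` are existing DockerSpawner/Spawner options.

```python
def apply_spawn_debug_settings(c, http_timeout=120):
    """Apply temporary debugging settings to a JupyterHub config object `c`."""
    # Keep stopped containers so their logs can still be inspected.
    c.DockerSpawner.remove = False
    # Log the arguments passed to spawned containers.
    c.DockerSpawner.debug = True
    # Give the single-user server more time before the Hub gives up.
    c.Spawner.http_timeout = http_timeout


# In jupyterhub_config.py, after `c = get_config()`:
#     apply_spawn_debug_settings(c)
```

With this in place, `docker logs jupyter-malte` (the container name from the log above) may show why the single-user server exited. The settings are meant to be reverted once the cause is found.
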
janpfeifer commented 3 months ago

Hi @maltegrosse, thanks for pointing it out.

I didn't even know about JupyterHub. It sounds interesting, although I didn't understand exactly what it does from its description page. I assume it is a multi-tenant Jupyter server?

Anyway, I tried to install it for a single user. I switched to the DummyAuthenticator, because I was not sure which user/password it was asking for and I wouldn't trust it not to share my password. But then I got the following error:

500 : Internal Server Error
Redirect loop detected. Notebook has jupyterhub version unknown (likely < 0.8), but the Hub expects 4.0.2. Try installing jupyterhub==4.0.2 in the user environment if you continue to have problems.

Checking with pip, I did install exactly version 4.0.2, so the error seems wrong.

Note that I haven't even been able to run Jupyter yet.

Would you have any idea how to get JupyterHub to work in a single-developer mode, so I can try to make GoNB work?

cheers

PS: I'm sorry, during the week I don't have much time. I just checked it quickly, so I'm likely missing something obvious ...

janpfeifer commented 3 months ago

From the command line logs:

[W 2024-03-12 17:28:29.641 JupyterHub base:1656] Redirect loop detected on /hub/user/janpf/?redirects=1
[I 2024-03-12 17:28:31.643 JupyterHub log:191] 302 GET /hub/user/janpf/?redirects=1 -> /user/janpf/?redirects=2 (janpf@127.0.0.1) 2006.25ms
[I 2024-03-12 17:28:31.647 JupyterHub log:191] 302 GET /user/janpf/?redirects=2 -> /hub/user/janpf/?redirects=2 (@127.0.0.1) 0.70ms
[W 2024-03-12 17:28:31.651 JupyterHub web:1873] 500 GET /hub/user/janpf/?redirects=2 (127.0.0.1): Redirect loop detected. Notebook has jupyterhub version unknown (likely < 0.8), but the Hub expects 4.0.2. Try installing jupyterhub==4.0.2 in the user environment if you continue to have problems.

Odd error message ... maybe they assume the version 4.0.2 has fixed some redirect loop bug ?

janpfeifer commented 3 months ago

Ugh, I found the issue: I was using port 8081 (the one JupyterHub itself uses), and I should instead be using port 8000, the one served by the Node.js proxy. Their error reporting could be better ... it was very misleading.

Anyway, once I opened it on the right port, everything worked: JupyterHub started a JupyterLab, which in turn started GoNB, no issues.
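
For anyone hitting the same redirect loop, a quick way to see which of the two ports is actually listening is a small check like the one below. It is only a sketch: the helper name `port_is_open` is made up for illustration, and 8000/8081 are the ports mentioned above (proxy and Hub, respectively), which may differ in a customized setup.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8000 is the proxy port to open in the browser; 8081 is the Hub itself.
    for name, port in (("proxy", 8000), ("hub", 8081)):
        state = "open" if port_is_open("127.0.0.1", port) else "closed"
        print(f"{name} port {port}: {state}")
```

Both ports being open is expected on a running Hub; the point is only that the browser should go to the proxy port.
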


So now we know it should work, and your issue must be some installation issue.

I don't know much (or anything) about the https://github.com/jupyter/docker-stacks/tree/main/images/base-notebook image ... does it work with plain Python?

Maybe you can change its Dockerfile to install GoNB, following what GoNB's Dockerfile does?

Apologies, I'm trying to help in the dark. If you give me more details on what you are trying to do maybe I can help ?

maltegrosse commented 3 months ago

Dear Jan,

thank you for your help. JupyterHub is a nice tool for multi-tenancy (comparable to Colab?) with features like authentication and killing idle environments. I use it in a Docker environment and in a Kubernetes environment, so I can easily scale it up to hundreds of users.

I am a bit confused about the port thing ;-)

It seems both containers use the same port, 8888:

Where did you change the ports?

janpfeifer commented 3 months ago

Apologies if I'm repeating something you already know, but here are the bits and pieces I gathered that may answer your question:

So it seems that if you want a container that runs JupyterHub, you will also need to install the proxy and expose port 8000.

I'm surprised that the Jupyter Dockerfile you mentioned includes JupyterHub but doesn't seem to install the proxy (the installation instructions include installing the proxy), but maybe it gets installed through some indirect dependency?

Last thing is that when running the container, one wants to add the flags to expose the port again (I think). I usually run with a line like:

docker run -it --rm -p 8888:8888 -v "${PWD}":/home/jovyan/work janpfeifer/gonb_jupyterlab:latest

But I suppose for JupyterHub you will want to replace 8888 with 8000 above.

Just a few things to consider; I hope they help you figure it out. Please let me know if you find out how to run it. I would love to create a sub-section in the GoNB installation section with JupyterLab.

Btw, how are you handling authentication ? How do you create users in JupyterHub running inside the Docker container ?

maltegrosse commented 3 months ago

Thank you Jan.

Just some general notes regarding JupyterHub: I am not running the notebook/lab container directly. I have a hub instance (including the proxy) running all the time. If you want a simple test, this is my docker-compose file. (The k8s setup is way more complex, as I use a lot of customization like PVCs, Postgres, etc. I use Traefik as a reverse proxy and to handle HTTPS certificates.)

(adapted from the official JupyterHub, could be outdated)

version: "3.5"

services:
  hub:
    build:
      context: .
      dockerfile: Dockerfile.jupyterhub
      args:
        JUPYTERHUB_VERSION: 3.0.0
    restart: always
    image: jupyterhub
    container_name: jupyterhub
    networks:
      - jupyterhub-network
      - web
    volumes:
      # The JupyterHub configuration file
      - "./jupyterhub_config.py:/srv/jupyterhub/jupyterhub_config.py:ro"
      - "./js/admin-react.js:/usr/local/share/jupyterhub/static/js/admin-react.js:ro"
      # Bind Docker socket on the host so we can connect to the daemon from
      # within the container
      - "/var/run/docker.sock:/var/run/docker.sock:rw"
      # Bind Docker volume on host for JupyterHub database and cookie secrets
      - "./data:/data"

      # -rw-r--r-- 
    ports:
      - "8000:8000"
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.hub.rule=Host(`hub.somedomain.com`)"
      - "traefik.http.routers.hub.entrypoints=websecure"
      - "traefik.http.routers.hub.tls.certResolver=le"
      - "traefik.docker.network=traefik_default"
      - "traefik.http.services.hub.loadbalancer.server.port=8000"
    environment:
      # This username will be a JupyterHub admin
      JUPYTERHUB_ADMIN: admin
      # All containers will join this network
      DOCKER_NETWORK_NAME: jupyterhub-network
      # JupyterHub will spawn this Notebook image for users
      DOCKER_NOTEBOOK_IMAGE: jupyter/minimal-notebook:latest
      # Notebook directory inside user image
      DOCKER_NOTEBOOK_DIR: /home/jovyan/work
      # Using this run command
      DOCKER_SPAWN_CMD: start-singleuser.sh

    command: >
      jupyterhub -f /srv/jupyterhub/jupyterhub_config.py

volumes:
  jupyterhub-data:
  jupyterhub-shared:

networks:
  jupyterhub-network:
    name: jupyterhub-network
  web:
    external:
      name: traefik_default 

Including the hub config, which simply refers to existing lab containers:

jupyterhub_config.py

# Copyright (c) Jupyter Development Team.
# Distributed under the terms of the Modified BSD License.

# Configuration file for JupyterHub
import os
from dockerspawner import DockerSpawner
# from qhub_jupyterhub_theme import theme_extra_handlers, theme_template_paths

c = get_config()

# We rely on environment variables to configure JupyterHub so that we
# avoid having to rebuild the JupyterHub container every time we change a
# configuration parameter.

# Spawn single-user servers as Docker containers
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"

# Spawn containers from this image
c.DockerSpawner.image = os.environ["DOCKER_NOTEBOOK_IMAGE"]

# JupyterHub requires a single-user instance of the Notebook server, so we
# default to using the `start-singleuser.sh` script included in the
# jupyter/docker-stacks *-notebook images as the Docker run command when
# spawning containers.  Optionally, you can override the Docker run command
# using the DOCKER_SPAWN_CMD environment variable.
spawn_cmd = os.environ.get("DOCKER_SPAWN_CMD", "start-singleuser.sh")
c.DockerSpawner.cmd = spawn_cmd

# Connect containers to this Docker network
network_name = os.environ["DOCKER_NETWORK_NAME"]
c.DockerSpawner.use_internal_ip = True
c.DockerSpawner.network_name = network_name

# Explicitly set notebook directory because we'll be mounting a volume to it.
# Most jupyter/docker-stacks *-notebook images run the Notebook server as
# user `jovyan`, and set the notebook directory to `/home/jovyan/work`.
# We follow the same convention.
notebook_dir = os.environ.get("DOCKER_NOTEBOOK_DIR") or "/home/jovyan/work"
c.DockerSpawner.notebook_dir = notebook_dir
c.DockerSpawner.mem_limit = None
c.Spawner.mem_limit = None
c.DockerSpawner.volumes = {
    'jupyterhub-user-{username}': '/home/jovyan/work',
    'jupyterhub-shared': '/home/jovyan/work/shared',
}
#c.DockerSpawner.extra_create_kwargs = {'runtime': 'nvidia'}
#c.DockerSpawner.extra_host_config = {'runtime': 'nvidia'}
import docker
c.DockerSpawner.extra_host_config = {
    "device_requests": [
        docker.types.DeviceRequest(
            count=-1,
            capabilities=[["gpu"]],
        ),
    ],
}
# Mount the real user's Docker volume on the host to the notebook user's
# notebook directory in the container
#c.DockerSpawner.volumes = {"jupyterhub-user-{username}": notebook_dir}

# Remove containers once they are stopped
c.DockerSpawner.remove = True
c.DockerSpawner.default_url = '/lab'
# For debugging arguments passed to spawned containers
c.DockerSpawner.debug = True

# User containers will access hub by container name on the Docker network
c.JupyterHub.hub_ip = "jupyterhub"
c.JupyterHub.hub_port = 8080

# Persist hub data on volume mounted inside container
c.JupyterHub.cookie_secret_file = "/data/jupyterhub_cookie_secret"
c.JupyterHub.db_url = "sqlite:////data/jupyterhub.sqlite"

# Authenticate users with Native Authenticator
#c.JupyterHub.authenticator_class = "nativeauthenticator.NativeAuthenticator"

# Allow anyone to sign-up without approval
#c.NativeAuthenticator.open_signup = True

#custom template
#c.JupyterHub.template_paths=['/usr/local/share/jupyterhub/']
#c.JupyterHub.template_paths = ['/etc/jupyterhub/custom/templates']

# Allowed admins
admin = os.environ.get("JUPYTERHUB_ADMIN")
if admin:
    c.Authenticator.admin_users = [admin]
c.Authenticator.admin_users = {'me'}
from oauthenticator.generic import GenericOAuthenticator
c.JupyterHub.authenticator_class = GenericOAuthenticator
c.JupyterHub.authenticator_class = 'generic-oauth'
c.OAuthenticator.login_service = "auth"
c.OAuthenticator.username_key = "preferred_username"

c.OAuthenticator.client_id = "jh"
c.OAuthenticator.client_secret = "somesecret"
c.OAuthenticator.oauth_callback_url =   "https://hub.somedomain.com/hub/oauth_callback"
c.OAuthenticator.authorize_url =        "https://auth.somedomain.com/realms/lab/protocol/openid-connect/auth"
c.OAuthenticator.token_url =            "https://auth.somedomain.com/realms/lab/protocol/openid-connect/token"
c.OAuthenticator.userdata_url =         "https://auth.somedomain.com/realms/lab/protocol/openid-connect/userinfo"
c.OAuthenticator.userdata_params = {"state": "state"}
c.OAuthenticator.tls_verify = False

class DemoFormSpawner(DockerSpawner):
    def _options_form_default(self):
        default_stack = "jupyter/minimal-notebook"
        return """
        <label for="stack">Select your desired stack</label>
        <select name="stack" size="1">
        <option value="jupyter/scipy-notebook">SciPy</option>

        </select>
        """.format(stack=default_stack)

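    def options_from_form(self, formdata):
        # Form values arrive as lists of strings, e.g. {"stack": ["jupyter/scipy-notebook"]}.
        options = {}
        options['stack'] = formdata['stack']
        container_image = ''.join(formdata['stack'])
        print("SPAWN: " + container_image + " IMAGE")
        # `container_image` is deprecated in DockerSpawner (see the warning in the Hub log); use `image`.
        self.image = container_image
        return options

c.Spawner.http_timeout = 60
c.JupyterHub.spawner_class = DemoFormSpawner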

The <option value="jupyter/scipy-notebook">SciPy</option> line can simply be extended by adding another line referring to your container. There I normally just add other environments like R, C, or GPU-supported environments. When I add your container, it fails as mentioned above.
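
As a sketch of how that list could be kept in one place instead of hand-editing the HTML: the helper names `build_options_form` and `image_from_formdata` and the `example/gonb-notebook:latest` tag are hypothetical, and the second entry stands for whatever GoNB-enabled image actually starts under the Hub.

```python
from html import escape
from typing import Dict, List

# Label -> image. Only the SciPy entry comes from the config above;
# the GoNB tag is a placeholder for an image that works under the Hub.
STACKS: Dict[str, str] = {
    "SciPy": "jupyter/scipy-notebook",
    "GoNB": "example/gonb-notebook:latest",
}


def build_options_form(stacks: Dict[str, str]) -> str:
    """Render the <select> shown on the spawn page, one <option> per image."""
    options = "\n".join(
        f'<option value="{escape(image)}">{escape(label)}</option>'
        for label, image in stacks.items()
    )
    return (
        '<label for="stack">Select your desired stack</label>\n'
        f'<select name="stack" size="1">\n{options}\n</select>'
    )


def image_from_formdata(formdata: Dict[str, List[str]], stacks: Dict[str, str]) -> str:
    """Return the submitted image, rejecting anything not in the allowed list."""
    image = formdata["stack"][0]
    if image not in stacks.values():
        raise ValueError(f"unknown stack: {image!r}")
    return image
```

In a spawner subclass like DemoFormSpawner, `_options_form_default` could return `build_options_form(STACKS)`, and `options_from_form` could set `self.image = image_from_formdata(formdata, STACKS)`. Checking the submitted value against the known images is a design choice of this sketch, not something the original config does.
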

As far as I understand, the JupyterHub docker stack has all the required components available (like the proxy, etc.), so you actually don't have to take care of them.

As you can see above, I am using an OAuth authenticator, in my case a Keycloak instance, which handles the user management. Only some manual cleanup needs to be done, which can be done simply via the hub admin panel (delete all people who haven't signed in since...)

janpfeifer commented 3 months ago

hi Malte, thanks for the explanation, now I have a better appreciation of what is going on.

I'm not sure I'm able to figure out what is going wrong, but if you have any suggestions of changes to GoNB's Dockerfile that would fix it, I would be happy to apply the fix.

One way of making this work might be to take a Dockerfile with JupyterLab that works for your JupyterHub, and just add the GoNB installation lines to it:

#######################################################################################################
# Go and GoNB Libraries
#######################################################################################################
ENV GO_VERSION=1.22.0
ENV GONB_VERSION="v0.9.6"
ENV GOROOT=/usr/local/go
ENV GOPATH=/opt/go
ENV PATH=$PATH:$GOROOT/bin:$GOPATH/bin

# Create Go directory for user -- that will not move if the user home directory is moved.
USER root
RUN mkdir ${GOPATH} && chown ${NB_USER}:users ${GOPATH}

USER root
WORKDIR /usr/local
RUN wget --quiet --output-document=- "https://go.dev/dl/go${GO_VERSION}.linux-amd64.tar.gz" | tar -xz \
    && go version

# Install GoNB (https://github.com/janpfeifer/gonb) in the user account
USER $NB_USER
WORKDIR ${HOME}
RUN export GOPROXY=direct && \
    go install "github.com/janpfeifer/gonb@${GONB_VERSION}" && \
    go install golang.org/x/tools/cmd/goimports@latest && \
    go install golang.org/x/tools/gopls@latest && \
    gonb --install

Would this help ?

cheers

maltegrosse commented 3 months ago

Hey Jan,

that was easy, I just did

ARG OWNER=jupyter
ARG BASE_CONTAINER=$OWNER/minimal-notebook
FROM $BASE_CONTAINER
#######################################################################################################
# Go and GoNB Libraries
#######################################################################################################
ENV GO_VERSION=1.22.0
ENV GONB_VERSION="v0.9.6"
ENV GOROOT=/usr/local/go
ENV GOPATH=/opt/go
ENV PATH=$PATH:$GOROOT/bin:$GOPATH/bin

# Create Go directory for user -- that will not move if the user home directory is moved.
USER root
RUN mkdir ${GOPATH} && chown ${NB_USER}:users ${GOPATH}

USER root
WORKDIR /usr/local
RUN wget --quiet --output-document=- "https://go.dev/dl/go${GO_VERSION}.linux-amd64.tar.gz" | tar -xz \
    && go version

# Install GoNB (https://github.com/janpfeifer/gonb) in the user account
USER $NB_USER
WORKDIR ${HOME}
RUN export GOPROXY=direct && \
    go install "github.com/janpfeifer/gonb@${GONB_VERSION}" && \
    go install golang.org/x/tools/cmd/goimports@latest && \
    go install golang.org/x/tools/gopls@latest && \
    gonb --install

and it works without issues... I tested the basic examples from your tutorial.ipynb (first few) and it works.
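
For a scripted check that the kernel was registered in such an image, something like the following could be run inside the container. It is a sketch under two assumptions: that `gonb --install` registers the kernel under the name `gonb`, and that the `jupyter` CLI is on the PATH; the function names are made up for illustration.

```python
import json
import subprocess
from typing import List


def query_kernelspecs() -> str:
    """Run `jupyter kernelspec list --json` and return its raw JSON output."""
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def installed_kernels(kernelspec_json: str) -> List[str]:
    """Return the sorted kernel names found in the kernelspec JSON."""
    return sorted(json.loads(kernelspec_json).get("kernelspecs", {}))


def has_kernel(kernelspec_json: str, name: str = "gonb") -> bool:
    """Return True if a kernel with the given name is registered."""
    return name in installed_kernels(kernelspec_json)


# Inside the container:
#     print(has_kernel(query_kernelspecs()))
```
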

Thank you a lot for your help.

I don't know if others are interested in this solution; if so, you can simply change your Dockerfile. (Remember it is based on minimal-notebook, so it is a bit bigger than the base image: 4.1GB vs 2.51GB.)


P.S. feel free to close this issue - I am fine now - thank you again!