pangeo-data / pangeo-eosc

Pangeo for the European Open Science cloud
https://pangeo-data.github.io/pangeo-eosc/
MIT License

Trying to reconfigure https://pangeo-foss4g.vm.fedcloud.eu/jupyterhub #5

Closed: guillaumeeb closed this issue 1 year ago

guillaumeeb commented 1 year ago

I tried hard tonight to find a working setup on the pangeo-foss4g platform, but with no luck.

It seems that the JupyterLab call to the JupyterHub services/dask-gateway endpoint always fails. I have not figured out why.

See errors:

[I 2022-08-16 21:12:35.717 JupyterHub log:189] 302 GET /services/dask-gateway/api/v1/clusters/ -> /jupyterhub/hub/services/dask-gateway/api/v1/clusters/ (@::ffff:10.244.1.215) 0.80ms
[W 2022-08-16 21:12:35.722 JupyterHub log:189] 404 GET /jupyterhub/hub/services/dask-gateway/api/v1/clusters/ (@::ffff:10.244.1.215) 1.39ms

The route to dask-gateway looks good (from what I see in the hub pod logs), so I'm not sure where the problem comes from. I tried several variations of the dask-gateway and JupyterHub configurations.
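Reading the two log lines together: a request that reaches the hub without the /jupyterhub/ prefix is redirected to /jupyterhub/hub/<original path>, where nothing is served, hence the 404. The sketch below only reproduces that path arithmetic as inferred from these logs; it is not JupyterHub code, redirect_target and service_path are hypothetical helpers, and /jupyterhub/ is the hub.baseUrl from the configuration below.

```python
def redirect_target(path: str, base_url: str = "/jupyterhub/") -> str:
    """Where the hub sends a request, as inferred from the 302 lines in the log:
    paths outside base_url get '{base_url}hub' prepended."""
    if path.startswith(base_url):
        # Already under the prefix: routed by the proxy, no redirect modeled.
        return path
    return base_url.rstrip("/") + "/hub" + path


def service_path(service: str, rest: str, base_url: str = "/jupyterhub/") -> str:
    """Path at which a hub-registered service is proxied when base_url is set."""
    return f"{base_url}services/{service}/{rest.lstrip('/')}"


# The request from the log lacks the /jupyterhub/ prefix and is redirected.
print(redirect_target("/services/dask-gateway/api/v1/clusters/"))
# -> /jupyterhub/hub/services/dask-gateway/api/v1/clusters/  (the 404 in the log)

# The path the client should be calling instead.
print(service_path("dask-gateway", "api/v1/clusters/"))
# -> /jupyterhub/services/dask-gateway/api/v1/clusters/
```

In other words, whatever builds the client's gateway address has to include the /jupyterhub/ prefix, not only the gateway's own prefix setting.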

Here is the one I'm currently using:

dask-gateway:
  enabled: true
  gateway:
    prefix: /jupyterhub/services/dask-gateway
    #prefix: /
    auth:
      type: jupyterhub
      jupyterhub:
        apiToken: "token1"
    extraConfig:
      optionHandler: |
        from dask_gateway_server.options import Options, Integer, Float, String

        def options_handler(options):
          if ":" not in options.image:
            raise ValueError("When specifying an image you must also provide a tag")
          return {
            "worker_cores": options.worker_cores,
            "worker_memory": int(options.worker_memory * 2 ** 30),
            "image": options.image,
          }

        c.Backend.cluster_options = Options(
          Integer("worker_cores", default=1, min=1, max=4, label="Worker Cores"),
          Float("worker_memory", default=1, min=1, max=8, label="Worker Memory (GiB)"),
          String("image", default="pangeo/pangeo-notebook:latest", label="Image"),
          handler=options_handler,
        )
  traefik:
    loglevel: DEBUG
dask-kubernetes:
  enabled: false
jupyterhub:
  hub:
    baseUrl: /jupyterhub/
    services:
      dask-gateway:
        apiToken: "token1"
        #url: http://api-daskhub-dask-gateway:8000
    config:
      GenericOAuthenticator:
        allowed_groups:
        - urn:mace:egi.eu:group:vo.pangeo.eu:role=member#aai.egi.eu
        authorize_url: https://aai-dev.egi.eu/auth/realms/egi/protocol/openid-connect/auth
        claim_groups_key: eduperson_entitlement
        client_id: id
        client_secret: secret
        login_service: EGI Check-In
        oauth_callback_url: https://pangeo-foss4g.vm.fedcloud.eu/jupyterhub/hub/oauth_callback
        scope:
        - openid
        - email
        - profile
        - eduperson_entitlement
        token_url: https://aai-dev.egi.eu/auth/realms/egi/protocol/openid-connect/token
        userdata_params:
          state: state
        userdata_url: https://aai-dev.egi.eu/auth/realms/egi/protocol/openid-connect/userinfo
        username_key: preferred_username
      JupyterHub:
        authenticator_class: generic-oauth
  ingress:
    annotations:
      kubernetes.io/ingress.class: nginx
      cert-manager.io/cluster-issuer: "letsencrypt-prod"
    enabled: true
    tls:
    - hosts:
      - pangeo-foss4g.vm.fedcloud.eu
      secretName: pangeo-foss4g.vm.fedcloud.eu

  proxy:
    secretToken: "token2"
    service:
      type: ClusterIP
  singleuser:
    cpu:
      guarantee: 1
      limit: 2
    image:
      name: pangeo/pangeo-notebook
      tag: latest
    memory:
      guarantee: 4G
      limit: 8G
    extraEnv:
      DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE: '{JUPYTER_IMAGE_SPEC}'
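The optionHandler above can be exercised outside the gateway to see what it accepts and what it passes to the backend. A minimal sketch, assuming it is enough to stand in for dask-gateway-server's options object with a types.SimpleNamespace carrying the same three attributes:

```python
from types import SimpleNamespace


def options_handler(options):
    # Same logic as the optionHandler in extraConfig above.
    if ":" not in options.image:
        raise ValueError("When specifying an image you must also provide a tag")
    return {
        "worker_cores": options.worker_cores,
        # The form takes GiB; the handler converts to bytes.
        "worker_memory": int(options.worker_memory * 2 ** 30),
        "image": options.image,
    }


# SimpleNamespace stands in for the options object built from the Options(...) form.
defaults = SimpleNamespace(worker_cores=1, worker_memory=1.0, image="pangeo/pangeo-notebook:latest")
print(options_handler(defaults))
# -> {'worker_cores': 1, 'worker_memory': 1073741824, 'image': 'pangeo/pangeo-notebook:latest'}

# An untagged image is rejected.
try:
    options_handler(SimpleNamespace(worker_cores=2, worker_memory=4.0, image="pangeo/pangeo-notebook"))
except ValueError as err:
    print("rejected:", err)
```

One side effect of the ":" test: an image from a registry with an explicit port (for example a hypothetical myregistry:5000/pangeo-notebook) would pass even without a tag.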

Key points I've been playing with:

So the next step would be to delete all the BinderHub Helm configuration on this platform and try to deploy JupyterHub at the root of the DNS name, I guess; I see nothing else to try currently.

Finally, I really don't know how dask-gateway was able to work before, but it certainly wasn't thanks to the latest Helm chart values, in which it was disabled.

With the above setup, using dask-gateway should be as simple as calling:

from dask_gateway import Gateway
gateway = Gateway(auth="jupyterhub")

cc @j34ni @sebastian-luna-valero for any thought on this.

Also pinging @jacobtomlinson and @consideRatio for external help if they have time for a quick glance, even though they don't know the context here...

guillaumeeb commented 1 year ago

In the meantime, I reverted to the configuration with BasicAuth and a password. We can use dask-gateway, but without the dashboard and with weak authentication.

The new configuration is still a bit better: Client, Scheduler and Workers are using the same image (pangeo/pangeo-notebook:latest), so we don't need to install extra packages on the workers.
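The shared image comes from the DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE: '{JUPYTER_IMAGE_SPEC}' entry in singleuser.extraEnv: the placeholder is replaced by the notebook pod's own image, which becomes the default image cluster option. My understanding is that the dask-gateway client performs this {VAR} substitution from the environment when it reads its configured cluster options; the sketch below only models the substitution with str.format, and expand_option is a hypothetical name.

```python
def expand_option(template: str, environ: dict) -> str:
    """Fill {VAR} placeholders in a cluster-option default from environment variables."""
    return template.format(**environ)


# Example user-pod environment: as far as I know, KubeSpawner exposes the
# notebook image as JUPYTER_IMAGE_SPEC.
pod_env = {"JUPYTER_IMAGE_SPEC": "pangeo/pangeo-notebook:latest"}
image = expand_option("{JUPYTER_IMAGE_SPEC}", pod_env)
print(image)  # pangeo/pangeo-notebook:latest
```

The substituted value carries a tag, so it also satisfies the ":" check in the server-side optionHandler.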

guillaumeeb commented 1 year ago

I was also able to run the dask_introduction notebook to the end with this setup, though it may already have worked before.

guillaumeeb commented 1 year ago

One more thing comes to mind: since our hub is under the /jupyterhub/ prefix, maybe we also need to modify the default daskhub values here and there.

consideRatio commented 1 year ago

Quoting @guillaumeeb: "One more thing comes to mind: since our hub is under the /jupyterhub/ prefix, maybe we also need to modify the default daskhub values here and there."

Wiee yes this!

guillaumeeb commented 1 year ago

Wow, thanks @consideRatio for chiming in so quickly! I'll try for an hour and report back.

guillaumeeb commented 1 year ago

That was close, definitely an improvement, but there seems to be a problem during auth:

[I 2022-08-17 06:43:49.038 JupyterHub log:189] 302 GET /hub/api/authorizations/token/[secret] -> /jupyterhub/hub/hub/api/authorizations/token/[secret] (dask-gateway@10.244.1.2) 22.28ms
[E 2022-08-17 06:43:49.152 JupyterHub web:1219] Uncaught exception in write_error
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/dist-packages/tornado/web.py", line 1217, in send_error
        self.write_error(status_code, **kwargs)
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/handlers/base.py", line 1283, in write_error
        html = self.render_template('%s.html' % status_code, sync=True, **ns)
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/handlers/base.py", line 1202, in render_template
        return template.render(**template_ns)
      File "/usr/local/lib/python3.8/dist-packages/jinja2/environment.py", line 1304, in render
        self.environment.handle_exception()
      File "/usr/local/lib/python3.8/dist-packages/jinja2/environment.py", line 925, in handle_exception
        raise rewrite_traceback_stack(source=source)
      File "/usr/local/share/jupyterhub/templates/404.html", line 1, in top-level template code
        {% extends "error.html" %}
      File "/usr/local/share/jupyterhub/templates/error.html", line 1, in top-level template code
        {% extends "page.html" %}
      File "/usr/local/share/jupyterhub/templates/page.html", line 78, in top-level template code
        {% if not no_spawner_check and user and user.spawner.options_form %}
      File "/usr/local/lib/python3.8/dist-packages/jinja2/environment.py", line 474, in getattr
        return getattr(obj, attribute)
    jinja2.exceptions.UndefinedError: 'jupyterhub.orm.Service object' has no attribute 'spawner'

[W 2022-08-17 06:43:49.154 JupyterHub log:189] 404 GET /jupyterhub/hub/hub/api/authorizations/token/[secret] (dask-gateway@10.244.1.2) 114.95ms

Is dask-gateway having trouble verifying the token? I notice a doubled "hub" segment in the URL above (/jupyterhub/hub/hub/api/...); is that normal? I probably have to configure the JupyterHub API server URL, maybe via gateway.auth.jupyterhub.apiUrl?
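The doubled hub fits the same pattern as the earlier 302: the token check path /hub/api/authorizations/token/... lacks the /jupyterhub/ prefix, so the hub redirects it under /jupyterhub/hub/. A small sketch of the URL the gateway builds from its JupyterHub API URL, following the path shape visible in the log. The host http://proxy-public in the first candidate is an assumption (only the /hub/api path appears in the log); the second is the apiUrl value used in the working configuration posted later in this thread. Both helper names are hypothetical.

```python
from urllib.parse import urlsplit


def token_check_url(api_url: str, token: str) -> str:
    """URL the gateway calls to validate a user token, following the path shape
    in the hub log: {apiUrl}/authorizations/token/{token}."""
    return f"{api_url.rstrip('/')}/authorizations/token/{token}"


def inside_prefix(url: str, base_url: str = "/jupyterhub/") -> bool:
    """True if the URL path already lives under the hub's base_url."""
    return urlsplit(url).path.startswith(base_url)


for api_url in ("http://proxy-public/hub/api",             # assumed unprefixed default
                "http://proxy-public/jupyterhub/hub/api"):  # prefixed value
    url = token_check_url(api_url, "TOKEN")
    verdict = "served by the hub" if inside_prefix(url) else "redirected to /jupyterhub/hub/hub/..., then 404"
    print(url, "->", verdict)
```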

Before that point, the dask-gateway request does reach the api-gateway-server, which is good!

I'll revert for now, as I have to go.

sebastian-luna-valero commented 1 year ago

Hi,

This is the configuration I tested, skipping the TLS and Check-In configuration for the time being:

dask-gateway:
  enabled: true
  gateway:
    auth:
      jupyterhub:
        apiToken: token1
      type: jupyterhub
    prefix: /services/dask-gateway
    backend:  
      worker:
        cores:
          limit: 2
        memory:
          limit: 8G
        threads: 2
  traefik:
    service:
      type: ClusterIP
dask-kubernetes:
  enabled: false
jupyterhub:
  hub:
    config:
      Authenticator:
        admin_users:
        - admin
      JupyterHub:
        admin_access: true
        authenticator_class: nativeauthenticator.NativeAuthenticator
    extraConfig:
      00-add-dask-gateway-values: |
        # 1. Sets `DASK_GATEWAY__PROXY_ADDRESS` in the singleuser environment.
        # 2. Adds the URL for the Dask Gateway JupyterHub service.
        import os
        # These are set by jupyterhub.
        release_name = os.environ['HELM_RELEASE_NAME']
        release_namespace = os.environ['POD_NAMESPACE']
        if 'PROXY_HTTP_SERVICE_HOST' in os.environ:
            # https is enabled, we want to use the internal http service.
            gateway_address = 'http://{}:{}/services/dask-gateway/'.format(
                os.environ['PROXY_HTTP_SERVICE_HOST'],
                os.environ['PROXY_HTTP_SERVICE_PORT'],
            )
            print('Setting DASK_GATEWAY__ADDRESS {} from HTTP service'.format(gateway_address))
        else:
            gateway_address = 'http://proxy-public/services/dask-gateway'
            print('Setting DASK_GATEWAY__ADDRESS {}'.format(gateway_address))
        # Internal address to connect to the Dask Gateway.
        c.KubeSpawner.environment.setdefault('DASK_GATEWAY__ADDRESS', gateway_address)
        # Internal address for the Dask Gateway proxy.
        c.KubeSpawner.environment.setdefault('DASK_GATEWAY__PROXY_ADDRESS', 'gateway://traefik-{}-dask-gateway.{}:80'.format(release_name, release_namespace))
        # Relative address for the dashboard link.
        c.KubeSpawner.environment.setdefault('DASK_GATEWAY__PUBLIC_ADDRESS', '/services/dask-gateway/')
        # Use JupyterHub to authenticate with Dask Gateway.
        c.KubeSpawner.environment.setdefault('DASK_GATEWAY__AUTH__TYPE', 'jupyterhub')
        # Adds Dask Gateway as a JupyterHub service to make the gateway available at
        # {HUB_URL}/services/dask-gateway
        service_url = 'http://traefik-{}-dask-gateway.{}'.format(release_name, release_namespace)
        for service in c.JupyterHub.services:
            if service['name'] == 'dask-gateway':
                if not service.get('url', None):
                    print('Adding dask-gateway service URL')
                    service.setdefault('url', service_url)
                break
        else:
            print('dask-gateway service not found. Did you set jupyterhub.hub.services.dask-gateway.apiToken?')
    nodeSelector:
      node-role.kubernetes.io/master: ""
    services:
      dask-gateway:
        apiToken: token1
    tolerations:
    - key: node-role.kubernetes.io/master
      operator: Exists
  ingress:
    annotations:
      kubernetes.io/ingress.class: nginx
    enabled: true
  proxy:
    chp:
      nodeSelector:
        node-role.kubernetes.io/master: ""
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
    service:
      type: ClusterIP
  singleuser:
    cpu:
      guarantee: 1
      limit: 4
    defaultUrl: /lab
    image:
      name: pangeo/ml-notebook
      tag: latest
    lifecycleHooks:
      postStart:
        exec:
          command:
          - sh
          - -c
          - |
            chmod 700 .ssh; chmod g-s .ssh; chmod 600 .ssh/*; exit 0
    memory:
      guarantee: 1G
      limit: 8G
    startTimeout: 600
    storage:
      capacity: 2Gi
      type: dynamic
rbac:
  enabled: true

Note that this deployment does not use a prefix for JupyterHub.

This is how we create the Gateway:

from dask_gateway import Gateway
gateway = Gateway(
    "http://api-daskhub-dask-gateway.daskhub:8000/",
)

With:

$ sudo helm list -n daskhub
NAME    NAMESPACE   REVISION    UPDATED                                 STATUS      CHART               APP VERSION
daskhub daskhub     6           2022-08-13 09:52:06.606018286 +0000 UTC deployed    daskhub-2022.6.0    2022.6.1   

I am sorry I lack the background to answer the rest of your questions.

I hope it helps anyway!

Best regards, Sebastian

guillaumeeb commented 1 year ago

@j34ni I'm currently trying new things; I just saw you connected to the Hub, which might not work.

guillaumeeb commented 1 year ago

Finally! I have a working setup!!

We now have:

The culprit was definitely the /jupyterhub/ prefix, which was missing in several places; I had to reconfigure several things.

Here is a working values.yaml file without the secrets:

dask-gateway:
  enabled: true
  gateway:
    prefix: /jupyterhub/services/dask-gateway
    auth:
      type: jupyterhub
      jupyterhub:
        apiToken: "token1"
        apiUrl: "http://proxy-public/jupyterhub/hub/api"
    extraConfig:
      optionHandler: |
        from dask_gateway_server.options import Options, Integer, Float, String

        def options_handler(options):
          if ":" not in options.image:
            raise ValueError("When specifying an image you must also provide a tag")
          return {
            "worker_cores": options.worker_cores,
            "worker_memory": int(options.worker_memory * 2 ** 30),
            "image": options.image,
          }

        c.Backend.cluster_options = Options(
          Integer("worker_cores", default=1, min=1, max=4, label="Worker Cores"),
          Float("worker_memory", default=1, min=1, max=8, label="Worker Memory (GiB)"),
          String("image", default="pangeo/pangeo-notebook:latest", label="Image"),
          handler=options_handler,
        )
dask-kubernetes:
  enabled: false
jupyterhub:
  hub:
    baseUrl: /jupyterhub/
    services:
      dask-gateway:
        apiToken: "token1"
    extraConfig:
      # Register Dask Gateway service and setup singleuser environment.
      00-add-dask-gateway-values: |
        # 1. Sets `DASK_GATEWAY__PROXY_ADDRESS` in the singleuser environment.
        # 2. Adds the URL for the Dask Gateway JupyterHub service.
        import os
        # These are set by jupyterhub.
        release_name = os.environ["HELM_RELEASE_NAME"]
        release_namespace = os.environ["POD_NAMESPACE"]
        if "PROXY_HTTP_SERVICE_HOST" in os.environ:
            # https is enabled, we want to use the internal http service.
            gateway_address = "http://{}:{}/services/dask-gateway/".format(
                os.environ["PROXY_HTTP_SERVICE_HOST"],
                os.environ["PROXY_HTTP_SERVICE_PORT"],
            )
            print("Setting DASK_GATEWAY__ADDRESS {} from HTTP service".format(gateway_address))
        else:
            gateway_address = "http://proxy-public/jupyterhub/services/dask-gateway"
            print("Setting DASK_GATEWAY__ADDRESS {}".format(gateway_address))
        # Internal address to connect to the Dask Gateway.
        c.KubeSpawner.environment.setdefault("DASK_GATEWAY__ADDRESS", gateway_address)
        # Internal address for the Dask Gateway proxy.
        c.KubeSpawner.environment.setdefault("DASK_GATEWAY__PROXY_ADDRESS", "gateway://traefik-{}-dask-gateway.{}:80".format(release_name, release_namespace))
        # Relative address for the dashboard link.
        c.KubeSpawner.environment.setdefault("DASK_GATEWAY__PUBLIC_ADDRESS", "/jupyterhub/services/dask-gateway/")
        # Use JupyterHub to authenticate with Dask Gateway.
        c.KubeSpawner.environment.setdefault("DASK_GATEWAY__AUTH__TYPE", "jupyterhub")
        # Adds Dask Gateway as a JupyterHub service to make the gateway available at
        # {HUB_URL}/services/dask-gateway
        service_url = "http://traefik-{}-dask-gateway.{}".format(release_name, release_namespace)
        for service in c.JupyterHub.services:
            if service["name"] == "dask-gateway":
                if not service.get("url", None):
                    print("Adding dask-gateway service URL")
                    service.setdefault("url", service_url)
                break
        else:
            print("dask-gateway service not found. Did you set jupyterhub.hub.services.dask-gateway.apiToken?")
    config:
      GenericOAuthenticator:
        allowed_groups:
        - urn:mace:egi.eu:group:vo.pangeo.eu:role=member#aai.egi.eu
        authorize_url: https://aai-dev.egi.eu/auth/realms/egi/protocol/openid-connect/auth
        claim_groups_key: eduperson_entitlement
        client_id: id
        client_secret: secret
        login_service: EGI Check-In
        oauth_callback_url: https://pangeo-foss4g.vm.fedcloud.eu/jupyterhub/hub/oauth_callback
        scope:
        - openid
        - email
        - profile
        - eduperson_entitlement
        token_url: https://aai-dev.egi.eu/auth/realms/egi/protocol/openid-connect/token
        userdata_params:
          state: state
        userdata_url: https://aai-dev.egi.eu/auth/realms/egi/protocol/openid-connect/userinfo
        username_key: preferred_username
      JupyterHub:
        authenticator_class: generic-oauth
  ingress:
    annotations:
      kubernetes.io/ingress.class: nginx
      cert-manager.io/cluster-issuer: "letsencrypt-prod"
    enabled: true
    tls:
    - hosts:
      - pangeo-foss4g.vm.fedcloud.eu
      secretName: pangeo-foss4g.vm.fedcloud.eu

  proxy:
    secretToken: "token2"
    service:
      type: ClusterIP
  singleuser:
    cpu:
      guarantee: 1
      limit: 2
    image:
      name: pangeo/pangeo-notebook
      tag: latest
    memory:
      guarantee: 4G
      limit: 8G
    extraEnv:
      DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE: '{JUPYTER_IMAGE_SPEC}'
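What the 00-add-dask-gateway-values snippet computes can be checked without a cluster by restating it as a pure function. The sketch below is hypothetical (gateway_env is not part of daskhub), and the release name and namespace daskhub are assumptions taken from the helm list output earlier in this thread. It threads base_url through every address, so base_url="/" reproduces the unprefixed values of @sebastian-luna-valero's configuration and "/jupyterhub/" the ones used here. One difference from the snippet as posted: the snippet still builds the PROXY_HTTP_SERVICE_HOST branch as /services/dask-gateway/ without the prefix; per its own comment that branch is only taken when HTTPS is enabled on the proxy, whereas TLS here is terminated by the nginx ingress, so as far as I can tell it does not affect this deployment.

```python
def gateway_env(environ: dict, release_name: str, namespace: str,
                base_url: str = "/jupyterhub/") -> dict:
    """Hypothetical pure-function restatement of 00-add-dask-gateway-values:
    the DASK_GATEWAY__* variables that singleuser pods would receive."""
    prefix = base_url.rstrip("/")  # "/jupyterhub", or "" when the hub is at the root
    if "PROXY_HTTP_SERVICE_HOST" in environ:
        # Internal HTTP service of the proxy (per the original comment, HTTPS case).
        address = "http://{}:{}{}/services/dask-gateway/".format(
            environ["PROXY_HTTP_SERVICE_HOST"], environ["PROXY_HTTP_SERVICE_PORT"], prefix)
    else:
        address = "http://proxy-public{}/services/dask-gateway".format(prefix)
    return {
        # Where the client sends API requests (through the hub's proxy).
        "DASK_GATEWAY__ADDRESS": address,
        # Where the client opens scheduler connections (straight to traefik).
        "DASK_GATEWAY__PROXY_ADDRESS": "gateway://traefik-{}-dask-gateway.{}:80".format(
            release_name, namespace),
        # Relative address for the dashboard link shown to the user.
        "DASK_GATEWAY__PUBLIC_ADDRESS": "{}/services/dask-gateway/".format(prefix),
        "DASK_GATEWAY__AUTH__TYPE": "jupyterhub",
    }


for base in ("/", "/jupyterhub/"):
    env = gateway_env({}, "daskhub", "daskhub", base_url=base)
    print(base, env["DASK_GATEWAY__ADDRESS"], env["DASK_GATEWAY__PUBLIC_ADDRESS"])
```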

We should try to store it in this repo, but I'm not sure how to handle the secret part yet. We could split the values.yaml file into two separate files, but we would need a GitHub tool to encrypt the secret part. Does anyone know one?
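Helm accepts -f/--values several times, with the rightmost file taking priority, so one possible split is a committed values.yaml plus a separate secrets.yaml passed as helm upgrade ... -f values.yaml -f secrets.yaml (file names hypothetical). The sketch below models that merge in Python, assuming Helm's usual behavior of merging nested maps and replacing scalars and lists, to check that the secret keys land where the chart expects them.

```python
import copy


def merge_values(base: dict, override: dict) -> dict:
    """Deep-merge two Helm-style values trees: nested maps merge key by key,
    anything else (scalars, lists) in `override` replaces the value in `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical split: values.yaml (committed) and secrets.yaml (kept out of git).
public = {"jupyterhub": {"hub": {"baseUrl": "/jupyterhub/",
                                  "services": {"dask-gateway": {}}}}}
secrets = {"dask-gateway": {"gateway": {"auth": {"jupyterhub": {"apiToken": "token1"}}}},
           "jupyterhub": {"hub": {"services": {"dask-gateway": {"apiToken": "token1"}}},
                          "proxy": {"secretToken": "token2"}}}

values = merge_values(public, secrets)
print(values["jupyterhub"]["hub"])
# -> {'baseUrl': '/jupyterhub/', 'services': {'dask-gateway': {'apiToken': 'token1'}}}
```

With such a split, the secret file would only need the two apiToken entries, proxy.secretToken and the GenericOAuthenticator client_id and client_secret.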

guillaumeeb commented 1 year ago

@sebastian-luna-valero thanks for your help.

In your configuration, you override a lot of daskhub defaults with the same values; is this intended, for explanation purposes here?

Moreover, with this config (and it works with the one I posted above), you should be able to create a Gateway with no arguments:

from dask_gateway import Gateway
gateway = Gateway()
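Gateway() needs no arguments because the 00-add-dask-gateway-values snippet exports DASK_GATEWAY__* variables into each user pod, and Dask's configuration system turns them into keys such as gateway.address and gateway.auth.type, which the client reads as defaults. The sketch below is a simplified model of that mapping, not Dask's implementation (the real dask.config also treats "-" and "_" in keys as equivalent, among other details); in a user pod, dask.config.get("gateway.address") should show the actual value.

```python
import ast


def env_to_config(environ: dict) -> dict:
    """Simplified model of how Dask reads DASK_* variables: drop the prefix,
    lower-case, split on '__' into nested keys, parse Python literals if possible."""
    config: dict = {}
    for name, raw in environ.items():
        if not name.startswith("DASK_"):
            continue
        keys = name[len("DASK_"):].lower().split("__")
        try:
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw  # URLs and bare words stay plain strings
        node = config
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return config


# Variables as the extraConfig snippet would set them in a user pod (example values).
pod_env = {
    "DASK_GATEWAY__ADDRESS": "http://proxy-public/jupyterhub/services/dask-gateway",
    "DASK_GATEWAY__PUBLIC_ADDRESS": "/jupyterhub/services/dask-gateway/",
    "DASK_GATEWAY__AUTH__TYPE": "jupyterhub",
    "HOME": "/home/jovyan",
}
print(env_to_config(pod_env))
```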

j34ni commented 1 year ago

@guillaumeeb : Awesome !

As soon as the issues with the IM Dashboard are resolved, we should be able to add nodes to the existing pangeo-foss4g infrastructure (or create a new one, without BinderHub?) and make it large enough to accommodate ~25 users at the workshop.

guillaumeeb commented 1 year ago

@j34ni it'd be really nice if you could validate this setup (and maybe @tinaok if she has time).

tinaok commented 1 year ago

I confirm that I can connect to https://pangeo-foss4g.vm.fedcloud.eu/jupyterhub/ and can start verifying the tutorial notebooks there.

I wasn't careful enough on my first try: I went to https://pangeo-foss4g.vm.fedcloud.eu/jupyterhub (without the trailing slash) and got a 'binder 404: not found' page...

tinaok commented 1 year ago

@guillaumeeb, did you find a way to kill the gateway cluster that I started with my user account? That was a concern @j34ni raised at some point.

j34ni commented 1 year ago

@tinaok: I cannot check right now (there are not enough CPUs left for me to even log in); however, it is unlikely that another user can shut down or even see your cluster, since it uses JupyterHub to authenticate with Dask Gateway.

sebastian-luna-valero commented 1 year ago

Great news, @guillaumeeb, thank you very much! I built the values.yaml above based on:

@j34ni you can delete user pods with kubectl after ssh'ing into the cluster. To prevent this problem from reoccurring in the future, I suggest explicitly deleting Dask clusters from the trainees' notebooks, following:

https://gateway.dask.org/usage.html#shutdown-the-cluster
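A minimal sketch of that habit for the workshop notebooks, assuming the standard dask_gateway client calls (new_cluster, scale, get_client, shutdown, list_clusters, stop_cluster). The helper names run_with_cluster and shutdown_all_clusters are hypothetical, and they take the gateway as a parameter so they can be tried without a live deployment. My understanding is that list_clusters only reports the authenticated user's own active clusters.

```python
def run_with_cluster(gateway, n_workers, work):
    """Create a cluster, scale it, run work(client), and always shut it down,
    even if work raises. `gateway` is expected to behave like dask_gateway.Gateway."""
    cluster = gateway.new_cluster()
    try:
        cluster.scale(n_workers)
        client = cluster.get_client()
        return work(client)
    finally:
        cluster.shutdown()  # releases the scheduler and worker pods


def shutdown_all_clusters(gateway):
    """Stop every cluster the gateway lists for the current user (e.g. leftovers
    from a crashed notebook) and return their names."""
    names = [report.name for report in gateway.list_clusters()]
    for name in names:
        gateway.stop_cluster(name)
    return names


# In a notebook on the hub (requires a running gateway):
#   from dask_gateway import Gateway
#   gateway = Gateway()
#   run_with_cluster(gateway, 4, lambda client: client.submit(sum, [1, 2, 3]).result())
#   shutdown_all_clusters(gateway)
```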

guillaumeeb commented 1 year ago

I think it is safe to close this issue now.

Thanks everyone here for the inputs.

@sebastian-luna-valero I was just meaning that you don't have to override the default values provided by https://artifacthub.io/packages/helm/dask/daskhub (but you probably know that).

@j34ni @tinaok I actually don't know whether JupyterHub services, auth and tokens prevent other dask-gateway users from seeing clusters. But as @sebastian-luna-valero said, if users didn't clean up themselves, one of the best ways to do it is with kubectl. You just need to delete the correct scheduler pod, and all the worker pods connected to it will shut down within a minute.
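Finding the right scheduler pod can be scripted from kubectl's JSON output. A sketch under loud assumptions: the label keys and values below are how I believe dask-gateway's Kubernetes backend tags its pods, and the default namespace daskhub is taken from the helm list output earlier in this thread; verify both (for example with kubectl get pods --show-labels) before deleting anything. The functions only build commands for review; they do not run them.

```python
import json

# Assumed pod labels of dask-gateway's Kubernetes backend; check the real ones
# with `kubectl get pods --show-labels -n <namespace>` before relying on them.
COMPONENT_LABEL = "app.kubernetes.io/component"
SCHEDULER_COMPONENT = "dask-scheduler"
CLUSTER_LABEL = "gateway.dask.org/cluster"


def scheduler_pods(pods_json: str, cluster=None) -> list:
    """Names of scheduler pods in the output of `kubectl get pods -o json`,
    optionally restricted to one dask-gateway cluster."""
    names = []
    for pod in json.loads(pods_json)["items"]:
        labels = pod["metadata"].get("labels", {})
        if labels.get(COMPONENT_LABEL) != SCHEDULER_COMPONENT:
            continue
        if cluster is not None and labels.get(CLUSTER_LABEL) != cluster:
            continue
        names.append(pod["metadata"]["name"])
    return names


def delete_commands(names, namespace="daskhub") -> list:
    """kubectl commands to review and run by hand; the namespace is an assumption."""
    return [f"kubectl delete pod {name} -n {namespace}" for name in names]
```

Typical use would be saving kubectl get pods -n daskhub -o json to a file, passing its contents to scheduler_pods, and printing delete_commands for the result.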

@sebastian-luna-valero I would really appreciate it if you reopened your pull request documenting how to deploy DaskHub on CESNET. There are a lot of important things in it; we just need to update it a bit.
