jupyterhub / zero-to-jupyterhub-k8s

Helm Chart & Documentation for deploying JupyterHub on Kubernetes
https://zero-to-jupyterhub.readthedocs.io

Almost standard helm install results in 500 internal error due to redirect loop when connecting to user notebook #3298

Closed StefanVanDyck closed 10 months ago

StefanVanDyck commented 10 months ago

Bug description

I tried upgrading my JupyterHub deployment from version 2.0.0 of the Helm chart to version 3.2.1, but I am totally stumped by this redirect error. I reduced my Helm config to the simplest form I could, and I still cannot get things to work.

I found many people with similar issues, but always when running locally, where the suggested solution is to connect to a different port. So maybe there is an issue with how the Helm chart ingress is set up?

How to reproduce

Your personal set up

Version(s):

Configuration

Helm values:

```
---
hub:
  db:
    pvc:
      storageClassName: ceph-block
proxy:
  service:
    type: NodePort
ingress:
  enabled: true
  hosts:
    - hub.xxx.xxx.xxx
singleuser:
  image:
    name: jupyter/datascience-notebook
    tag: latest
    pullPolicy: Always
  cmd: null
  profileList:
    - display_name: "Minimal environment"
      description: "To avoid too much bells and whistles: Python."
      default: true
```
Logs

Hub logs:

```
[I 2023-12-10 09:51:41.110 JupyterHub log:191] 302 GET /hub/ -> /user/stefan/ (stefan@10.212.134.201) 17.23ms
[I 2023-12-10 09:51:41.139 JupyterHub log:191] 302 GET /user/stefan/lab? -> /hub/user/stefan/lab? (@10.212.134.201) 0.67ms
[I 2023-12-10 09:51:41.153 JupyterHub log:191] 302 GET /hub/user/stefan/lab? -> /user/stefan/lab?redirects=1 (stefan@10.212.134.201) 2.55ms
[I 2023-12-10 09:51:41.166 JupyterHub log:191] 302 GET /user/stefan/lab?redirects=1 -> /hub/user/stefan/lab?redirects=1 (@10.212.134.201) 0.39ms
[W 2023-12-10 09:51:41.182 JupyterHub base:1656] Redirect loop detected on /hub/user/stefan/lab?redirects=1
[I 2023-12-10 09:51:42.566 JupyterHub log:191] 200 GET /hub/home (stefan@10.212.134.201) 2.99ms
[I 2023-12-10 09:51:43.183 JupyterHub log:191] 302 GET /hub/user/stefan/lab?redirects=1 -> /user/stefan/lab?redirects=2 (stefan@10.212.134.201) 2001.93ms
[I 2023-12-10 09:51:48.431 JupyterHub log:191] 302 GET /user/stefan/lab? -> /hub/user/stefan/lab? (@10.212.134.201) 0.65ms
[I 2023-12-10 09:51:48.450 JupyterHub log:191] 302 GET /hub/user/stefan/lab? -> /user/stefan/lab?redirects=1 (stefan@10.212.134.201) 2.41ms
[I 2023-12-10 09:51:48.463 JupyterHub log:191] 302 GET /user/stefan/lab?redirects=1 -> /hub/user/stefan/lab?redirects=1 (@10.212.134.201) 0.64ms
[W 2023-12-10 09:51:48.478 JupyterHub base:1656] Redirect loop detected on /hub/user/stefan/lab?redirects=1
[I 2023-12-10 09:51:50.481 JupyterHub log:191] 302 GET /hub/user/stefan/lab?redirects=1 -> /user/stefan/lab?redirects=2 (stefan@10.212.134.201) 2004.18ms
[I 2023-12-10 09:51:50.570 JupyterHub log:191] 302 GET /user/stefan/lab?redirects=2 -> /hub/user/stefan/lab?redirects=2 (@10.212.134.201) 0.65ms
[W 2023-12-10 09:51:50.588 JupyterHub web:1869] 500 GET /hub/user/stefan/lab?redirects=2 (10.212.134.201): Redirect loop detected. Notebook has jupyterhub version unknown (likely < 0.8), but the Hub expects 4.0.2.
    Try installing jupyterhub==4.0.2 in the user environment if you continue to have problems.
[E 2023-12-10 09:51:50.589 JupyterHub log:183] {
      "X-Real-Ip": "10.212.134.201",
      "X-Forwarded-Server": "ingress-traefik-6npzb",
      "X-Forwarded-Proto": "https,http",
      "X-Forwarded-Port": "443,80",
      "X-Forwarded-Host": "hub.xxx.xxx.xxx",
      "X-Forwarded-For": "10.212.134.201,::ffff:10.200.140.218",
      "Upgrade-Insecure-Requests": "1",
      "Traceparent": "00-1d52f61eb819e093d82e6f6a714785b7-285cdb4f83c85f85-01",
      "Sec-Fetch-User": "?1",
      "Sec-Fetch-Site": "same-origin",
      "Sec-Fetch-Mode": "navigate",
      "Sec-Fetch-Dest": "document",
      "Sec-Ch-Ua-Platform": "\"Linux\"",
      "Sec-Ch-Ua-Mobile": "?0",
      "Sec-Ch-Ua": "\"Chromium\";v=\"118\", \"Google Chrome\";v=\"118\", \"Not=A?Brand\";v=\"99\"",
      "Referer": "https://hub.xxx.xxx.xxx/hub/home",
      "Cookie": "_xsrf=[secret]; jupyterhub-hub-login=[secret]; jupyterhub-session-id=[secret]",
      "Accept-Language": "en-US,en;q=0.9,nl;q=0.8",
      "Accept-Encoding": "gzip, deflate, br",
      "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
      "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36",
      "Host": "hub.xxx.xxx.xxx",
      "Connection": "keep-alive"
    }
[E 2023-12-10 09:51:50.590 JupyterHub log:191] 500 GET /hub/user/stefan/lab?redirects=2 (stefan@10.212.134.201) 3.38ms
```

User notebook:

```
[I 2023-12-10 09:32:19.545 LabApp] JupyterLab application directory is /opt/conda/share/jupyter/lab
[I 2023-12-10 09:32:19.545 LabApp] Extension Manager is 'pypi'.
[I 2023-12-10 09:32:19.547 ServerApp] jupyterlab | extension was successfully loaded.
[I 2023-12-10 09:32:19.551 ServerApp] jupyterlab_git | extension was successfully loaded.
[I 2023-12-10 09:32:19.554 ServerApp] nbclassic | extension was successfully loaded.
[I 2023-12-10 09:32:19.605 ServerApp] nbdime | extension was successfully loaded.
[I 2023-12-10 09:32:19.608 ServerApp] notebook | extension was successfully loaded.
[I 2023-12-10 09:32:19.608 ServerApp] Serving notebooks from local directory: /home/jovyan
[I 2023-12-10 09:32:19.608 ServerApp] Jupyter Server 2.8.0 is running at:
[I 2023-12-10 09:32:19.608 ServerApp] http://jupyter-stefan:8888/user/stefan/lab?token=...
[I 2023-12-10 09:32:19.608 ServerApp]     http://127.0.0.1:8888/user/stefan/lab?token=...
[I 2023-12-10 09:32:19.608 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[I 2023-12-10 09:32:20.219 ServerApp] 302 GET /user/stefan/ -> /user/stefan/lab? (@10.200.171.251) 0.56ms
[I 2023-12-10 09:32:20.445 ServerApp] Skipped non-installed server(s): bash-language-server, dockerfile-language-server-nodejs, javascript-typescript-langserver, jedi-language-server, julia-language-server, pyright, python-language-server, python-lsp-server, r-languageserver, sql-language-server, texlab, typescript-language-server, unified-language-server, vscode-css-languageserver-bin, vscode-html-languageserver-bin, vscode-json-languageserver-bin, yaml-language-server
[I 2023-12-10 09:32:26.483 ServerApp] 302 GET /user/stefan/ -> /user/stefan/lab? (@10.212.134.201) 0.85ms
[I 2023-12-10 09:32:40.448 ServerApp] 302 GET /user/stefan/ -> /user/stefan/lab? (@10.212.134.201) 0.86ms
[I 2023-12-10 09:32:47.409 ServerApp] 302 GET /user/stefan/ -> /user/stefan/lab? (@10.212.134.201) 0.82ms
[I 2023-12-10 09:51:41.125 ServerApp] 302 GET /user/stefan/ -> /user/stefan/lab? (@10.212.134.201) 0.79ms
[I 2023-12-10 09:51:48.415 ServerApp] 302 GET /user/stefan/ -> /user/stefan/lab? (@10.212.134.201) 0.81ms
```

Configurable HTTP Proxy:

```
09:32:01.747 [ConfigProxy] info: 200 GET /api/routes
09:32:20.221 [ConfigProxy] info: Adding route /user/stefan -> http://10.200.140.214:8888
09:32:20.221 [ConfigProxy] info: Route added /user/stefan -> http://10.200.140.214:8888
09:32:20.221 [ConfigProxy] info: 201 POST /api/routes/user/stefan
09:33:01.751 [ConfigProxy] info: 200 GET /api/routes
```
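
One detail in the hub log may be worth pulling out: the hub itself answers requests for `/user/stefan/...` and redirects them back under `/hub/`. With a route for `/user/stefan` already registered in the proxy, those requests would normally be expected to go to the user server, so some traffic may be reaching the hub without passing through the proxy. This is only a hint, since the hub also receives `/user/...` requests whenever no route exists yet. A minimal Python sketch for filtering such lines, assuming the log format shown above (the function name and the regex are illustrative and not part of JupyterHub):

```python
import re

# Matches hub access-log lines of the form:
#   [I <date> <time> JupyterHub log:191] 302 GET <path> -> <target> (<user>@<ip>) <ms>
REDIRECT = re.compile(r"\] (\d{3}) GET (\S+) -> (\S+) ")


def requests_outside_hub_prefix(log_lines, hub_prefix="/hub/"):
    """Return (path, target) pairs for redirects the hub issued for non-/hub/ paths."""
    hits = []
    for line in log_lines:
        match = REDIRECT.search(line)
        if match and not match.group(2).startswith(hub_prefix):
            hits.append((match.group(2), match.group(3)))
    return hits


# Three lines copied from the hub log above.
hub_log = [
    "[I 2023-12-10 09:51:41.110 JupyterHub log:191] 302 GET /hub/ -> /user/stefan/ (stefan@10.212.134.201) 17.23ms",
    "[I 2023-12-10 09:51:41.139 JupyterHub log:191] 302 GET /user/stefan/lab? -> /hub/user/stefan/lab? (@10.212.134.201) 0.67ms",
    "[I 2023-12-10 09:51:41.153 JupyterHub log:191] 302 GET /hub/user/stefan/lab? -> /user/stefan/lab?redirects=1 (stefan@10.212.134.201) 2.55ms",
]

for path, target in requests_outside_hub_prefix(hub_log):
    print(f"hub answered {path} itself and redirected to {target}")
```

Only the second line is reported here, which is the half of the loop where the hub handles a user-server URL.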

Is this a bug, or am I simply missing some combination of config values? Any help would be greatly appreciated.

welcome[bot] commented 10 months ago

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively. You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

StefanVanDyck commented 10 months ago

Ah, a breakthrough: port-forwarding port 8000 of the proxy pod does allow me to access it at localhost:8000. So I suppose the problem is that the port used in the proxy-public service is not correct?

I managed to get things to work by manually editing the Kubernetes resources. I changed the proxy-public service to target port 8000 and had to add an additional selector label, which I also added to the proxy deployment.

But I am not sure if this is the intended deployment or if I am missing the idea behind the current setup.

StefanVanDyck commented 10 months ago

Ok, so I think something like this might be needed to make the public service work correctly. https://github.com/jupyterhub/zero-to-jupyterhub-k8s/commit/52bd1e21f456f1a1f245db5368c9f05ea0262681

consideRatio commented 10 months ago

Are your ingress controller pods labelled to be allowed network access to the proxy pod? They are sending traffic to the proxy pod, so they need the label if you have network policy enforcement in your k8s cluster.

consideRatio commented 10 months ago

Was the user server restarted since the upgrade - or left running since before?

consideRatio commented 10 months ago

Make sure you have read the changelog for 3.0.0 as well before continuing, if you haven't. I've forgotten what was breaking, so I can't rule out that something there is of interest.

/ From a mobile device

StefanVanDyck commented 10 months ago

@consideRatio Thanks for having a look. I tried restarting the user notebook, but the behaviour is the same.

I think the problem is that the selector for the public proxy service is not specific to the proxy pods and picks up the hub pods too. (But I could definitely be missing a trick.)

I did a complete clean install with the config above, which should be acceptable according to the schema of 3.2.1.

StefanVanDyck commented 10 months ago

Also I don't believe I have network policy enforcement enabled ( yet, you know how it is :) )

consideRatio commented 10 months ago

The config doesn't make sense to me: you have both ingress and proxy.service.type NodePort. That allows traffic to flow in two ways, one directly through the node port to the proxy pod and onwards, and one via an ingress controller.

Are you using an ingress controller? When using an ingress controller, you won't need proxy.service.type NodePort and can use ClusterIP instead for the proxy service, which the ingress controller proxies to.

StefanVanDyck commented 10 months ago

@consideRatio Yeah, I used NodePort to try and debug the issue without going through my Traefik ingress. The NodePort has the same issue when I connect to it directly. Indeed, the intention is to use ClusterIP.

I managed to get things working by applying these patches using Ansible:

- name: Fix jupyterhub public proxy deployment
  kubernetes.core.k8s:
    state: patched
    kind: Deployment
    name: jupyterhub-proxy
    namespace: jupyterhub
    definition:
      spec:
        template:
          metadata:
            labels:
              hub.jupyter.org/network-selector: proxy
  become: true

- name: Fix jupyterhub public proxy service
  kubernetes.core.k8s:
    state: patched
    kind: Service
    name: jupyterhub-proxy-public
    namespace: jupyterhub
    definition:
      spec:
        selector:
          hub.jupyter.org/network-selector: proxy
  become: true

I couldn't find a way to get something like this to work with the current Helm config / templating.
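
A quick way to see why the unpatched Service can reach both pods: a Service selector matches any pod whose labels contain every key/value pair in the selector. The sketch below is plain Python rather than Kubernetes client code. It uses the label sets from the manifests posted further down in this thread, and `selects` and `endpoints` are illustrative names.

```python
def selects(selector, pod_labels):
    """Equality-based selection: every selector key/value must be present on the pod."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


def endpoints(selector, pods):
    """Names of the pods a Service with this selector would route to."""
    return sorted(name for name, labels in pods.items() if selects(selector, labels))


# Labels rendered on the Service selector and on both pod templates.
shared = {
    "app.kubernetes.io/instance": "jupyterhub",
    "app.kubernetes.io/managed-by": "Helm",
    "app.kubernetes.io/name": "jupyterhub",
    "app.kubernetes.io/version": "4.0.2",
    "helm.sh/chart": "jupyterhub-3.2.1",
}
pods = {
    "proxy": {
        **shared,
        "hub.jupyter.org/network-access-hub": "true",
        "hub.jupyter.org/network-access-singleuser": "true",
    },
    "hub": {
        **shared,
        "hub.jupyter.org/network-access-proxy-api": "true",
        "hub.jupyter.org/network-access-proxy-http": "true",
        "hub.jupyter.org/network-access-singleuser": "true",
    },
}

# Before the patch: the selector holds only the shared labels.
before = endpoints(shared, pods)

# After the patch: one extra label on the proxy pod template and on the selector.
extra = {"hub.jupyter.org/network-selector": "proxy"}
patched_pods = {**pods, "proxy": {**pods["proxy"], **extra}}
after = endpoints({**shared, **extra}, patched_pods)

print(before)  # ['hub', 'proxy']
print(after)   # ['proxy']
```

Both containers name their port `http` (8000 on the proxy, 8081 on the hub), so `targetPort: http` resolves on either pod, which would let requests through the unpatched Service land on the hub directly.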

consideRatio commented 10 months ago

What were the labels on the patched pod template and on the patched service selector, before and after the patch?

If they don't target the right pod, it's quite weird.

StefanVanDyck commented 10 months ago

the public proxy service:

apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: jupyterhub
    meta.helm.sh/release-namespace: jupyterhub
  creationTimestamp: "2023-12-10T11:44:27Z"
  labels:
    app.kubernetes.io/instance: jupyterhub
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: jupyterhub
    app.kubernetes.io/version: 4.0.2
    helm.sh/chart: jupyterhub-3.2.1
  name: jupyterhub-proxy-public
  namespace: jupyterhub
  resourceVersion: "1594693"
  uid: 5f2d5bda-76f5-4bf1-b128-ee20f12dbf88
spec:
  clusterIP: 10.201.188.121
  clusterIPs:
  - 10.201.188.121
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/instance: jupyterhub
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: jupyterhub
    app.kubernetes.io/version: 4.0.2
    helm.sh/chart: jupyterhub-3.2.1
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

the proxy deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "3"
    meta.helm.sh/release-name: jupyterhub
    meta.helm.sh/release-namespace: jupyterhub
  creationTimestamp: "2023-12-10T11:24:11Z"
  generation: 3
  labels:
    app.kubernetes.io/instance: jupyterhub
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: jupyterhub
    app.kubernetes.io/version: 4.0.2
    helm.sh/chart: jupyterhub-3.2.1
  name: jupyterhub-proxy
  namespace: jupyterhub
  resourceVersion: "1624737"
  uid: af341ffb-1b6f-412c-920d-0f1ec8ea0b58
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: jupyterhub
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: jupyterhub
      app.kubernetes.io/version: 4.0.2
      helm.sh/chart: jupyterhub-3.2.1
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        checksum/auth-token: 56bd
        checksum/proxy-secret: 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: jupyterhub
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: jupyterhub
        app.kubernetes.io/version: 4.0.2
        helm.sh/chart: jupyterhub-3.2.1
        hub.jupyter.org/network-access-hub: "true"
        hub.jupyter.org/network-access-singleuser: "true"
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - preference:
              matchExpressions:
              - key: hub.jupyter.org/node-purpose
                operator: In
                values:
                - core
            weight: 100
      containers:
      - command:
        - configurable-http-proxy
        - --ip=
        - --api-ip=
        - --api-port=8001
        - --default-target=http://jupyterhub-hub:$(JUPYTERHUB_HUB_SERVICE_PORT)
        - --error-target=http://jupyterhub-hub:$(JUPYTERHUB_HUB_SERVICE_PORT)/hub/error
        - --port=8000
        env:
        - name: CONFIGPROXY_AUTH_TOKEN
          valueFrom:
            secretKeyRef:
              key: hub.config.ConfigurableHTTPProxy.auth_token
              name: jupyterhub-hub
        image: quay.io/jupyterhub/configurable-http-proxy:4.6.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 30
          httpGet:
            path: /_chp_healthz
            port: http
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        name: chp
        ports:
        - containerPort: 8000
          name: http
          protocol: TCP
        - containerPort: 8001
          name: api
          protocol: TCP
        readinessProbe:
          failureThreshold: 1000
          httpGet:
            path: /_chp_healthz
            port: http
            scheme: HTTP
          periodSeconds: 2
          successThreshold: 1
          timeoutSeconds: 1
        resources: {}
        securityContext:
          allowPrivilegeEscalation: false
          runAsGroup: 65534
          runAsUser: 65534
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: deltaray-docker-hub-secret
      priorityClassName: jupyterhub
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 60
      tolerations:
      - effect: NoSchedule
        key: hub.jupyter.org/dedicated
        operator: Equal
        value: core
      - effect: NoSchedule
        key: hub.jupyter.org_dedicated
        operator: Equal
        value: core

StefanVanDyck commented 10 months ago

The hub deployment which I believe also matches the service selector:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "11"
    meta.helm.sh/release-name: jupyterhub
    meta.helm.sh/release-namespace: jupyterhub
  creationTimestamp: "2023-12-09T19:28:20Z"
  generation: 11
  labels:
    app.kubernetes.io/instance: jupyterhub
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: jupyterhub
    app.kubernetes.io/version: 4.0.2
    helm.sh/chart: jupyterhub-3.2.1
  name: jupyterhub-hub
  namespace: jupyterhub
  resourceVersion: "1624843"
  uid: 11ec00aa-8fc6-49e0-9e8f-856312f5d1c4
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: jupyterhub
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: jupyterhub
      app.kubernetes.io/version: 4.0.2
      helm.sh/chart: jupyterhub-3.2.1
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        checksum/config-map: 056097a9b118be9539dfb219fce09af7a03ef153c66cf1d60c021c06e14c4f53
        checksum/secret: 53b59084d26e8c82f643a4e999671af9392aa09c1233bd1eeedc081585334261
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: jupyterhub
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: jupyterhub
        app.kubernetes.io/version: 4.0.2
        helm.sh/chart: jupyterhub-3.2.1
        hub.jupyter.org/network-access-proxy-api: "true"
        hub.jupyter.org/network-access-proxy-http: "true"
        hub.jupyter.org/network-access-singleuser: "true"
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - preference:
              matchExpressions:
              - key: hub.jupyter.org/node-purpose
                operator: In
                values:
                - core
            weight: 100
      containers:
      - args:
        - jupyterhub
        - --config
        - /usr/local/etc/jupyterhub/jupyterhub_config.py
        - --upgrade-db
        env:
        - name: PYTHONUNBUFFERED
          value: "1"
        - name: HELM_RELEASE_NAME
          value: jupyterhub
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: CONFIGPROXY_AUTH_TOKEN
          valueFrom:
            secretKeyRef:
              key: hub.config.ConfigurableHTTPProxy.auth_token
              name: jupyterhub-hub
        - name: JUPYTERHUB_OAUTH2_CLIENT_SECRET
          valueFrom:
            secretKeyRef:
              key: secret
              name: jupyterhub-oauth2-client
        image: quay.io/jupyterhub/k8s-hub:3.2.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 30
          httpGet:
            path: /hub/health
            port: http
            scheme: HTTP
          initialDelaySeconds: 300
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        name: hub
        ports:
        - containerPort: 8081
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 1000
          httpGet:
            path: /hub/health
            port: http
            scheme: HTTP
          periodSeconds: 2
          successThreshold: 1
          timeoutSeconds: 1
        resources: {}
        securityContext:
          allowPrivilegeEscalation: false
          runAsGroup: 1000
          runAsUser: 1000
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/local/etc/jupyterhub/jupyterhub_config.py
          name: config
          subPath: jupyterhub_config.py
        - mountPath: /usr/local/etc/jupyterhub/z2jh.py
          name: config
          subPath: z2jh.py
        - mountPath: /usr/local/etc/jupyterhub/config/
          name: config
        - mountPath: /usr/local/etc/jupyterhub/secret/
          name: secret
        - mountPath: /etc/ssl/certs/ca-certificates.crt
          name: certificates
          readOnly: true
          subPath: ca-certificates.crt
        - mountPath: /srv/jupyterhub
          name: pvc
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: deltaray-docker-hub-secret
      priorityClassName: jupyterhub
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 1000
      serviceAccount: jupyterhub-hub
      serviceAccountName: jupyterhub-hub
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: hub.jupyter.org/dedicated
        operator: Equal
        value: core
      - effect: NoSchedule
        key: hub.jupyter.org_dedicated
        operator: Equal
        value: core
      volumes:
      - configMap:
          defaultMode: 420
          name: jupyterhub-hub
        name: config
      - name: secret
        secret:
          defaultMode: 420
          secretName: jupyterhub-hub
      - hostPath:
          path: /etc/ssl/certs/
          type: ""
        name: certificates
      - name: pvc
        persistentVolumeClaim:
          claimName: jupyterhub-hub-db-dir

consideRatio commented 10 months ago

Hmmmm, these labels like app.kubernetes.io/instance are not what I expect to see. Is this really a default installation?

This chart uses the old label naming, "app: jupyterhub" etc.

StefanVanDyck commented 10 months ago

@consideRatio Oh no you are entirely correct. I am an absolute idiot...

I install jupyterhub as a subchart of a custom chart I helpfully called "jupyterhub". It contains the standard _helpers generated by running helm create. The helpers contain definitions for things like jupyterhub.labels, jupyterhub.selectors, etc., which in turn completely override some of the definitions used by the actual jupyterhub chart.
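
Named templates in Helm share a single namespace across a parent chart and its subcharts, so two definitions with the same name cannot coexist and one replaces the other. Below is a toy Python model of that behaviour, assuming the parent's definitions are the ones that win, as they evidently did here. The registry, the `hub-wrapper` prefix, and the reduced label sets are all illustrative and are not Helm's actual implementation.

```python
def load_templates(*chart_definitions):
    """Merge named-template definitions in load order; a later name replaces an earlier one."""
    registry = {}
    for definitions in chart_definitions:
        registry.update(definitions)
    return registry


# Subchart helper, reduced to the one label quoted in this thread.
subchart = {"jupyterhub.labels": lambda: {"app": "jupyterhub"}}

# Parent chart named "jupyterhub": its scaffolded helpers carry the chart name as prefix.
parent = {
    "jupyterhub.labels": lambda: {
        "app.kubernetes.io/name": "jupyterhub",
        "app.kubernetes.io/managed-by": "Helm",
    }
}

# Same template name in both charts: the subchart's resources render the parent's labels.
collided = load_templates(subchart, parent)["jupyterhub.labels"]()

# Hypothetical fix: give the parent's helpers a prefix that cannot clash.
renamed_parent = {"hub-wrapper.labels": parent["jupyterhub.labels"]}
fixed = load_templates(subchart, renamed_parent)["jupyterhub.labels"]()

print(collided)  # {'app.kubernetes.io/name': 'jupyterhub', 'app.kubernetes.io/managed-by': 'Helm'}
print(fixed)     # {'app': 'jupyterhub'}
```

In this model, renaming only the parent chart would not be enough if the helper names generated at scaffold time still start with `jupyterhub.`; the template names themselves have to differ.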

Thank you so much for your help! This thing had me tearing my hair out... Hope I did not waste too much of your time.

consideRatio commented 10 months ago

Ahh, it happens! I'm glad you got it resolved and thank you for following up on the resolution ❤️ 🌻 🎉