jupyterhub / zero-to-jupyterhub-k8s

Helm Chart & Documentation for deploying JupyterHub on Kubernetes
https://zero-to-jupyterhub.readthedocs.io

Hub service failure after some time of usage #3012

Open waaldev opened 1 year ago

waaldev commented 1 year ago

I installed Zero to JupyterHub on EKS, and it worked fine for a while. After some time in use, however, the hub fails to start user servers and returns a 500 error. Restarting the services seems to fix the issue temporarily.

I would like to report this issue and request a solution to keep the hub service running without any failures. Any help would be greatly appreciated.

Steps to reproduce:

1. Start JupyterHub and wait 2 or 3 days.
2. Try to start a server.
3. Observe the 500 error.
4. Restart the hub deployment and try to start a server again.
5. Observe that the server starts successfully.
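
For anyone trying to reproduce the "try to start a server" step without the UI, here is a minimal sketch of the same API call that appears in the logs further down (`POST /hub/api/users/<name>/server` with a token in the `Authorization` header). The hub URL, username, and token are placeholders, and the helper names `build_spawn_request` and `start_server` are made up for illustration; this is not the client that produced the logged request.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_spawn_request(hub_url: str, username: str, token: str) -> urllib.request.Request:
    """Build the POST request that asks the hub to start a user's default server."""
    # Percent-encode the username so unusual characters cannot break the path.
    safe_name = urllib.parse.quote(username, safe="")
    url = f"{hub_url.rstrip('/')}/hub/api/users/{safe_name}/server"
    return urllib.request.Request(
        url,
        data=b"",  # empty body, matching Content-Length: 0 in the logged request
        method="POST",
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
    )


def start_server(hub_url: str, username: str, token: str, timeout: float = 30.0) -> int:
    """Send the spawn request and return the HTTP status code, including error codes."""
    request = build_spawn_request(hub_url, username, token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # A failing hub answers with 500 here; return the code instead of raising.
        return err.code


# Example call (placeholder values, assumed for illustration):
# status = start_server("http://hub.example.org", "some-user", "API_TOKEN")
# print(status)
```

On a healthy hub this call should come back with a 2xx status; in the failing state described here it returns 500 until the hub deployment is restarted.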

Environment:

- Helm chart version: 2.0.0
- Database: Amazon Aurora MySQL

Config.yml:

singleuser:
  lifecycleHooks:
    postStart:
      exec:
        command:
          - "sh"
          - "-c"
          - >
            cp -n /home/.Rprofile /home/jovyan/.Rprofile;
            chown jovyan:users /home/jovyan/.Rprofile;
            mkdir -p /home/jovyan/.config/pip;
            cp -n /etc/pip.conf /home/jovyan/.config/pip/pip.conf;
  memory:
    limit: 16G
    guarantee: 1G
  cpu:
    limit: 4
    guarantee: 0.5
  defaultUrl: "/lab"
  image:
    name: CUSTOM IMAGE
    tag: TAG
  storage:
    type: "static"
    static:
      pvcName: "efs-jhub"
      subPath: "home/{username}"
  extraEnv:
    CHOWN_HOME: "yes"
    JULIA_DEPOT_PATH: "/home/jovyan/.julia"
  uid: 0
  fsGid: 0
  cmd: "start-singleuser.sh"
hub:
  db:
    type: mysql
    url: URL
    upgrade: true
  config:
    Authenticator:
      admin_users:
        - admin
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
  nodeSelector:
    hub.jupyter.org/node-purpose: hub

proxy:
  service:
    type: NodePort
  chp:
    resources:
      requests:
        cpu: 500m
        memory: 256Mi
    nodeSelector:
      hub.jupyter.org/node-purpose: hub
  traefik:
    resources:
      requests:
        cpu: 500m 
        memory: 512Mi
  secretSync: 
    resources:
      requests:
        cpu: 10m
        memory: 64Mi
scheduling:
  userScheduler:
    resources:
      requests:
        cpu: 30m
        memory: 512Mi
    nodeSelector:
      hub.jupyter.org/node-purpose: hub
  userPods:
    nodeAffinity:
      matchNodePurpose: require
prePuller:
  resources:
    requests:
      cpu: 10m
      memory: 8Mi
  hook: 
    resources:
      requests:
        cpu: 10m
        memory: 8Mi

cull:
  enabled: true
  timeout: 600
  every: 120
  removeNamedServers: true

debug:
  enabled: true

Logs:

Uncaught exception POST /hub/api/users/geccaxpkce3wj7y/server (::ffff:34.57.72.158)
    HTTPServerRequest(protocol='http', host='premium', method='POST', uri='/hub/api/users/geccaxpkce3wj7y/server', version='HTTP/1.1', remote_ip='::ffff:34.57.72.158')
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/tornado/web.py", line 1713, in _execute
        result = await result
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/apihandlers/users.py", line 539, in post
        await self.spawn_single_user(user, server_name, options=options)
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/handlers/base.py", line 878, in spawn_single_user
        active_counts = self.users.count_active_users()
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/user.py", line 233, in count_active_users
        if spawner.active:
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/spawner.py", line 158, in active
        return bool(self.pending or self.ready)
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/spawner.py", line 148, in ready
        if self.server is None:
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/spawner.py", line 229, in server
        orm_server = self.orm_spawner.server
      File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/attributes.py", line 482, in __get__
        return self.impl.get(state, dict_)
      File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/attributes.py", line 942, in get
        value = self._fire_loader_callables(state, key, passive)
      File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/attributes.py", line 978, in _fire_loader_callables
        return self.callable_(state, passive)
      File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/strategies.py", line 872, in _load_for_state
        primary_key_identity = self._get_ident_for_use_get(
      File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/strategies.py", line 931, in _get_ident_for_use_get
        return [
      File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/strategies.py", line 932, in <listcomp>
        get_attr(state, dict_, self._equated_columns[pk], passive=passive)
      File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/mapper.py", line 2983, in _get_state_attr_by_column
        return state.manager[prop.key].impl.get(state, dict_, passive=passive)
      File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/attributes.py", line 942, in get
        value = self._fire_loader_callables(state, key, passive)
      File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/attributes.py", line 973, in _fire_loader_callables
        return state._load_expired(state, passive)
      File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/state.py", line 712, in _load_expired
        self.manager.expired_attribute_loader(self, toload, passive)
      File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/loading.py", line 1465, in load_scalar_attributes
        raise orm_exc.ObjectDeletedError(state)
    sqlalchemy.orm.exc.ObjectDeletedError: Instance '<Spawner at 0x7ff4bce43070>' has been deleted, or its row is otherwise not present.

[W 2023-02-05 08:01:48.637 JupyterHub base:166] Rolling back session due to database error Instance '<Spawner at 0x7ff4bce43070>' has been deleted, or its row is otherwise not present.
[E 2023-02-05 08:01:48.642 JupyterHub log:178] {
      "X-Forwarded-Host": "premium",
      "Accept-Encoding": "gzip",
      "Content-Type": "application/json",
      "Authorization": "Token [secret]",
      "User-Agent": "Go-http-client/1.1",
      "Content-Length": "0",
      "X-Amzn-Trace-Id": "Root=1-63df626c-6b46e67621f8b37f4f14ac4d",
      "Host": "premium",
      "X-Forwarded-Port": "80,80",
      "X-Forwarded-Proto": "http,http",
      "X-Forwarded-For": "192.168.65.151,::ffff:34.57.72.158",
      "Connection": "close"
    }
[E 2023-02-05 08:01:48.642 JupyterHub log:186] 500 POST /hub/api/users/geccaxpkce3wj7y/server (admin::ffff:34.57.72.158) 9.50ms

Thank you.

welcome[bot] commented 1 year ago

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template, as it helps other community members contribute more effectively. You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada: