ml-tooling / ml-hub

🧰 Multi-user development platform for machine learning teams. Simple to set up within minutes.
Apache License 2.0

Readiness probe failed when hub pod created #26

Closed CarsonLeon closed 3 years ago

CarsonLeon commented 3 years ago

Describe the issue:

Deploying mlhub on Amazon EKS fails: the hub pod never becomes ready.

After I installed mlhub on EKS, two pods were created, but when I checked their status, the hub pod was not in the Ready state, so mlhub was unavailable.

Input

kubectl --namespace=default get pod

Output

NAME                     READY   STATUS    RESTARTS   AGE
hub-6b567dbfd8-6k2tn     0/1     Running   0          3h53m
proxy-84f5d55b94-4d9dz   1/1     Running   0          3h53m

When I checked the detailed status of the pod hub-6b567dbfd8-6k2tn, its events showed the following readiness probe error:

Input

kubectl describe pods hub-6b567dbfd8-6k2tn

Output

Name:         hub-6b567dbfd8-6k2tn
Namespace:    default
Priority:     0
Node:         ip-192-168-10-148.ap-northeast-1.compute.internal/192.168.10.148
Start Time:   Tue, 17 Aug 2021 11:19:23 +0800
Labels:       app=mlhub
              component=hub
              hub.jupyter.org/network-access-proxy-api=true
              hub.jupyter.org/network-access-proxy-http=true
              hub.jupyter.org/network-access-singleuser=true
              pod-template-hash=6b567dbfd8
              release=mlhub
Annotations:  checksum/config-map: d8905435057c013d65412bb01dd6e6ab73a266eabdfd847bf3d85954c013f48c
              checksum/secret: a95bc8602c127abbfd6a9b08d89835f1c10c4e99e3f95af2688dc4e9e4f07ef6
              kubernetes.io/psp: eks.privileged
Status:       Running
IP:           192.168.12.34
IPs:
  IP:           192.168.12.34
Controlled By:  ReplicaSet/hub-6b567dbfd8
Containers:
  hub:
    Container ID:   docker://47056a6bd6af45b9e117b2325de6510bb64ecb8c83a87b246320e2a3faab7193
    Image:          mltooling/ml-hub:1.0.0
    Image ID:       docker-pullable://mltooling/ml-hub@sha256:71fd4787ba74cd0ae5b6d127abf3d8d817f61e4016ee78d17ba6e6ec70b30aec
    Ports:          8081/TCP, 22/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Tue, 17 Aug 2021 11:19:47 +0800
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:      200m
      memory:   512Mi
    Readiness:  http-get http://:hub/mlhub/hub/health delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ADDITIONAL_ARGS:            --config /resources/jupyterhub_config.py --debug
      START_NGINX:                false
      EXECUTION_MODE:             k8s
      PYTHONUNBUFFERED:           1
      HELM_RELEASE_NAME:          mlhub
      POD_NAMESPACE:              default (v1:metadata.namespace)
      CONFIGPROXY_AUTH_TOKEN:     <set to the key 'proxy.token' in secret 'hub-secret'>  Optional: false
      DYNAMIC_WHITELIST_ENABLED:  true
    Mounts:
      /etc/jupyterhub/config/ from config (rw)
      /etc/jupyterhub/secret/ from secret (rw)
      /resources/jupyterhub_user_config.py from user-config (rw,path="jupyterhub_user_config.py")
      /var/run/secrets/kubernetes.io/serviceaccount from hub-token-lbwq8 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      hub-config
    Optional:  false
  secret:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hub-secret
    Optional:    false
  user-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      hub-user-config
    Optional:  false
  hub-token-lbwq8:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hub-token-lbwq8
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                     From     Message
  ----     ------     ----                    ----     -------
  Warning  Unhealthy  68s (x1348 over 3h45m)  kubelet  Readiness probe failed: Get http://192.168.12.34:8081/mlhub/hub/health:  dial tcp 192.168.12.34:8081: connect: connection refused
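
For anyone trying to reproduce this: the event above says the TCP connection to port 8081 was refused, which usually means nothing was listening on that port inside the hub container when the probe ran. Below is a minimal sketch of an equivalent check, assuming Python is available somewhere that can reach the pod IP (for example another pod in the cluster). The helper name `probe` is hypothetical and not part of ml-hub or Kubernetes; the 1-second timeout and the rule that status codes 200 to 399 count as success mirror the probe settings shown in the describe output and the kubelet's http-get behaviour.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 1.0) -> tuple[bool, str]:
    """Hypothetical helper: mimic an http-get readiness probe.

    Returns (ready, detail). A status code from 200 to 399 counts as ready.
    """
    # Empty ProxyHandler: talk to the target directly and ignore any
    # http_proxy settings in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # No answer at all: refused connection, timeout, DNS failure, ...
        reason = getattr(exc, "reason", exc)
        return False, f"connection error: {reason}"


if __name__ == "__main__":
    # URL taken from the probe failure event above.
    print(probe("http://192.168.12.34:8081/mlhub/hub/health"))
```

If this reports a connection error while the container is in the Running state, the hub process has probably not bound to port 8081 (or has crashed during startup), and the container logs from `kubectl logs hub-6b567dbfd8-6k2tn` are the next place to look. An `HTTP 4xx` or `HTTP 5xx` result would instead point at the health path or the hub configuration.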

Technical details: