What happened?
After installing the latest changes of the Katib control plane,
I ran kubectl get pod -n kubeflow, and the result is:
root@k8master:~# kubectl get pod -n kubeflow
NAME                                READY   STATUS             RESTARTS         AGE
katib-controller-86fbb67df-5mgpx    0/1     CrashLoopBackOff   52 (4m39s ago)   5h49m
katib-db-manager-7c8745f44b-4tzm5   0/1     CrashLoopBackOff   56 (54s ago)     5h49m
katib-mysql-77b9495867-fqb5l        0/1     Pending            0                5h49m
katib-ui-5d9c77cfc4-4bfzl           1/1     Running            0                5h49m
Then I ran kubectl describe pod katib-controller-86fbb67df-5mgpx -n kubeflow, and the result is:
Name:             katib-controller-86fbb67df-5mgpx
Namespace:        kubeflow
Priority:         0
Service Account:  katib-controller
Node:             k8node02/192.168.100.12
Start Time:       Thu, 10 Oct 2024 02:20:03 +0000
Labels:           katib.kubeflow.org/component=controller
                  katib.kubeflow.org/metrics-collector-injection=disabled
                  pod-template-hash=86fbb67df
Annotations:      prometheus.io/port: 8080
                  prometheus.io/scrape: true
                  sidecar.istio.io/inject: false
Status:           Running
IP:               10.244.0.3
IPs:
  IP:           10.244.0.3
Controlled By:  ReplicaSet/katib-controller-86fbb67df
Containers:
  katib-controller:
    Container ID:  docker://ec8cfc87a2c33a75ae61fd2d7ac906ccf52800fb49159e6e6253f129c0fd86bf
    Image:         docker.io/kubeflowkatib/katib-controller:latest
    Image ID:      docker-pullable://kubeflowkatib/katib-controller@sha256:103962f0810467fc5f6edcb46b8343387a289dd113dce38933ab15d3b0713261
    Ports:         8443/TCP, 8080/TCP, 18080/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Command:
      ./katib-controller
    Args:
      --katib-config=/katib-config.yaml
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 10 Oct 2024 08:10:54 +0000
      Finished:     Thu, 10 Oct 2024 08:11:24 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 10 Oct 2024 08:04:52 +0000
      Finished:     Thu, 10 Oct 2024 08:05:22 +0000
    Ready:          False
    Restart Count:  53
    Liveness:       http-get http://:healthz/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:healthz/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      KATIB_CORE_NAMESPACE:  kubeflow (v1:metadata.namespace)
    Mounts:
      /katib-config.yaml from katib-config (ro,path="katib-config.yaml")
      /tmp/cert from cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-s4x2k (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  katib-webhook-cert
    Optional:    false
  katib-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      katib-config
    Optional:  false
  kube-api-access-s4x2k:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                      From     Message
  ----     ------     ----                     ----     -------
  Normal   Pulled     36m (x39 over 4h20m)     kubelet  (combined from similar events): Successfully pulled image "docker.io/kubeflowkatib/katib-controller:latest" in 20.234160626s (20.234172377s including waiting)
  Warning  Unhealthy  6m18s (x261 over 4h49m)  kubelet  Readiness probe failed: HTTP probe failed with statuscode: 500
  Warning  BackOff    85s (x1164 over 4h48m)   kubelet  Back-off restarting failed container katib-controller in pod katib-controller-86fbb67df-5mgpx_kubeflow(c1cd3096-6bcc-4db2-969b-8f0ac265ae05)
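The describe output shows only that the container exits with code 1 and fails its readiness probe; it does not show the cause. Further diagnostics could be collected with something like the sketch below. It uses the pod names from the listing above. The function name collect_katib_diagnostics and the NAMESPACE variable are illustrative names, not part of Katib, and the PersistentVolumeClaim check assumes the Pending MySQL pod may be waiting on storage, which the output here does not confirm.

```shell
#!/usr/bin/env bash
# Sketch only: gathers follow-up diagnostics for the Katib pods in this report.
# NAMESPACE and collect_katib_diagnostics are hypothetical names for illustration.
NAMESPACE="${NAMESPACE:-kubeflow}"

collect_katib_diagnostics() {
  # Logs from the previous (crashed) container of each CrashLoopBackOff pod.
  kubectl logs katib-controller-86fbb67df-5mgpx -n "$NAMESPACE" --previous --tail=50
  kubectl logs katib-db-manager-7c8745f44b-4tzm5 -n "$NAMESPACE" --previous --tail=50
  # Scheduling events for the Pending MySQL pod.
  kubectl describe pod katib-mysql-77b9495867-fqb5l -n "$NAMESPACE"
  # Assumption: the Pending state may be storage-related, so list the claims.
  kubectl get pvc -n "$NAMESPACE"
}
```

Calling collect_katib_diagnostics on the control-plane node would print the four outputs in sequence; the MySQL pod is listed separately because the other components may depend on it being scheduled.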
Thanks!
What did you expect to happen?
I expected all Katib pods to be Running and Ready, with kubectl get pod -n kubeflow showing:
root@k8master:~# kubectl get pod -n kubeflow
NAME                                READY   STATUS    RESTARTS         AGE
katib-controller-86fbb67df-5mgpx    1/1     Running   52 (4m39s ago)   5h49m
katib-db-manager-7c8745f44b-4tzm5   1/1     Running   56 (54s ago)     5h49m
katib-mysql-77b9495867-fqb5l        1/1     Running   0                5h49m
katib-ui-5d9c77cfc4-4bfzl           1/1     Running   0                5h49m
Environment
Kubernetes version:
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.0", GitCommit:"1b4df30b3cdfeaba6024e81e559a6cd09a089d65", GitTreeState:"clean", BuildDate:"2023-04-11T17:10:18Z", GoVersion:"go1.20.3", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.16", GitCommit:"cbb86e0d7f4a049666fac0551e8b02ef3d6c3d9a", GitTreeState:"clean", BuildDate:"2024-07-17T01:44:26Z", GoVersion:"go1.22.5", Compiler:"gc", Platform:"linux/amd64"}
Katib controller version: docker.io/kubeflowkatib/katib-controller:latest
Katib Python SDK version:
Name: kubeflow-katib
Version: 0.17.0
Summary: Katib Python SDK for APIVersion v1beta1
Home-page: https://github.com/kubeflow/katib/tree/master/sdk/python/v1beta1
Author: Kubeflow Authors
Author-email: premnath.vel@gmail.com
License: Apache License Version 2.0
Location: /root/miniconda3/lib/python3.10/site-packages
Requires: certifi, grpcio, kubernetes, protobuf, setuptools, six, urllib3
Required-by: