Closed yuhuishi-convect closed 4 months ago
Hello @yuhuishi-convect, can you provide more information about the other Deployments in your cluster?
`ml-pipeline` is the last Deployment to become ready, because it can only become ready once the other Deployments are running. One possible cause is that a storage backend (the SQL database, etc.) is failing, which in turn causes `ml-pipeline` to fail. Can you share more information about the health of the other Deployments in the cluster?
May I ask which Kubernetes version you are deploying to? Similar post: https://github.com/kubernetes/kubernetes/issues/106111
> Hello @yuhuishi-convect, can you provide more information about the other Deployments in your cluster?
> `ml-pipeline` is the last Deployment to become ready, because it can only become ready once the other Deployments are running. One possible cause is that a storage backend (the SQL database, etc.) is failing, which in turn causes `ml-pipeline` to fail. Can you share more information about the health of the other Deployments in the cluster?
$ k get pods -n kubeflow-helm
NAME READY STATUS RESTARTS AGE
cache-deployer-deployment-bb8d6cb65-9hqfb 1/1 Running 0 10m
cache-server-7fffdd889d-zgnc9 1/1 Running 0 10m
metadata-envoy-7cd8b6db48-nw6w8 1/1 Running 0 10m
metadata-grpc-deployment-69995cb9dc-lq9c8 1/1 Running 1 10m
metadata-writer-5986bfb78-v7dwr 1/1 Running 0 10m
minio-5cd667bc76-2965c 1/1 Running 0 10m
ml-pipeline-5ffbcfcd95-wjhvn 0/1 Running 5 4m12s
ml-pipeline-persistenceagent-84fdcf9cbc-pq2nv 1/1 Running 4 10m
ml-pipeline-scheduledworkflow-59d66b54c6-qc957 1/1 Running 0 10m
ml-pipeline-ui-58d56bd7cc-mvzcl 1/1 Running 0 10m
ml-pipeline-viewer-crd-856f5454d8-hkk65 1/1 Running 0 10m
ml-pipeline-visualizationserver-5486886667-c62pr 1/1 Running 0 10m
mysql-85445f56b7-b7fp5 1/1 Running 0 11m
workflow-controller-7f469d8fcd-c6fzn 1/1 Running 0 10m
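With this many pods it is easy to miss the one that is not ready. A minimal sketch of a filter, assuming the default `kubectl get pods` table layout shown above (pod name in column 1, `READY` as `ready/total` in column 2); the helper name `not_ready_pods` is made up for illustration:

```shell
# Print the names of pods whose READY column shows fewer ready
# containers than the total (for example "0/1").
# Reads `kubectl get pods` table output on stdin.
# `not_ready_pods` is a hypothetical helper, not part of kubectl.
not_ready_pods() {
  awk 'NR > 1 {                 # skip the header row
    split($2, r, "/")           # r[1] = ready, r[2] = total
    if (r[1] != r[2]) print $1
  }'
}

# Possible usage against a live cluster (namespace taken from the output above):
#   kubectl get pods -n kubeflow-helm | not_ready_pods
```

Applied to the listing above, this would single out the `ml-pipeline` pod, the only one reporting `0/1`.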
> May I ask which Kubernetes version you are deploying to? Similar post: kubernetes/kubernetes#106111
$ k version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.3", GitCommit:"c92036820499fedefec0f847e2054d824aea6cd1", GitTreeState:"clean", BuildDate:"2021-10-27T18:41:28Z", GoVersion:"go1.16.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.5-eks-bc4871b", GitCommit:"5236faf39f1b7a7dabea8df12726f25608131aa9", GitTreeState:"clean", BuildDate:"2021-10-29T23:32:16Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
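To compare the two versions at a glance, the minor numbers can be pulled out of this output. This is only a sketch: it assumes the older `version.Info{...}` format printed above (newer kubectl releases print a different layout), and `version_minor` is a hypothetical helper name.

```shell
# Extract the numeric minor version for "Client" or "Server" from
# `kubectl version` output in the version.Info{...} format shown above.
# A trailing "+" (as in Minor:"21+") is dropped.
# `version_minor` is a hypothetical helper, not part of kubectl.
version_minor() {
  sed -n "s/^$1 Version:.*Minor:\"\([0-9][0-9]*\).*/\1/p"
}

# Possible usage against a live cluster:
#   kubectl version | version_minor Server
```

For the output above this gives 22 for the client and 21 for the server, which is within the one-minor-version skew that kubectl supports against the API server.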
@yuhuishi-convect
KFP backend 1.2 is a very old version and might not work on Kubernetes 1.21. Can you try deploying KFP backend v1.8.1 instead? https://github.com/kubeflow/pipelines/releases/tag/1.8.1
Closing this issue, KFP 2.0.5 is available. Feel free to reopen it if the issue persists in the latest version.
/close
@rimolive: Closing this issue.
Environment
Steps to reproduce
Expected result
Materials and Reference
The liveness probe of the `ml-pipeline` deployment failed.
```
$ k describe -n kubeflow pod ml-pipeline-5f465d4c56-7xcs8
Name:         ml-pipeline-5f465d4c56-7xcs8
Namespace:    kubeflow
Priority:     0
Node:         ip-10-0-3-78.us-west-2.compute.internal/10.0.3.78
Start Time:   Mon, 07 Feb 2022 11:05:22 -0800
Labels:       app=ml-pipeline
              application-crd-id=kubeflow-pipelines
              pod-template-hash=5f465d4c56
Annotations:  kubectl.kubernetes.io/restartedAt: 2022-02-06T17:31:44-08:00
              kubernetes.io/psp: eks.privileged
              sidecar.istio.io/inject: false
Status:       Running
IP:           10.0.3.52
IPs:
  IP:           10.0.3.52
Controlled By:  ReplicaSet/ml-pipeline-5f465d4c56
Containers:
  ml-pipeline-api-server:
    Container ID:   docker://6659ead43604634288ebe7987ba5f41e892e06c568645b2883547b3c26cdb167
    Image:          gcr.io/ml-pipeline/api-server:1.2.0
    Image ID:       docker-pullable://gcr.io/ml-pipeline/api-server@sha256:6553e9855e6d38eb5a70beeea39a2c37ac85b60f26a5c061b5e5e2adfffd960b
    Ports:          8888/TCP, 8887/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Mon, 07 Feb 2022 11:05:23 -0800
    Ready:          False
    Restart Count:  0
    Liveness:       exec [wget -q -S -O - http://localhost:8888/apis/v1beta1/healthz] delay=3s timeout=2s period=5s #success=1 #failure=3
    Readiness:      exec [wget -q -S -O - http://localhost:8888/apis/v1beta1/healthz] delay=3s timeout=2s period=5s #success=1 #failure=3
    Environment:
      AUTO_UPDATE_PIPELINE_DEFAULT_VERSION:
```
Logs of the pod
Running the health check from inside the pod returns no response.
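For context on how little time the container gets before it is restarted: the probe settings reported by `k describe` are `delay=3s timeout=2s period=5s #failure=3`. A rough estimate of the window, assuming the first probe fires right after the initial delay, each later probe fires one period after the previous one, and the last failing probe runs into its full timeout (kubelet jitter is ignored; `probe_restart_window` is a hypothetical helper name):

```shell
# Approximate number of seconds after container start at which the
# liveness probe reaches its failure threshold, if it never succeeds.
# Arguments: initialDelaySeconds periodSeconds failureThreshold timeoutSeconds
# `probe_restart_window` is a hypothetical helper for illustration only.
probe_restart_window() {
  delay=$1 period=$2 failures=$3 timeout=$4
  # first probe at `delay`, then (failures - 1) more periods,
  # plus the timeout of the final failing probe
  echo $(( delay + (failures - 1) * period + timeout ))
}

# Settings from the pod description above:
#   probe_restart_window 3 5 3 2
```

With the values above this comes out to about 15 seconds, so under these assumptions an API server that needs longer than that to answer `/apis/v1beta1/healthz` would be restarted before it ever becomes healthy.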
Deployment yaml of the `ml-pipeline`
```
$ k get deploy -n kubeflow ml-pipeline -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "5"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"app":"ml-pipeline","application-crd-id":"kubeflow-pipelines"},"name":"ml-pipeline","namespace":"kubeflow"},"spec":{"selector":{"matchLabels":{"app":"ml-pipeline","application-crd-id":"kubeflow-pipelines"}},"template":{"metadata":{"labels":{"app":"ml-pipeline","application-crd-id":"kubeflow-pipelines"}},"spec":{"containers":[{"env":[{"name":"AUTO_UPDATE_PIPELINE_DEFAULT_VERSION","valueFrom":{"configMapKeyRef":{"key":"autoUpdatePipelineDefaultVersion","name":"pipeline-install-config-d42hc87dh2"}}},{"name":"POD_NAMESPACE","valueFrom":{"fieldRef":{"fieldPath":"metadata.namespace"}}},{"name":"OBJECTSTORECONFIG_SECURE","value":"false"},{"name":"OBJECTSTORECONFIG_BUCKETNAME","valueFrom":{"configMapKeyRef":{"key":"bucketName","name":"pipeline-install-config-d42hc87dh2"}}},{"name":"DBCONFIG_USER","valueFrom":{"secretKeyRef":{"key":"username","name":"mysql-secret-fd5gktm75t"}}},{"name":"DBCONFIG_PASSWORD","valueFrom":{"secretKeyRef":{"key":"password","name":"mysql-secret-fd5gktm75t"}}},{"name":"DBCONFIG_DBNAME","valueFrom":{"configMapKeyRef":{"key":"pipelineDb","name":"pipeline-install-config-d42hc87dh2"}}},{"name":"DBCONFIG_HOST","valueFrom":{"configMapKeyRef":{"key":"dbHost","name":"pipeline-install-config-d42hc87dh2"}}},{"name":"DBCONFIG_PORT","valueFrom":{"configMapKeyRef":{"key":"dbPort","name":"pipeline-install-config-d42hc87dh2"}}},{"name":"OBJECTSTORECONFIG_ACCESSKEY","valueFrom":{"secretKeyRef":{"key":"accesskey","name":"mlpipeline-minio-artifact"}}},{"name":"OBJECTSTORECONFIG_SECRETACCESSKEY","valueFrom":{"secretKeyRef":{"key":"secretkey","name":"mlpipeline-minio-artifact"}}}],"image":"gcr.io/ml-pipeline/api-server:1.2.0","imagePullPolicy":"IfNotPresent","livenessProbe":{"exec":{"command":["wget","-q","-S","-O","-","http://localhost:8888/apis/v1beta1/healthz"]},"initialDelaySeconds":3,"periodSeconds":5,"timeoutSeconds":2},"name":"ml-pipeline-api-server","ports":[{"containerPort":8888,"name":"http"},{"containerPort":8887,"name":"grpc"}],"readinessProbe":{"exec":{"command":["wget","-q","-S","-O","-","http://localhost:8888/apis/v1beta1/healthz"]},"initialDelaySeconds":3,"periodSeconds":5,"timeoutSeconds":2}}],"serviceAccountName":"ml-pipeline"}}}}
  creationTimestamp: "2021-01-15T22:01:56Z"
  generation: 15
  labels:
    app: ml-pipeline
    application-crd-id: kubeflow-pipelines
  name: ml-pipeline
  namespace: kubeflow
  ownerReferences:
  - apiVersion: app.k8s.io/v1beta1
    blockOwnerDeletion: true
    controller: false
    kind: Application
    name: pipeline
    uid: ea8a9b37-0c16-439e-bc49-3399051aca6e
  resourceVersion: "532602378"
  selfLink: /apis/apps/v1/namespaces/kubeflow/deployments/ml-pipeline
  uid: 908e252d-c7c6-49f2-88e0-dcf568097b14
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: ml-pipeline
      application-crd-id: kubeflow-pipelines
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/restartedAt: "2022-02-06T17:31:44-08:00"
        sidecar.istio.io/inject: "false"
      creationTimestamp: null
      labels:
        app: ml-pipeline
        application-crd-id: kubeflow-pipelines
    spec:
      containers:
      - env:
        - name: AUTO_UPDATE_PIPELINE_DEFAULT_VERSION
          valueFrom:
            configMapKeyRef:
              key: autoUpdatePipelineDefaultVersion
              name: pipeline-install-config-d42hc87dh2
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: OBJECTSTORECONFIG_SECURE
          value: "false"
        - name: OBJECTSTORECONFIG_BUCKETNAME
          valueFrom:
            configMapKeyRef:
              key: bucketName
              name: pipeline-install-config-d42hc87dh2
        - name: DBCONFIG_USER
          valueFrom:
            secretKeyRef:
              key: username
              name: mysql-secret-fd5gktm75t
        - name: DBCONFIG_PASSWORD
          valueFrom:
            secretKeyRef:
              key: password
              name: mysql-secret-fd5gktm75t
        - name: DBCONFIG_DBNAME
          valueFrom:
            configMapKeyRef:
              key: pipelineDb
              name: pipeline-install-config-d42hc87dh2
        - name: DBCONFIG_HOST
          valueFrom:
            configMapKeyRef:
              key: dbHost
              name: pipeline-install-config-d42hc87dh2
        - name: DBCONFIG_PORT
          valueFrom:
            configMapKeyRef:
              key: dbPort
              name: pipeline-install-config-d42hc87dh2
        - name: OBJECTSTORECONFIG_ACCESSKEY
          valueFrom:
            secretKeyRef:
              key: accesskey
              name: mlpipeline-minio-artifact
        - name: OBJECTSTORECONFIG_SECRETACCESSKEY
          valueFrom:
            secretKeyRef:
              key: secretkey
              name: mlpipeline-minio-artifact
        image: gcr.io/ml-pipeline/api-server:1.2.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          exec:
            command:
            - wget
            - -q
            - -S
            - -O
            - '-'
            - http://localhost:8888/apis/v1beta1/healthz
          failureThreshold: 3
          initialDelaySeconds: 3
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 2
        name: ml-pipeline-api-server
        ports:
        - containerPort: 8888
          name: http
          protocol: TCP
        - containerPort: 8887
          name: grpc
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - wget
            - -q
            - -S
            - -O
            - '-'
            - http://localhost:8888/apis/v1beta1/healthz
          failureThreshold: 3
          initialDelaySeconds: 3
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 2
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: ml-pipeline
      serviceAccountName: ml-pipeline
      terminationGracePeriodSeconds: 30
status:
  conditions:
  - lastTransitionTime: "2022-02-07T18:58:32Z"
    lastUpdateTime: "2022-02-07T18:58:32Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2022-02-07T19:16:46Z"
    lastUpdateTime: "2022-02-07T19:16:46Z"
    message: ReplicaSet "ml-pipeline-5f465d4c56" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 15
  replicas: 2
  unavailableReplicas: 2
  updatedReplicas: 1
```
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.