kubeflow / manifests


ml-pipeline and metadata-grpc-deployment CrashLoopBackOff on Kubeflow 1.7 #2436

Closed Sundragon1993 closed 9 months ago

Sundragon1993 commented 1 year ago

Please help. My environment: Kubernetes 1.24.12, Kubeflow 1.7.0, Cilium v1.10.0-rc0.

```
NAMESPACE  NAME  READY  STATUS  RESTARTS  AGE
auth  dex-8579644bbb-hqf2c  1/1  Running  0  17m
cert-manager  cert-manager-7475574-tgl9f  1/1  Running  0  19m
cert-manager  cert-manager-cainjector-d5dc6cd7f-xcgzs  1/1  Running  0  19m
cert-manager  cert-manager-webhook-6868bd8b7-xknbj  1/1  Running  0  19m
gvh453  ml-pipeline-ui-artifact-5b4465bcb7-vphbv  2/2  Running  0  12m
gvh453  ml-pipeline-visualizationserver-5568776585-xhl4g  2/2  Running  0  12m
istio-system  authservice-0  1/1  Running  0  17m
istio-system  cluster-local-gateway-757849494c-j48d7  1/1  Running  0  17m
istio-system  istio-ingressgateway-cf7bd56f-x9bsz  1/1  Running  0  17m
istio-system  istiod-586fcd6677-sbcjx  1/1  Running  0  17m
knative-eventing  eventing-controller-5b7bfc8895-cjdd6  1/1  Running  0  17m
knative-eventing  eventing-webhook-5896d776b-fttzv  1/1  Running  0  17m
knative-serving  activator-5bbf976855-r5t9m  2/2  Running  0  16m
knative-serving  autoscaler-5cc8b77f4d-746zc  2/2  Running  0  16m
knative-serving  controller-657b7bb75c-bnktt  2/2  Running  0  16m
knative-serving  domain-mapping-6c4878cc54-sqdzv  2/2  Running  0  16m
knative-serving  domainmapping-webhook-f76bcd89f-gd2q8  2/2  Running  0  16m
knative-serving  net-istio-controller-6cb499fccb-vpn6p  2/2  Running  0  16m
knative-serving  net-istio-webhook-6858cd8998-ffrxc  2/2  Running  0  16m
knative-serving  webhook-76f9bc6584-7rpkg  2/2  Running  0  16m
kube-system  cilium-operator-7c748dbd5d-5cl6x  1/1  Running  0  20m
kube-system  cilium-rl6tq  1/1  Running  0  20m
kube-system  coredns-57575c5f89-fqcn8  1/1  Running  0  22m
kube-system  coredns-57575c5f89-xndm2  1/1  Running  0  22m
kube-system  etcd-e2m122-pc  1/1  Running  0  22m
kube-system  kube-apiserver-e2m122-pc  1/1  Running  0  22m
kube-system  kube-controller-manager-e2m122-pc  1/1  Running  0  22m
kube-system  kube-proxy-85gmm  1/1  Running  0  22m
kube-system  kube-scheduler-e2m122-pc  1/1  Running  0  22m
kube-system  nvidia-device-plugin-daemonset-fbmrn  1/1  Running  0  22m
kubeflow-user-example-com  ml-pipeline-ui-artifact-5b4465bcb7-p8rzk  2/2  Running  0  15m
kubeflow-user-example-com  ml-pipeline-visualizationserver-5568776585-4svcc  2/2  Running  0  15m
kubeflow  admission-webhook-deployment-cb6db9648-brdcl  1/1  Running  0  16m
kubeflow  cache-server-86584db5d8-sbkz6  2/2  Running  0  16m
kubeflow  centraldashboard-dd9c778b6-dxw2j  2/2  Running  0  16m
kubeflow  jupyter-web-app-deployment-cc9cbc696-4s48h  2/2  Running  0  16m
kubeflow  katib-controller-86d4d45478-7tw7l  1/1  Running  0  16m
kubeflow  katib-db-manager-689cdf95c6-5xr8n  1/1  Running  0  16m
kubeflow  katib-mysql-5bc98798b4-xkhtl  1/1  Running  0  16m
kubeflow  katib-ui-b5d5cf978-8l9bk  2/2  Running  1 (16m ago)  16m
kubeflow  kserve-controller-manager-7879bf6dd7-9f2qr  2/2  Running  0  16m
kubeflow  kserve-models-web-app-f9c576856-kxd2r  2/2  Running  0  16m
kubeflow  kubeflow-pipelines-profile-controller-5dd5468d9b-zwcdq  1/1  Running  0  16m
kubeflow  metacontroller-0  1/1  Running  0  16m
kubeflow  metadata-envoy-deployment-76c587bd47-29hlm  1/1  Running  0  16m
kubeflow  metadata-grpc-deployment-5c8599b99c-g6qmq  1/2  CrashLoopBackOff  8 (117s ago)  16m
kubeflow  metadata-writer-6c576c94b8-7qtb4  2/2  Running  6 (3m6s ago)  16m
kubeflow  minio-6d6d45469f-8gq2h  2/2  Running  0  16m
kubeflow  ml-pipeline-77d4d9974b-9dtlb  1/2  CrashLoopBackOff  7 (2m6s ago)  16m
kubeflow  ml-pipeline-persistenceagent-75bccd8b64-tjf4n  2/2  Running  0  16m
kubeflow  ml-pipeline-scheduledworkflow-6dfcd5dd89-q597m  2/2  Running  0  16m
kubeflow  ml-pipeline-ui-5ddb5b76d8-rrxmb  2/2  Running  0  16m
kubeflow  ml-pipeline-viewer-crd-86cbc45d9b-dzlf6  2/2  Running  1 (16m ago)  16m
kubeflow  ml-pipeline-visualizationserver-5577c64b45-h7864  2/2  Running  0  16m
kubeflow  mysql-6878bbff69-xdz42  2/2  Running  0  16m
kubeflow  notebook-controller-deployment-699589b4f9-67qk8  2/2  Running  1 (16m ago)  16m
kubeflow  profiles-deployment-74f656c59f-ndqvp  3/3  Running  1 (16m ago)  16m
kubeflow  tensorboard-controller-deployment-5655cc9dbb-5r22h  3/3  Running  1 (16m ago)  16m
kubeflow  tensorboards-web-app-deployment-8474fd9569-5g9v7  2/2  Running  0  16m
kubeflow  training-operator-7f768bbbdb-ww55q  1/1  Running  0  16m
kubeflow  volumes-web-app-deployment-7b998df674-mwqdz  2/2  Running  0  16m
kubeflow  workflow-controller-78c979dc75-l2rht  2/2  Running  1 (16m ago)  16m
local-path-storage  local-path-provisioner-8f77648b6-db2r7  1/1  Running  0  22
```

========++++++++++==================================================================

```
Name: metadata-grpc-deployment-5c8599b99c-g6qmq
Namespace: kubeflow
Priority: 0
Node: e2m122-pc/192.168.0.80
Start Time: Thu, 06 Apr 2023 18:01:26 +0800
Labels:
  application-crd-id=kubeflow-pipelines
  component=metadata-grpc-server
  pod-template-hash=5c8599b99c
  security.istio.io/tlsMode=istio
  service.istio.io/canonical-name=metadata-grpc-deployment
  service.istio.io/canonical-revision=latest
Annotations:
  kubectl.kubernetes.io/default-container: container
  kubectl.kubernetes.io/default-logs-container: container
  prometheus.io/path: /stats/prometheus
  prometheus.io/port: 15020
  prometheus.io/scrape: true
  sidecar.istio.io/status: {"initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["workload-socket","credential-socket","workload-certs","istio-env...
Status: Running
IP: 10.0.0.230
IPs:
  IP: 10.0.0.230
Controlled By: ReplicaSet/metadata-grpc-deployment-5c8599b99c
Init Containers:
  istio-init:
Container ID: containerd://98176fc0f962dadb3e2e4c2ba2042e1871042c762e3043d7f62b18107f794b1f
Image: docker.io/istio/proxyv2:1.16.0
Image ID: docker.io/istio/proxyv2@sha256:f6f97fa4fb77a3cbe1e3eca0fa46bd462ad6b284c129cf57bf91575c4fb50cf9
Port:
Host Port:
Args:
State: Terminated
  Reason: Completed
  Exit Code: 0
  Started: Thu, 06 Apr 2023 18:01:54 +0800
  Finished: Thu, 06 Apr 2023 18:01:54 +0800
Ready: True
Restart Count: 0
Limits:
  cpu: 2
  memory: 1Gi
Requests:
  cpu: 10m
  memory: 40Mi
Environment:
Mounts:
  /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5r2rx (ro)
Containers:
  container:
Container ID: containerd://7cc284f4f5373b167047d0087ee8c0931c5290efe92b30426b4f82458854064e
Image: gcr.io/tfx-oss-public/ml_metadata_store_server:1.5.0
Image ID: gcr.io/tfx-oss-public/ml_metadata_store_server@sha256:db8691752b4cd02658e4bb28b73d34a18ba71f49d6cc124a47c0c5001f8c0f83
Port: 8080/TCP
Host Port: 0/TCP
Command: /bin/metadata_store_server
Args:
  --grpc_port=8080
  --mysql_config_database=$(MYSQL_DATABASE)
  --mysql_config_host=$(MYSQL_HOST)
  --mysql_config_port=$(MYSQL_PORT)
  --mysql_config_user=$(DBCONFIG_USER)
  --mysql_config_password=$(DBCONFIG_PASSWORD)
  --enable_database_upgrade=true
State: Waiting
  Reason: CrashLoopBackOff
Last State: Terminated
  Reason: Error
  Exit Code: 137
  Started: Thu, 06 Apr 2023 18:09:05 +0800
  Finished: Thu, 06 Apr 2023 18:09:50 +0800
Ready: False
Restart Count: 6
Liveness: http-get http://:15020/app-health/container/livez delay=3s timeout=2s period=5s #success=1 #failure=3
Readiness: http-get http://:15020/app-health/container/readyz delay=3s timeout=2s period=5s #success=1 #failure=3
Environment:
  DBCONFIG_USER: <set to the key 'username' in secret 'mysql-secret'> Optional: false
  DBCONFIG_PASSWORD: <set to the key 'password' in secret 'mysql-secret'> Optional: false
  MYSQL_DATABASE: <set to the key 'mlmdDb' of config map 'pipeline-install-config'> Optional: false
  MYSQL_HOST: <set to the key 'dbHost' of config map 'pipeline-install-config'> Optional: false
  MYSQL_PORT: <set to the key 'dbPort' of config map 'pipeline-install-config'> Optional: false
Mounts:
  /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5r2rx (ro)
  istio-proxy:
Container ID: containerd://0dabec7c855d12867e76dcfad08f863a1e755624430cfc145fcfe272ad4c97ff
Image: docker.io/istio/proxyv2:1.16.0
Image ID: docker.io/istio/proxyv2@sha256:f6f97fa4fb77a3cbe1e3eca0fa46bd462ad6b284c129cf57bf91575c4fb50cf9
Port: 15090/TCP
Host Port: 0/TCP
Args: proxy sidecar --domain $(POD_NAMESPACE).svc.cluster.local --proxyLogLevel=warning --proxyComponentLogLevel=misc:error --log_output_level=default:info --concurrency 2
State: Running
  Started: Thu, 06 Apr 2023 18:01:55 +0800
Ready: True
Restart Count: 0
Limits:
  cpu: 2
  memory: 1Gi
Requests:
  cpu: 10m
  memory: 40Mi
Readiness: http-get http://:15021/healthz/ready delay=1s timeout=3s period=2s #success=1 #failure=30
Environment:
  JWT_POLICY: third-party-jwt
  PILOT_CERT_PROVIDER: istiod
  CA_ADDR: istiod.istio-system.svc:15012
  POD_NAME: metadata-grpc-deployment-5c8599b99c-g6qmq (v1:metadata.name)
  POD_NAMESPACE: kubeflow (v1:metadata.namespace)
  INSTANCE_IP: (v1:status.podIP)
  SERVICE_ACCOUNT: (v1:spec.serviceAccountName)
  HOST_IP: (v1:status.hostIP)
  PROXY_CONFIG: {}

  ISTIO_META_POD_PORTS:          [
                                     {"name":"grpc-api","containerPort":8080,"protocol":"TCP"}
                                 ]
  ISTIO_META_APP_CONTAINERS:     container
  ISTIO_META_CLUSTER_ID:         Kubernetes
  ISTIO_META_INTERCEPTION_MODE:  REDIRECT
  ISTIO_META_WORKLOAD_NAME:      metadata-grpc-deployment
  ISTIO_META_OWNER:              kubernetes://apis/apps/v1/namespaces/kubeflow/deployments/metadata-grpc-deployment
  ISTIO_META_MESH_ID:            cluster.local
  TRUST_DOMAIN:                  cluster.local
  ISTIO_KUBE_APP_PROBERS:        {"/app-health/container/livez":{"tcpSocket":{"port":8080},"timeoutSeconds":2},"/app-health/container/readyz":{"tcpSocket":{"port":8080},"timeoutSeconds":2}}
Mounts:
  /etc/istio/pod from istio-podinfo (rw)
  /etc/istio/proxy from istio-envoy (rw)
  /var/lib/istio/data from istio-data (rw)
  /var/run/secrets/credential-uds from credential-socket (rw)
  /var/run/secrets/istio from istiod-ca-cert (rw)
  /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5r2rx (ro)
  /var/run/secrets/tokens from istio-token (rw)
  /var/run/secrets/workload-spiffe-credentials from workload-certs (rw)
  /var/run/secrets/workload-spiffe-uds from workload-socket (rw)

Conditions:
  Type             Status
  Initialized      True
  Ready            False
  ContainersReady  False
  PodScheduled     True
Volumes:
  workload-socket:
    Type: EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  credential-socket:
    Type: EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  workload-certs:
    Type: EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  istio-envoy:
    Type: EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium: Memory
    SizeLimit:
  istio-data:
    Type: EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  istio-podinfo:
    Type: DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.labels -> labels
      metadata.annotations -> annotations
  istio-token:
    Type: Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds: 43200
  istiod-ca-cert:
    Type: ConfigMap (a volume populated by a ConfigMap)
    Name: istio-ca-root-cert
    Optional: false
  kube-api-access-5r2rx:
    Type: Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds: 3607
    ConfigMapName: kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI: true
QoS Class: Burstable
Node-Selectors:
Tolerations:
  node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
  node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type  Reason  Age  From  Message


  Normal  Scheduled  10m  default-scheduler  Successfully assigned kubeflow/metadata-grpc-deployment-5c8599b99c-g6qmq to e2m122-pc
  Normal  Pulled  10m  kubelet  Container image "docker.io/istio/proxyv2:1.16.0" already present on machine
  Normal  Created  10m  kubelet  Created container istio-init
  Normal  Started  10m  kubelet  Started container istio-init
  Normal  Pulled  10m  kubelet  Container image "docker.io/istio/proxyv2:1.16.0" already present on machine
  Normal  Created  10m  kubelet  Created container istio-proxy
  Normal  Started  10m  kubelet  Started container istio-proxy
  Normal  Started  9m59s (x3 over 10m)  kubelet  Started container container
  Normal  Pulled  9m32s (x4 over 10m)  kubelet  Container image "gcr.io/tfx-oss-public/ml_metadata_store_server:1.5.0" already present on machine
  Normal  Created  9m32s (x4 over 10m)  kubelet  Created container container
  Warning  BackOff  1s (x42 over 10m)  kubelet  Back-off restarting failed container
```

==================================ML pipeline==========================================

```
Name: ml-pipeline-77d4d9974b-9dtlb
Namespace: kubeflow
Priority: 0
Node: e2m122-pc/192.168.0.80
Start Time: Thu, 06 Apr 2023 18:01:23 +0800
Labels:
  app=ml-pipeline
  app.kubernetes.io/component=ml-pipeline
  app.kubernetes.io/name=kubeflow-pipelines
  application-crd-id=kubeflow-pipelines
  pod-template-hash=77d4d9974b
  security.istio.io/tlsMode=istio
  service.istio.io/canonical-name=kubeflow-pipelines
  service.istio.io/canonical-revision=latest
Annotations:
  cluster-autoscaler.kubernetes.io/safe-to-evict: true
  kubectl.kubernetes.io/default-container: ml-pipeline-api-server
  kubectl.kubernetes.io/default-logs-container: ml-pipeline-api-server
  prometheus.io/path: /stats/prometheus
  prometheus.io/port: 15020
  prometheus.io/scrape: true
  sidecar.istio.io/status: {"initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["workload-socket","credential-socket","workload-certs","istio-env...
Status: Running
IP: 10.0.0.145
IPs:
  IP: 10.0.0.145
Controlled By: ReplicaSet/ml-pipeline-77d4d9974b
Init Containers:
  istio-init:
Container ID: containerd://4ef4038065e34337f3a298dd99f5706dc3d542bac13a81a66df32acb62f5e25b
Image: docker.io/istio/proxyv2:1.16.0
Image ID: docker.io/istio/proxyv2@sha256:f6f97fa4fb77a3cbe1e3eca0fa46bd462ad6b284c129cf57bf91575c4fb50cf9
Port:
Host Port:
Args:
  istio-iptables
  -p
  15001
  -z
  15006
  -u
  1337
  -m
  REDIRECT
  -i
  *
  -x

  -b
  *
  -d
  15090,15021,15020
  --log_output_level=default:info
State:          Terminated
  Reason:       Completed
  Exit Code:    0
  Started:      Thu, 06 Apr 2023 18:01:26 +0800
  Finished:     Thu, 06 Apr 2023 18:01:26 +0800
Ready:          True
Restart Count:  0
Limits:
  cpu:     2
  memory:  1Gi
Requests:
  cpu:        10m
  memory:     40Mi
Environment:  <none>
Mounts:
  /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zjklj (ro)

Containers:
  ml-pipeline-api-server:
Container ID: containerd://9dd354f638060b788593d8471e045286acee612595deb280a1fca7bdb658f9fe
Image: gcr.io/ml-pipeline/api-server:2.0.0-alpha.7
Image ID: gcr.io/ml-pipeline/api-server@sha256:3b75be9180bad7ac56017a554a4a9402e57b333a48e8bd83c8614f69babee032
Ports: 8888/TCP, 8887/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
  Started: Thu, 06 Apr 2023 18:10:26 +0800
Last State: Terminated
  Reason: Error
  Exit Code: 137
  Started: Thu, 06 Apr 2023 18:08:56 +0800
  Finished: Thu, 06 Apr 2023 18:10:26 +0800
Ready: False
Restart Count: 6
Requests:
  cpu: 250m
  memory: 500Mi
Liveness: exec [wget -q -S -O - http://localhost:8888/apis/v1beta1/healthz] delay=3s timeout=2s period=5s #success=1 #failure=3
Readiness: exec [wget -q -S -O - http://localhost:8888/apis/v1beta1/healthz] delay=3s timeout=2s period=5s #success=1 #failure=3
Startup: exec [wget -q -S -O - http://localhost:8888/apis/v1beta1/healthz] delay=0s timeout=2s period=5s #success=1 #failure=12
Environment Variables from:
  pipeline-api-server-config-dc9hkg52h6 ConfigMap Optional: false
Environment:
  KUBEFLOW_USERID_HEADER: kubeflow-userid
  KUBEFLOW_USERID_PREFIX:
  AUTO_UPDATE_PIPELINE_DEFAULT_VERSION: <set to the key 'autoUpdatePipelineDefaultVersion' of config map 'pipeline-install-config'> Optional: false
  POD_NAMESPACE: kubeflow (v1:metadata.namespace)
  OBJECTSTORECONFIG_SECURE: false
  OBJECTSTORECONFIG_BUCKETNAME: <set to the key 'bucketName' of config map 'pipeline-install-config'> Optional: false
  DBCONFIG_USER: <set to the key 'username' in secret 'mysql-secret'> Optional: false
  DBCONFIG_PASSWORD: <set to the key 'password' in secret 'mysql-secret'> Optional: false
  DBCONFIG_DBNAME: <set to the key 'pipelineDb' of config map 'pipeline-install-config'> Optional: false
  DBCONFIG_HOST: <set to the key 'dbHost' of config map 'pipeline-install-config'> Optional: false
  DBCONFIG_PORT: <set to the key 'dbPort' of config map 'pipeline-install-config'> Optional: false
  DBCONFIG_CONMAXLIFETIME: <set to the key 'ConMaxLifeTime' of config map 'pipeline-install-config'> Optional: false
  OBJECTSTORECONFIG_ACCESSKEY: <set to the key 'accesskey' in secret 'mlpipeline-minio-artifact'> Optional: false
  OBJECTSTORECONFIG_SECRETACCESSKEY: <set to the key 'secretkey' in secret 'mlpipeline-minio-artifact'> Optional: false
Mounts:
  /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zjklj (ro)
  istio-proxy:
Container ID: containerd://55a8a7ae8c8817404e9b07cf966efa767f76c1f0c67606aca7facdab7af430aa
Image: docker.io/istio/proxyv2:1.16.0
Image ID: docker.io/istio/proxyv2@sha256:f6f97fa4fb77a3cbe1e3eca0fa46bd462ad6b284c129cf57bf91575c4fb50cf9
Port: 15090/TCP
Host Port: 0/TCP
Args: proxy sidecar --domain $(POD_NAMESPACE).svc.cluster.local --proxyLogLevel=warning --proxyComponentLogLevel=misc:error --log_output_level=default:info --concurrency 2
State: Running
  Started: Thu, 06 Apr 2023 18:01:27 +0800
Ready: True
Restart Count: 0
Limits:
  cpu: 2
  memory: 1Gi
Requests:
  cpu: 10m
  memory: 40Mi
Readiness: http-get http://:15021/healthz/ready delay=1s timeout=3s period=2s #success=1 #failure=30
Environment:
  JWT_POLICY: third-party-jwt
  PILOT_CERT_PROVIDER: istiod
  CA_ADDR: istiod.istio-system.svc:15012
  POD_NAME: ml-pipeline-77d4d9974b-9dtlb (v1:metadata.name)
  POD_NAMESPACE: kubeflow (v1:metadata.namespace)
  INSTANCE_IP: (v1:status.podIP)
  SERVICE_ACCOUNT: (v1:spec.serviceAccountName)
  HOST_IP: (v1:status.hostIP)
  PROXY_CONFIG: {}

  ISTIO_META_POD_PORTS:          [
                                     {"name":"http","containerPort":8888,"protocol":"TCP"}
                                     ,{"name":"grpc","containerPort":8887,"protocol":"TCP"}
                                 ]
  ISTIO_META_APP_CONTAINERS:     ml-pipeline-api-server
  ISTIO_META_CLUSTER_ID:         Kubernetes
  ISTIO_META_INTERCEPTION_MODE:  REDIRECT
  ISTIO_META_WORKLOAD_NAME:      ml-pipeline
  ISTIO_META_OWNER:              kubernetes://apis/apps/v1/namespaces/kubeflow/deployments/ml-pipeline
  ISTIO_META_MESH_ID:            cluster.local
  TRUST_DOMAIN:                  cluster.local
Mounts:
  /etc/istio/pod from istio-podinfo (rw)
  /etc/istio/proxy from istio-envoy (rw)
  /var/lib/istio/data from istio-data (rw)
  /var/run/secrets/credential-uds from credential-socket (rw)
  /var/run/secrets/istio from istiod-ca-cert (rw)
  /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zjklj (ro)
  /var/run/secrets/tokens from istio-token (rw)
  /var/run/secrets/workload-spiffe-credentials from workload-certs (rw)
  /var/run/secrets/workload-spiffe-uds from workload-socket (rw)

Conditions:
  Type             Status
  Initialized      True
  Ready            False
  ContainersReady  False
  PodScheduled     True
Volumes:
  workload-socket:
    Type: EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  credential-socket:
    Type: EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  workload-certs:
    Type: EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  istio-envoy:
    Type: EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium: Memory
    SizeLimit:
  istio-data:
    Type: EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  istio-podinfo:
    Type: DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.labels -> labels
      metadata.annotations -> annotations
  istio-token:
    Type: Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds: 43200
  istiod-ca-cert:
    Type: ConfigMap (a volume populated by a ConfigMap)
    Name: istio-ca-root-cert
    Optional: false
  kube-api-access-zjklj:
    Type: Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds: 3607
    ConfigMapName: kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI: true
QoS Class: Burstable
Node-Selectors:
Tolerations:
  node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
  node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type  Reason  Age  From  Message


  Normal  Scheduled  10m  default-scheduler  Successfully assigned kubeflow/ml-pipeline-77d4d9974b-9dtlb to e2m122-pc
  Warning  FailedMount  10m  kubelet  MountVolume.SetUp failed for volume "istiod-ca-cert" : failed to sync configmap cache: timed out waiting for the condition
  Normal  Pulled  10m  kubelet  Container image "docker.io/istio/proxyv2:1.16.0" already present on machine
  Normal  Created  10m  kubelet  Created container istio-init
  Normal  Started  10m  kubelet  Started container istio-init
  Normal  Created  10m  kubelet  Created container istio-proxy
  Normal  Started  10m  kubelet  Started container ml-pipeline-api-server
  Normal  Pulled  10m  kubelet  Container image "docker.io/istio/proxyv2:1.16.0" already present on machine
  Normal  Started  10m  kubelet  Started container istio-proxy
  Normal  Killing  9m6s  kubelet  Container ml-pipeline-api-server failed startup probe, will be restarted
  Normal  Created  8m36s (x2 over 10m)  kubelet  Created container ml-pipeline-api-server
  Normal  Pulled  8m36s (x2 over 10m)  kubelet  Container image "gcr.io/ml-pipeline/api-server:2.0.0-alpha.7" already present on machine
  Warning  Unhealthy  6s (x84 over 10m)  kubelet  Startup probe failed:
```

=============================MetaWriter=================

```
Events:
  Type  Reason  Age  From  Message
  Normal  Scheduled  39m  default-scheduler  Successfully assigned kubeflow/metadata-writer-6c576c94b8-7qtb4 to e2m122-pc
  Warning  FailedCreatePodSandBox  38m  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "2f11a598c5b23a60ceebd496250ae8537fe0283678ceeb56b37951bc64afdb4f": plugin type="cilium-cni" name="cilium" failed (add): Unable to create endpoint: response status code does not match any response statuses defined for this endpoint in the swagger spec (status 429): {}
  Normal  Pulled  38m  kubelet  Container image "docker.io/istio/proxyv2:1.16.0" already present on machine
  Normal  Created  38m  kubelet  Created container istio-init
  Normal  Started  38m  kubelet  Started container istio-init
  Normal  Pulled  38m  kubelet  Container image "docker.io/istio/proxyv2:1.16.0" already present on machine
  Normal  Created  38m  kubelet  Created container istio-proxy
  Normal  Started  38m  kubelet  Started container istio-proxy
  Normal  Pulled  32m (x4 over 38m)  kubelet  Container image "gcr.io/ml-pipeline/metadata-writer:2.0.0-alpha.7" already present on machine
  Normal  Created  32m (x4 over 38m)  kubelet  Created container main
  Normal  Started  32m (x4 over 38m)  kubelet  Started container main
  Warning  BackOff  3m39s (x94 over 35m)  kubelet  Back-off restarting failed container
```
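
For anyone landing on the same state: the describe output above only shows exit code 137 and a truncated probe message, so the first useful step is usually to read the failing containers' own logs. A minimal debugging sketch, assuming the default `kubeflow` namespace and the container names shown above:

```sh
# Logs of the previous (crashed) instances of the two failing containers
kubectl -n kubeflow logs deploy/metadata-grpc-deployment -c container --previous
kubectl -n kubeflow logs deploy/ml-pipeline -c ml-pipeline-api-server --previous

# Run the startup probe from the describe output by hand to see the full
# healthz response instead of the truncated "Startup probe failed:" event
kubectl -n kubeflow exec deploy/ml-pipeline -c ml-pipeline-api-server -- \
  wget -q -S -O - http://localhost:8888/apis/v1beta1/healthz

# Both servers read their DB settings from mysql-secret / pipeline-install-config,
# so also confirm the in-cluster MySQL is up
kubectl -n kubeflow get pods | grep mysql
```

Since both deployments point at the same MySQL backend (see the DBCONFIG_* / MYSQL_* variables above), the two CrashLoopBackOffs often share a single root cause.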

ibnummuhammad commented 1 year ago

Try increasing the memory limit. In my case I increased it to 6 GB (more is better), and it worked.
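
A quick way to check whether it really is memory pressure before raising limits; a sketch that assumes metrics-server is installed for `kubectl top`:

```sh
# Reason for the last termination; "OOMKilled" confirms a memory problem, while a
# plain "Error" with exit code 137 can also come from a failed liveness probe
kubectl -n kubeflow get pod -l component=metadata-grpc-server \
  -o jsonpath='{.items[0].status.containerStatuses[*].lastState.terminated.reason}{"\n"}'

# Current usage versus the limits shown in the describe output (needs metrics-server)
kubectl -n kubeflow top pod | grep -E 'metadata-grpc|ml-pipeline'
```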

JuanPabloSebey commented 1 year ago

For me it was a Cilium networking provider compatibility issue. I had to move to kubenet, and it worked.
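
Before switching CNIs it may be worth confirming that Cilium itself is unhealthy; the metadata-writer events above already show a cilium-cni `Unable to create endpoint ... (status 429)` error. A sketch, assuming the standard Cilium DaemonSet in `kube-system`:

```sh
# Agent health, run inside a pod of the Cilium DaemonSet
kubectl -n kube-system exec ds/cilium -- cilium status

# Recent CNI-side errors such as the endpoint-creation failure seen above
kubectl -n kube-system logs ds/cilium --tail=200 | grep -iE 'error|endpoint'
```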

skuchipu commented 11 months ago

> Try increasing the memory limit. In my case I increased it to 6 GB (more is better), and it worked.

How do we increase the memory when the deployment does not have any resources specified?
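
One way to do it when the manifest ships without a `resources` block is a strategic-merge patch on the deployment. This is only a sketch: the container name `container` comes from the describe output above, and the 6Gi limit just mirrors the earlier comment, so adjust both to your setup:

```sh
kubectl -n kubeflow patch deployment metadata-grpc-deployment --type=strategic -p '
spec:
  template:
    spec:
      containers:
      - name: container
        resources:
          requests:
            memory: "512Mi"
          limits:
            memory: "6Gi"
'
```

The same pattern applies to the `ml-pipeline` deployment (container name `ml-pipeline-api-server`); in a Kustomize-based install you would normally carry the same change as a patch in your overlay rather than patching the live object.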

juliusvonkohout commented 9 months ago

/close

google-oss-prow[bot] commented 9 months ago

@juliusvonkohout: Closing this issue.

In response to [this](https://github.com/kubeflow/manifests/issues/2436#issuecomment-1832317643):

> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.