Closed: ckadner closed this issue 2 years ago.
Describe the bug
After deploying MLX on OpenShift (4.8 or 4.10, on either IBM Cloud or Fyre) with the following commands:
```shell
# export MLX_DEPLOYMENT_TYPE=mlx-single-ibmcloud-openshift
export MLX_DEPLOYMENT_TYPE=mlx-single-fyre-openshift

git clone https://github.com/IBM/manifests -b v1.5-branch && cd manifests

while ! kustomize build ${MLX_DEPLOYMENT_TYPE} | \
  kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
```
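The loop above retries until `kubectl apply` succeeds, with no upper bound. If a bounded number of attempts is preferred, a small helper like the one below could wrap the same pipeline. This is only a sketch: the function name `retry` and its arguments (maximum attempts, delay in seconds, then the command) are hypothetical and not part of the MLX manifests.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds, at most $1 times,
# sleeping $2 seconds between failed attempts. Returns 1 if all attempts fail.
retry() {
  local max=$1 delay=$2
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "Giving up after $attempt attempts" >&2
      return 1
    fi
    echo "Attempt $attempt/$max failed; retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

Usage, keeping the pipe inside one `sh -c` command so `retry` sees the pipeline's exit status (that of `kubectl apply`, the same status the original `while !` loop checks): `retry 30 10 sh -c 'kustomize build "$MLX_DEPLOYMENT_TYPE" | kubectl apply -f -'`.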
The `mlx-ui` pod fails to start up:
```
NAME                                               READY   STATUS              RESTARTS   AGE
cache-deployer-deployment-798dc7d98b-9c4sj         1/1     Running             0          83s
cache-server-86f59c8696-d499g                      0/1     ContainerCreating   0          83s
kfp-csi-s3-4srhx                                   0/2     ContainerCreating   0          81s
kfp-csi-s3-9bqft                                   0/2     ContainerCreating   0          81s
kfp-csi-s3-gklqr                                   0/2     ContainerCreating   0          81s
metadata-envoy-deployment-5b4856dd5-m6t4m          1/1     Running             0          83s
metadata-grpc-deployment-6b5685488-gnszx           1/1     Running             0          83s
metadata-writer-9f698fdcb-x47pd                    1/1     Running             0          83s
minio-5b65df66c9-d257k                             1/1     Running             0          83s
ml-pipeline-77b7b79565-p2wfq                       1/1     Running             0          83s
ml-pipeline-persistenceagent-684f664fb7-q255d      1/1     Running             0          83s
ml-pipeline-scheduledworkflow-5dfcf96788-6mp2n     1/1     Running             0          82s
ml-pipeline-ui-6dfcc5c664-pkgbr                    1/1     Running             0          82s
ml-pipeline-viewer-crd-5878c6454f-mk92c            1/1     Running             0          82s
ml-pipeline-visualizationserver-6876996cdd-s4qvd   1/1     Running             0          82s
mlx-api-7f46b6df4f-xdvzw                           1/1     Running             0          82s
mlx-ui-7fbbbf6cbb-hll4z                            0/1     Error               3          82s
mysql-f7b9b7dd4-75l2q                              1/1     Running             0          82s
```
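To pick out the pods that are not fully ready from a listing like the one above, the two halves of the READY column can be compared. A minimal sketch, assuming the input is `oc get pods --no-headers` output with the default columns (NAME, READY, STATUS, RESTARTS, AGE); the function name `not_ready_pods` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical filter: read `oc get pods --no-headers` lines on stdin and print
# "NAME STATUS" for every pod whose ready count differs from its container count
# (e.g. READY "0/1" or "0/2").
not_ready_pods() {
  awk '{
    split($2, ready, "/")          # ready[1] = ready containers, ready[2] = total
    if (ready[1] != ready[2]) print $1, $3
  }'
}
```

Usage: `oc get pods --no-headers | not_ready_pods`. On the listing above this would report `cache-server`, the three `kfp-csi-s3` pods, and `mlx-ui`.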
We can see exit code `243` in `oc describe pod mlx-ui-7fbbbf6cbb-hll4z`:
```
Containers:
  mlx-ui:
    Container ID:   cri-o://5d9d1caa2f3544a78c8b0e2cdc9cba9fc495a7c108ee3443220b417ca8c55d4b
    Image:          mlexchange/mlx-ui:nightly-origin-main
    Image ID:       docker.io/mlexchange/mlx-ui@sha256:70aa61ce62caeeeeaa549420c4684b5e0edb3dc96a8151b11f15939c5fe14152
    Port:           3000/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    243
```
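When only the exit code matters, it can be pulled out of the `oc describe pod` text with a short filter. This is a sketch: the helper name `describe_exit_codes` is hypothetical, and it simply prints the value of every `Exit Code:` line it reads on stdin, so a pod with several terminated containers yields several values.

```shell
#!/usr/bin/env bash
# Hypothetical filter: print the value of each "Exit Code:" line found in
# `oc describe pod` output read on stdin.
describe_exit_codes() {
  awk -F':' '/^[ \t]*Exit Code:/ {
    gsub(/[ \t]/, "", $2)          # strip the alignment padding around the value
    print $2
  }'
}
```

Usage: `oc describe pod mlx-ui-7fbbbf6cbb-hll4z | describe_exit_codes`. The same value is also part of the pod status, so `oc get pod mlx-ui-7fbbbf6cbb-hll4z -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'` should print it without text parsing, and `oc logs --previous mlx-ui-7fbbbf6cbb-hll4z` shows the output of the crashed container, which may help explain the `243`.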
After deleting the `mlx-ui` pod, its replacement comes up fine:
```
$ oc get pods | grep mlx-ui
mlx-ui-7fbbbf6cbb-hll4z   0/1     CrashLoopBackOff   7          13m

$ oc delete pod mlx-ui-7fbbbf6cbb-hll4z
pod "mlx-ui-7fbbbf6cbb-hll4z" deleted

$ oc get pods | grep mlx-ui
mlx-ui-7fbbbf6cbb-r5kxh   1/1     Running            0          16s
```
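Until the root cause is fixed, the manual workaround could be scripted. The sketch below only selects which pods to delete; the name `failed_mlx_ui_pods` is hypothetical, it assumes `oc get pods --no-headers` input, and it treats a pod as failed when its name starts with `mlx-ui-` and its STATUS is `Error` or `CrashLoopBackOff` (the two states seen in this issue).

```shell
#!/usr/bin/env bash
# Hypothetical filter: read `oc get pods --no-headers` lines on stdin and print
# the names of mlx-ui pods whose STATUS column is Error or CrashLoopBackOff.
failed_mlx_ui_pods() {
  awk '$1 ~ /^mlx-ui-/ && ($3 == "Error" || $3 == "CrashLoopBackOff") { print $1 }'
}
```

Usage: `oc get pods --no-headers | failed_mlx_ui_pods | xargs -r oc delete pod` (`-r` is a GNU xargs option that skips `oc delete` when nothing matches). As in the session above, the owning ReplicaSet (note the shared `7fbbbf6cbb` hash in both pod names) then creates a replacement pod; this only works around the failed first start and does not explain it.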
Thanks @jbusche for verifying that this error occurs consistently across various OpenShift deployments.