This repository contains Dockerfiles, scripts, yaml files, Helm charts, etc. used to scale out AI containers with versions of TensorFlow and PyTorch that have been optimized for Intel platforms. Scaling is done with python, Docker, kubernetes, kubeflow, cnvrg.io, Helm, and other container orchestration frameworks for use in the cloud and on-premise
Intel® Gaudi AI SW Tools Operator version 0.0.1 fails to deploy on OCP 4.16.0
Message displayed:
Operator failed
install failed: deployment gaudi-ai-sw-tools-operator-controller-manager not ready before timeout: deployment "gaudi-ai-sw-tools-operator-controller-manager" exceeded its progress deadline
Error Logs
Pending
16 Nov 2024, 11:05
RequirementsUnknownrequirements not yet checked
Pending
16 Nov 2024, 11:05
RequirementsNotMetone or more requirements couldn't be found
InstallReady
16 Nov 2024, 11:05
AllRequirementsMetall requirements found, attempting install
Installing
16 Nov 2024, 11:05
InstallSucceededwaiting for install components to report healthy
Installing
16 Nov 2024, 11:05
InstallWaitinginstalling: waiting for deployment gaudi-ai-sw-tools-operator-controller-manager to become ready: deployment "gaudi-ai-sw-tools-operator-controller-manager" not available: Deployment does not have minimum availability.
Failed
16 Nov 2024, 11:10
InstallCheckFailedinstall timeout
Pending
16 Nov 2024, 11:10
NeedsReinstallinstalling: waiting for deployment gaudi-ai-sw-tools-operator-controller-manager to become ready: deployment "gaudi-ai-sw-tools-operator-controller-manager" not available: Deployment does not have minimum availability.
InstallReady
16 Nov 2024, 11:10
AllRequirementsMetall requirements found, attempting install
Installing
16 Nov 2024, 11:10
InstallSucceededwaiting for install components to report healthy
Installing
16 Nov 2024, 11:10
InstallWaitinginstalling: waiting for deployment gaudi-ai-sw-tools-operator-controller-manager to become ready: deployment "gaudi-ai-sw-tools-operator-controller-manager" not available: Deployment does not have minimum availability.
Failed
16 Nov 2024, 11:15
InstallCheckFailedinstall timeout
Pending
16 Nov 2024, 11:15
NeedsReinstallinstalling: waiting for deployment gaudi-ai-sw-tools-operator-controller-manager to become ready: deployment "gaudi-ai-sw-tools-operator-controller-manager" not available: Deployment does not have minimum availability.
InstallReady
16 Nov 2024, 11:15
AllRequirementsMetall requirements found, attempting install
Installing
16 Nov 2024, 11:15
InstallSucceededwaiting for install components to report healthy
Failed
16 Nov 2024, 11:15
InstallCheckFailedinstall failed: deployment gaudi-ai-sw-tools-operator-controller-manager not ready before timeout: deployment "gaudi-ai-sw-tools-operator-controller-manager" exceeded its progress deadline
Reproduction Instructions
Install openshift OCP 4.16
Attempt to deploy Intel® Gaudi AI SW Tools Operator
Describe the bug
Intel® Gaudi AI SW Tools Operator version 0.0.1 fails to deploy on OCP 4.16.0
Message displayed: Operator failed install failed: deployment gaudi-ai-sw-tools-operator-controller-manager not ready before timeout: deployment "gaudi-ai-sw-tools-operator-controller-manager" exceeded its progress deadline
Error Logs
Reproduction Instructions
Affected Subfolder
Versions