Hello,
I get an error when deploying on ICP 3.1.1 (no problem with ICP 2.x). It is caused by a change in how recent Kubernetes versions manage GPUs. Symptom: pod scheduling error at helm install.
With ICP 3.1 and 3.1.1 (Kubernetes 1.11 and later), nvidia.com/gpu must be used instead of alpha.kubernetes.io/nvidia-gpu.
Here is a modified Helm chart that works in my environment.
The critical part is the requests/limits section of the chart's templates/deployment.yaml file:
resources:
  limits:
    {{- if and (eq (.Capabilities.KubeVersion.Major|int) 1) (lt (.Capabilities.KubeVersion.Minor|int) 11) }}
    alpha.kubernetes.io/nvidia-gpu: {{ .Values.resources.limits.gpu }}
    {{- else }}
    nvidia.com/gpu: {{ .Values.resources.limits.gpu }}
    {{- end }}
    memory: {{ .Values.resources.limits.memory }}
  requests:
    {{- if and (eq (.Capabilities.KubeVersion.Major|int) 1) (lt (.Capabilities.KubeVersion.Minor|int) 11) }}
    alpha.kubernetes.io/nvidia-gpu: {{ .Values.resources.requests.gpu }}
    {{- else }}
    nvidia.com/gpu: {{ .Values.resources.requests.gpu }}
    {{- end }}
    memory: {{ .Values.resources.requests.memory }}
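For reference, the template above reads four keys from the chart values. A minimal values.yaml sketch that would satisfy it might look like the following; the key names come from the template, but the numbers are placeholders I chose for illustration and are not taken from the original chart.

```yaml
# values.yaml (sketch; the quantities are assumed placeholders)
resources:
  limits:
    gpu: 1        # rendered as nvidia.com/gpu (or the alpha name on K8s < 1.11)
    memory: 8Gi
  requests:
    gpu: 1        # Kubernetes requires GPU requests to equal GPU limits
    memory: 8Gi
```

Keeping the GPU count identical under requests and limits matters: Kubernetes rejects a pod whose GPU request differs from its GPU limit.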
Here is the modified file to be placed in the templates folder of the helm chart, as an example:
deployment.zip
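To check whether the cluster schedules the new resource name at all, independently of the chart, a throwaway pod can help. This is only a sketch: the pod name and the image tag are assumptions, so substitute any CUDA image available in your registry.

```yaml
# gpu-check.yaml (sketch; name and image are assumptions, not from the chart)
apiVersion: v1
kind: Pod
metadata:
  name: gpu-check
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:9.0-base   # assumed image; replace with one you can pull
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1         # resource name used on Kubernetes 1.11 and later
```

If this pod stays Pending with an "Insufficient nvidia.com/gpu" event, the nodes are not advertising the resource and the problem lies with the node GPU setup, not with the chart.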