pangeo-data / pangeo-eosc

Pangeo for the European Open Science cloud
https://pangeo-data.github.io/pangeo-eosc/
MIT License

GPU resource not available? #74

Closed tinaok closed 2 months ago

tinaok commented 2 months ago

@sebastian-luna-valero We are not able to use GPU nodes today.

The event log shows:

2024-05-02T13:52:58Z [Warning] 0/36 nodes are available: 3 node(s) were unschedulable, 33 Insufficient nvidia.com/gpu. preemption: 0/36 nodes are available: 1 Insufficient nvidia.com/gpu, 3 Preemption is not helpful for scheduling, 32 No preemption victims found for incoming pod..

Is there a problem with the cluster?

Screenshot 2024-05-02 15 53 26
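For readers who want to decode the scheduler event above, here is a minimal sketch that splits the message into per-reason node counts. The function name `parse_scheduling_message` is hypothetical, and the parsing assumes the message layout seen in this particular event; other Kubernetes versions may word it differently.

```python
import re

# Scheduler message copied from the event log quoted in this issue.
MESSAGE = (
    "0/36 nodes are available: 3 node(s) were unschedulable, "
    "33 Insufficient nvidia.com/gpu. preemption: 0/36 nodes are available: "
    "1 Insufficient nvidia.com/gpu, 3 Preemption is not helpful for scheduling, "
    "32 No preemption victims found for incoming pod.."
)


def parse_scheduling_message(message):
    """Return (total_nodes, {reason: node_count}) for the filtering part.

    Hypothetical helper: assumes the 'N/M nodes are available: ...' layout
    shown above and ignores everything after 'preemption:'.
    """
    filtering = message.split(" preemption:")[0].rstrip(". ")
    header, _, details = filtering.partition(": ")
    match = re.match(r"\d+/(\d+) nodes are available", header)
    if match is None:
        raise ValueError("unrecognised scheduler message")
    total_nodes = int(match.group(1))
    reasons = {}
    for item in details.split(", "):
        count, _, reason = item.partition(" ")
        reasons[reason] = int(count)
    return total_nodes, reasons


total, reasons = parse_scheduling_message(MESSAGE)
for reason, count in reasons.items():
    print(f"{count:>3} of {total} nodes: {reason}")
```

Read this way, the event says that 3 of the 36 nodes were unschedulable and the remaining 33 had no free `nvidia.com/gpu` resource for the pod.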

sebastian-luna-valero commented 2 months ago

Hi Tina,

The problem is that all GPUs are currently in use, sorry.

tinaok commented 2 months ago

Thank you @sebastian-luna-valero, but I do not see any GPU usage in the Grafana interface...

sebastian-luna-valero commented 2 months ago

Hi Tina,

This is where to look: image

The total number of GPUs is shared with all other Kubernetes users (i.e. users outside the Pangeo deployment).
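To illustrate why a per-deployment dashboard can show no GPU usage while the cluster is full, here is a rough sketch that totals `nvidia.com/gpu` across all namespaces. It assumes you have JSON in the shape returned by `kubectl get nodes -o json` and `kubectl get pods --all-namespaces -o json`, which requires cluster-wide read permissions that ordinary users may not have. The sample data, the namespace names and the helper names are made up for illustration and are not taken from this cluster.

```python
GPU = "nvidia.com/gpu"


def gpu_capacity(nodes):
    """Sum the allocatable GPUs over all nodes in a NodeList-shaped dict."""
    return sum(
        int(node.get("status", {}).get("allocatable", {}).get(GPU, 0))
        for node in nodes["items"]
    )


def gpu_in_use_by_namespace(pods):
    """Sum GPU limits of scheduled, unfinished pods, grouped by namespace."""
    usage = {}
    for pod in pods["items"]:
        if pod.get("status", {}).get("phase") in ("Succeeded", "Failed"):
            continue  # finished pods no longer hold a GPU
        if not pod.get("spec", {}).get("nodeName"):
            continue  # pending pods have not been given a GPU yet
        for container in pod["spec"].get("containers", []):
            limits = container.get("resources", {}).get("limits", {})
            count = int(limits.get(GPU, 0))
            if count:
                namespace = pod["metadata"]["namespace"]
                usage[namespace] = usage.get(namespace, 0) + count
    return usage


# Hypothetical sample data: two GPU nodes, both taken by another namespace.
nodes = {"items": [
    {"status": {"allocatable": {GPU: "1"}}},
    {"status": {"allocatable": {GPU: "1"}}},
    {"status": {"allocatable": {}}},
]}
pods = {"items": [
    {"metadata": {"namespace": "other-project"},
     "spec": {"nodeName": "gpu-node-1",
              "containers": [{"resources": {"limits": {GPU: "1"}}}]},
     "status": {"phase": "Running"}},
    {"metadata": {"namespace": "other-project"},
     "spec": {"nodeName": "gpu-node-2",
              "containers": [{"resources": {"limits": {GPU: "1"}}}]},
     "status": {"phase": "Running"}},
    {"metadata": {"namespace": "pangeo"},
     "spec": {"containers": [{"resources": {"limits": {GPU: "1"}}}]},
     "status": {"phase": "Pending"}},
]}

capacity = gpu_capacity(nodes)
usage = gpu_in_use_by_namespace(pods)
free = capacity - sum(usage.values())
print(f"capacity={capacity} in_use={usage} free={free}")
```

In this made-up example the `pangeo` namespace holds no GPUs, yet none are free, because both are held by pods in another namespace.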

sebastian-luna-valero commented 2 months ago

Considering this issue solved, but please reopen if needed.