rancher / dashboard

The Rancher UI
https://rancher.com
Apache License 2.0

You can allocate a 3 node RKE2 cluster when you only have one available vGPU #10906

Closed noahgildersleeve closed 6 months ago

noahgildersleeve commented 6 months ago

Setup

Describe the bug

When you create a 3-node RKE2 cluster, the check in the advanced settings only verifies that one vGPU is available, even though the selected vGPU is allocated to every VM. As a result, the generated YAML assigns the same vGPU to each node. Only the first node comes up. The other two show as unschedulable in Harvester and loop on provisioning.

To Reproduce

  1. Set up vGPU profiles in Harvester
  2. Only set up 1 profile
  3. Import Harvester into Rancher
  4. Create a new 3 node RKE2 cluster with Harvester as downstream provider
  5. Make the RKE2 cluster have one vGPU assigned to it

Result

The first node will allocate and the others will be unschedulable. Rancher will start deleting them and trying to reprovision them after the timeout.

Expected Result

    You shouldn't be allowed to allocate vGPUs that don't exist
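
The expected check could look roughly like the sketch below: multiply the per-VM vGPU count by the node count and compare it with the free devices for that profile. This is only an illustration, not the dashboard's actual code. The `VGpuRequest` shape, the `validateVGpuAllocation` name, and the `available` map are all hypothetical.

```typescript
// Hypothetical request shape; not the dashboard's real types.
interface VGpuRequest {
  profile: string; // vGPU profile name, e.g. a 2Q profile
  perNode: number; // vGPUs of this profile assigned to each VM
}

// Returns one message per profile that cannot be satisfied.
// An empty array means every node can get its vGPUs.
function validateVGpuAllocation(
  nodeCount: number,
  requests: VGpuRequest[],
  available: Record<string, number>, // assumed: free vGPU devices per profile
): string[] {
  const errors: string[] = [];

  for (const req of requests) {
    // Every VM in the pool gets its own devices, so scale by node count.
    const needed = req.perNode * nodeCount;
    const free = available[req.profile] ?? 0;

    if (needed > free) {
      errors.push(
        `profile ${req.profile}: need ${needed}, only ${free} available`,
      );
    }
  }

  return errors;
}
```

With three nodes and one free device this returns one error. With four free devices of the same profile it returns an empty list.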

Screenshots

Greenshot 2024-04-29 16 22 10

Additional context

Found while testing https://github.com/rancher/dashboard/pull/10833.

If you have enough vGPUs created with the same profile, then this will probably work. For instance, if you have four 2Q profiles and add them to a 3-node cluster, it will probably allocate fine. I'm going to check this a bit later when testing resources free up.

torchiaf commented 6 months ago

/backport v2.8.next1

gaktive commented 6 months ago

/backport v2.8.next1

noahgildersleeve commented 6 months ago

Validated in Rancher v2.8-862f57beb6ff7caeab6b4e3c00c89912050cf317-head and Harvester v1.3.0.