You can allocate a 3 node RKE2 cluster when you only have one available vGPU

noahgildersleeve commented 6 months ago

Setup

Rancher version: v2.8-head
Rancher UI Extensions:
Browser type & version: Chrome Version 124.0.6367.78
Harvester Version: v1.3.0

Describe the bug

When you create a 3 node RKE2 cluster, the check in the advanced settings only verifies that one vGPU is available, even though the selected vGPU is allocated to every VM. As a result, the generated YAML assigns the same vGPU to each node. Only the first node comes up. The other two show as unschedulable in Harvester and loop on provisioning.

To Reproduce

Set up vGPU profiles in Harvester
Only set up 1 profile
Import Harvester into Rancher
Create a new 3 node RKE2 cluster with Harvester as downstream provider
Make the RKE2 cluster have one vGPU assigned to it

Result

The first node will allocate and the others will be unschedulable. Rancher will start deleting and trying to reprovision them after the timeout.

Expected Result

You shouldn't be allowed to allocate vGPUs that don't exist
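
To illustrate the expected behaviour, here is a minimal sketch of the kind of capacity check the form could run before submitting. It is not the dashboard's actual code: the `AvailableVGpus` map, the `PoolRequest` shape, the `checkVGpuCapacity` function, and the `profile-2q` name are all hypothetical. It assumes the UI already knows how many unallocated vGPU devices exist per profile.

```typescript
// Hypothetical: number of unallocated vGPU devices, keyed by profile name.
type AvailableVGpus = Record<string, number>;

// Hypothetical: what the user asked for in the machine pool form.
interface PoolRequest {
  nodeCount: number;    // machines in the pool
  vgpuProfile: string;  // profile selected for the pool
  vgpusPerNode: number; // vGPUs requested for each machine
}

interface CheckResult {
  ok: boolean;
  required: number;
  available: number;
}

// Compare the total demand (nodes x vGPUs per node) with the free devices,
// instead of only checking that a single device exists.
function checkVGpuCapacity(available: AvailableVGpus, req: PoolRequest): CheckResult {
  const required = req.nodeCount * req.vgpusPerNode;
  const free = available[req.vgpuProfile] ?? 0;
  return { ok: required <= free, required, available: free };
}

// The case from this report: 3 nodes, only 1 device available.
const oneDevice = checkVGpuCapacity(
  { 'profile-2q': 1 },
  { nodeCount: 3, vgpuProfile: 'profile-2q', vgpusPerNode: 1 },
);
console.log(oneDevice.ok); // false

// Enough devices with the same profile: 4 available for 3 nodes.
const fourDevices = checkVGpuCapacity(
  { 'profile-2q': 4 },
  { nodeCount: 3, vgpuProfile: 'profile-2q', vgpusPerNode: 1 },
);
console.log(fourDevices.ok); // true
```

Returning the `required` and `available` counts alongside `ok` would let the form show a specific validation message rather than a generic error.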

Screenshots

Greenshot 2024-04-29 16 22 10

Additional context

Found while testing https://github.com/rancher/dashboard/pull/10833

If you have enough vGPUs created with the same profile, this will probably work. For instance, if you have four 2Q profiles and add them to a 3 node cluster, it will probably allocate fine. I'm going to check this a bit later when testing resources free up.

torchiaf commented 6 months ago

/backport v2.8.next1

gaktive commented 6 months ago

/backport v2.8.next1

noahgildersleeve commented 6 months ago

Validated in Rancher v2.8-862f57beb6ff7caeab6b4e3c00c89912050cf317-head and Harvester v1.3.0.

rancher / dashboard

You can allocate a 3 node RKE2 cluster when you only have one available vGPU #10906