rancher / dashboard

The Rancher UI
https://rancher.com
Apache License 2.0

[backport v2.8.next1] You can allocate a 3 node RKE2 cluster when you only have one available vGPU #10912

Closed github-actions[bot] closed 3 months ago

github-actions[bot] commented 4 months ago

This is a backport issue for #10906, automatically created via GitHub Actions workflow initiated by @gaktive

Original issue body:

Setup

Describe the bug

When you create a 3-node RKE2 cluster, the UI only checks that one vGPU is available in the advanced section, even though it allocates a vGPU to every VM. As a result, it creates YAML that assigns the same vGPU to each node. Only the first node comes up. The other two show as unschedulable in Harvester and loop on provisioning.

To Reproduce

  1. Set up vGPU profiles in Harvester
  2. Only set up 1 profile
  3. Import Harvester into Rancher
  4. Create a new 3 node RKE2 cluster with Harvester as downstream provider
  5. Assign one vGPU to the RKE2 cluster

Result

The first node will allocate and the others will be unschedulable. Rancher will start deleting them and trying to reprovision them after the timeout.

Expected Result

    You shouldn't be allowed to allocate vGPUs that don't exist
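
The expected behavior amounts to a capacity check before provisioning. A minimal sketch of such a check, assuming a hypothetical `VGpuDevice` shape and a hypothetical `validateVGpuRequest` helper (neither is taken from the Rancher dashboard or Harvester code):

```typescript
// Hypothetical device shape for illustration; not the real Harvester resource type.
interface VGpuDevice {
  name: string;       // device identifier
  profile: string;    // vGPU profile, e.g. "2Q"
  allocated: boolean; // true if already attached to a VM
}

// Returns an error message when the request cannot be satisfied, or null when it can.
// The point is that demand scales with the node count: every node needs its own vGPU(s).
function validateVGpuRequest(
  nodeCount: number,
  vgpusPerNode: number,
  profile: string,
  devices: VGpuDevice[],
): string | null {
  const free = devices.filter((d) => d.profile === profile && !d.allocated).length;
  const needed = nodeCount * vgpusPerNode;

  if (needed > free) {
    return `Need ${needed} free vGPU(s) with profile ${profile}, but only ${free} available`;
  }

  return null;
}
```

With a single free device, a request for three nodes with one vGPU each would be rejected, while a one-node request would pass.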

Screenshots

Greenshot 2024-04-29 16 22 10

Additional context

Found while testing https://github.com/rancher/dashboard/pull/10833. If you have enough vGPUs created with the same profile, then this will probably work. For instance, if you have four vGPUs with the 2Q profile and add them to a 3-node cluster, it will probably allocate fine. I'm going to check this a bit later when testing resources free up.
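
To illustrate the four-device case, here is a rough sketch of giving each node its own distinct vGPU instead of repeating the same one. The `assignVGpus` function and the device names are hypothetical and only show the idea; how Rancher or Harvester actually assigns devices is not described in this issue.

```typescript
// Hypothetical helper: hands each node its own slice of the free device list,
// so no device name appears in more than one node's configuration.
function assignVGpus(
  nodeCount: number,
  vgpusPerNode: number,
  freeDevices: string[],
): string[][] {
  if (nodeCount * vgpusPerNode > freeDevices.length) {
    throw new Error("Not enough free vGPUs for the requested node count");
  }

  const perNode: string[][] = [];

  for (let i = 0; i < nodeCount; i++) {
    // Node i gets devices [i * vgpusPerNode, (i + 1) * vgpusPerNode).
    perNode.push(freeDevices.slice(i * vgpusPerNode, (i + 1) * vgpusPerNode));
  }

  return perNode;
}

// Four free devices with the same profile, three nodes, one vGPU each.
const assignment = assignVGpus(3, 1, ["vgpu-a", "vgpu-b", "vgpu-c", "vgpu-d"]);
```

Here each of the three nodes receives a different device and one device is left unused. With only one free device the function throws rather than reusing it.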

noahgildersleeve commented 3 months ago

Validated in Rancher v2.8-862f57beb6ff7caeab6b4e3c00c89912050cf317-head and Harvester v1.3.0.

Greenshot 2024-05-07 17 41 21 Greenshot 2024-05-07 17 41 11 Greenshot 2024-05-07 17 40 57 Greenshot 2024-05-07 17 40 19