What happened:
Kueue panics when a new workload is scheduled after the deletion of a node that was hosting another workload.
What you expected to happen:
No panic; admission of new workloads continues.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
The reason is here. When a node is deleted, we no longer initialize the domainID for it, but we still try to addUsage for that domain from the workloads in the cache. I believe the simplest fix is to skip adding usage for such domains. This prevents the panic and lets new workloads schedule on the remaining nodes.
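
To illustrate the proposed fix, here is a minimal, self-contained sketch of the "skip unknown domains" guard. It is not the actual Kueue code: the `snapshot`, `leafDomain`, and `domainID` types, the integer capacity, and the boolean return value are all hypothetical stand-ins chosen for illustration. It assumes the panic comes from dereferencing an entry that is missing for the deleted node's domain.

```go
package main

import "fmt"

// domainID, leafDomain and snapshot are hypothetical stand-ins,
// not the real Kueue types.
type domainID string

type leafDomain struct {
	freeCapacity int64
}

type snapshot struct {
	// leaves only contains domains for nodes that still exist.
	leaves map[domainID]*leafDomain
}

// addUsage subtracts usage from the free capacity of the given domain.
// If the domain was never initialized (for example, its node was deleted),
// the usage is skipped instead of dereferencing a nil entry.
// It reports whether the usage was applied.
func (s *snapshot) addUsage(id domainID, usage int64) bool {
	leaf, ok := s.leaves[id]
	if !ok || leaf == nil {
		return false // unknown domain: skip rather than panic
	}
	leaf.freeCapacity -= usage
	return true
}

func main() {
	s := &snapshot{leaves: map[domainID]*leafDomain{
		"node-a": {freeCapacity: 8},
	}}

	fmt.Println(s.addUsage("node-a", 2))       // true
	fmt.Println(s.addUsage("node-deleted", 2)) // false, no panic
	fmt.Println(s.leaves["node-a"].freeCapacity) // 6
}
```

With a guard like this, usage recorded by a cached workload for a deleted node is ignored, while usage on existing nodes is still accounted for.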