vincent-du2020 closed this issue 2 years ago
Hello 👋 Looks like there was no activity on this issue for the last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this issue or push a commit. Thanks! 🤗 If there is no activity for another 60 days, this issue will be closed (we can always reopen an issue if we need!).
Closing for now as there was no activity for the last 60 days after this was marked as stale; let us know if you need this to be reopened! 🤗
What happened: When we run an MPI Job like this, notice that the Launcher and Worker have different "Resources/Requests":
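The original manifest did not survive here; a minimal sketch of the kind of MPIJob spec meant, assuming the Kubeflow mpi-operator API (the image names, resource name prefix, and quantities are hypothetical placeholders, not the reporter's actual values):

```yaml
apiVersion: kubeflow.org/v1
kind: MPIJob
metadata:
  name: demo-mpijob
spec:
  mpiReplicaSpecs:
    Launcher:
      replicas: 1
      template:
        spec:
          schedulerName: volcano
          containers:
          - name: launcher
            image: mpi-demo:latest   # placeholder image
            resources:
              requests:
                cpu: "1"
                memory: 1Gi
    Worker:
      replicas: 2
      template:
        spec:
          schedulerName: volcano
          containers:
          - name: worker
            image: mpi-demo:latest   # placeholder image
            resources:
              requests:
                cpu: "4"
                memory: 8Gi
                example.com/My-Resource: "1"   # only present on Worker-tainted nodes
```

With a spec shaped like this, only the Worker requests "My-Resource", so the Launcher should be schedulable on nodes without it.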
The Launcher Pod is always stuck in the "Pending" state due to a resource-fit failure, since "My-Resource" only exists on the nodes tainted for Worker Pods.
These are the logs from the 'volcano-scheduler' Pod:
It seems the log statement at "proportion.go:299" does the calculation like this: (resource-request-from-worker * 1000) + resource-request-from-launcher, combining both into a single request.
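A minimal Go sketch of the kind of aggregation that would produce this symptom. This is an illustration of the suspected behavior, not the actual `proportion.go` code; the resource names and quantities are hypothetical:

```go
package main

import "fmt"

// Resource holds scalar resource quantities keyed by resource name.
type Resource map[string]int64

// combineRequests scales each task's request by 1000 (milli-units)
// and merges them into a single per-job request -- the suspected
// behavior behind the proportion.go:299 log line.
func combineRequests(tasks []Resource) Resource {
	combined := Resource{}
	for _, task := range tasks {
		for name, v := range task {
			// value * 1000, matching the scaled values seen in the log
			combined[name] += v * 1000
		}
	}
	return combined
}

func main() {
	// Hypothetical requests; the real resource name was redacted
	// to "My-Resource" by the reporter.
	launcher := Resource{"cpu": 1}
	worker := Resource{"cpu": 4, "My-Resource": 1}

	combined := combineRequests([]Resource{launcher, worker})
	// The merged request now carries the Worker's My-Resource,
	// so a Launcher scheduled against it can no longer fit on
	// Launcher-only nodes.
	fmt.Println(combined)
}
```

If requests are merged per-job like this instead of being kept per-task, the Launcher Pod inherits the Worker's "My-Resource" request, which matches the Pending behavior described above.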
What you expected to happen: The Launcher and Worker Pods should have separate resource requests. We have another cluster running an older version of Volcano (v1.1.2), and this issue could not be reproduced there.
How to reproduce it (as minimally and precisely as possible): Set different resource requests in the Launcher and Worker specs; the resource request from the Worker gets applied to the Launcher Pod as well.
Anything else we need to know?: The actual hardware resource name has been replaced with "My-Resource" here. When we removed the "My-Resource" request from the Worker, both the Launcher and the Worker could be spawned on the node dedicated to the Launcher, and since that node does not have "My-Resource", the actual jobs on the Worker Pods errored out.
Environment:
- Kubernetes version (`kubectl version`): 1.22.5
- Kernel version (`uname -a`): 5.4.0-100-generic #113-Ubuntu SMP