Closed nicklhy closed 3 years ago
+1
Well, perhaps it seems reasonable the way you think because you were looking at it from the view of a human being, not a scheduler. When solving the problem, you have a confirmed input job set (job X and job Y), so you can try to find the best solution to match them to nodes. But a scheduler like Volcano deals with all jobs in turn: it can only find the best match and allocation for the current job according to the current condition of the cluster, regardless of jobs to be scheduled in the future, just like a workflow. It cannot predict what kinds of jobs will come in the future.
@Thor-wl Thanks for your reply. But I guess you misunderstood the problem here. I am not complaining about Volcano failing to schedule two different jobs. Instead, I am complaining about the scheduling policy for a single job with multiple pods, which is exactly the core problem (also known as gang scheduling of a PodGroup?) that Volcano aims to solve.
I think it is better to schedule a job which includes multiple pods globally, rather than using a one-by-one greedy method.
PS. I used the term "task" because Volcano uses this word in the CRD: https://github.com/volcano-sh/volcano/blob/da8df888c412814d3574c501bc7c0b31a828b68e/pkg/apis/batch/v1alpha1/job.go#L64
Yes, I understand the reasoning behind the one-by-one design, but there is a difference between a job and jobs. The scheduler should not predict the next job, so it works fine in the case of a job that includes just 1 task. But if a batch job includes 2 or more tasks, it becomes a random box where we don't know what will happen. I think Volcano can use some algorithm like DP or recursion to improve the performance of that case.
Oh, yes, I'm sorry for misunderstanding the question. It is indeed an optimization point when scheduling a job with multiple pods. Do you have any good ideas or algorithms for that?
I think Volcano can use some algorithm like DP or recursion to improve the performance of that case
That's interesting! Can you submit a design doc for that?
@Thor-wl
I think DP is a suitable algorithm. This is a standard 0-1 knapsack problem.
Sure, I need some time to prepare that.
Great! I'm looking forward to that.
That's an NP-hard problem for multi-dimensional resources; anyway, it would be great to have some algorithm to improve it.
That's an NP-hard problem for multi-dimensional resources;
Yes, you are correct, it's an NP-hard problem, more complicated than the multiple knapsack problem. Over the past few days I tried to solve this problem with a DP algorithm, but failed.
Volcano has some logic such as nodeorder and priority. I think if we want to solve this problem in polynomial time, we can't enumerate every possibility.
So we can redefine the problem: use the current policy as step 1. When a batch job fails to schedule, we can apply a greedy policy (use a greedy algorithm to find a feasible solution, though not necessarily an optimal one) as step 2. That means scheduling fails only if both step 1 and step 2 fail. And we can implement this greedy algorithm as plugin logic. @k82cn, @Thor-wl, do you think my proposal is feasible?
that's ok to me :)
OK, I'll write code following the policy above.
Would you help to draft a design doc for that? That'll help on the discussion :)
Ok, No problem.
Any progress on this feature and design? @ningfdx
Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).
Closing for now as there was no activity for last 60 days after marked as stale, let us know if you need this to be reopened! 🤗
I also need that QAQ
Is this a BUG REPORT or FEATURE REQUEST?: /kind bug
What happened: There are two nodes in my cluster, A and B. Their current idle resources are 53 CPU + 476 GB memory and 31 CPU + 96 GB memory respectively. Now I want to submit a distributed job with two tasks: X needs 30 CPU + 1 GB memory, Y needs 50 CPU + 1 GB memory. Theoretically, we can schedule task X on node B and task Y on node A. However, if I submit this job repeatedly (without any config change), sometimes it succeeds and sometimes it fails.
The log of volcano-scheduler shows that it first tried to bind task X to node A, after which there was no room for task Y. It didn't consider the other choice, binding task X to node B.
What you expected to happen:
This job should always be scheduled successfully.
How to reproduce it (as minimally and precisely as possible):
Try to submit a task as above for multiple times.
Anything else we need to know?:
I looked into volcano's source code about the job scheduling policy here:
https://github.com/volcano-sh/volcano/blob/da8df888c412814d3574c501bc7c0b31a828b68e/pkg/scheduler/actions/allocate/allocate.go#L198-L203
It seems Volcano tries to schedule each task one by one, like a stack. I guess this is the reason for the problem.
Environment:
Kubernetes version (use kubectl version): 1.15.5