volcano-sh / volcano

A Cloud Native Batch System (Project under CNCF)
https://volcano.sh
Apache License 2.0
4.1k stars, 949 forks

fix: When queue resources are insufficient or about to be insufficient, instances cannot be generated. #3198

Closed LY-today closed 2 months ago

LY-today commented 10 months ago

What happened: When a queue's resources are insufficient, or about to become insufficient, instances cannot be generated.

What you expected to happen: Instances should still be generated when a queue's resources are insufficient or about to become insufficient.

How to reproduce it (as minimally and precisely as possible): The problem reproduces reliably whenever the amount of a resource already allocated in the queue, plus the amount requested by the new task, exceeds the upper limit configured for that resource in the queue. In that case the instance is not created.
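The reproduction condition above can be sketched as a simple comparison. This is only an illustration of the arithmetic described in the report, not Volcano's actual code: the `Resources` type, the `exceedsCapability` function, and the sample numbers are all hypothetical.

```go
package main

import "fmt"

// Resources is a hypothetical, simplified resource vector.
// It is NOT Volcano's internal resource type.
type Resources struct {
	MilliCPU int64
	MemoryMB int64
}

// exceedsCapability reports whether the resources already allocated in a
// queue plus a new request would exceed the queue's configured upper limit
// in at least one dimension.
func exceedsCapability(allocated, request, capability Resources) bool {
	return allocated.MilliCPU+request.MilliCPU > capability.MilliCPU ||
		allocated.MemoryMB+request.MemoryMB > capability.MemoryMB
}

func main() {
	// Assumed example values: the queue allows 4 CPUs, 3 are in use,
	// and the new task asks for 2 more.
	capability := Resources{MilliCPU: 4000, MemoryMB: 8192}
	allocated := Resources{MilliCPU: 3000, MemoryMB: 4096}
	request := Resources{MilliCPU: 2000, MemoryMB: 1024}

	fmt.Println(exceedsCapability(allocated, request, capability)) // prints true
}
```

With these assumed numbers the CPU sum (5000m) is above the 4000m limit, which is the situation in which the report says no instance is created.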

Anything else we need to know?:

Environment:

Monokaix commented 10 months ago

Hi, what does "instance" mean here? A pod or something else?

LY-today commented 10 months ago

> Hi, what does "instance" mean here? A pod or something else?

Hi, I mean a pod.

Monokaix commented 10 months ago

> > Hi, what does "instance" mean here? A pod or something else?
>
> Hi, I mean a pod.

If the queue does not have sufficient resources, a new task will not be scheduled, which I think is normal :)

LY-today commented 10 months ago

> > > Hi, what does "instance" mean here? A pod or something else?
> >
> > Hi, I mean a pod.
>
> If the queue does not have sufficient resources, a new task will not be scheduled, which I think is normal :)

Maybe my description was wrong. The problem is not that the pod cannot be scheduled and stays Pending, but that no instance is created at all. That is what I think is unreasonable.

LY-today commented 10 months ago

@Monokaix The core of the problem is not that instances cannot be scheduled and stay Pending when resources are scarce, but that no instances are created at all.

william-wang commented 10 months ago

@LY-today Did you configure the enqueue action in the scheduler ConfigMap and enable the delay pod creation feature? Please share your scheduler ConfigMap if possible. Here is an introduction to the delay pod creation feature: https://github.com/volcano-sh/volcano/blob/master/docs/design/delay-pod-creation.md

LY-today commented 10 months ago

> @LY-today Did you configure the enqueue action in the scheduler ConfigMap and enable the delay pod creation feature? Please share your scheduler ConfigMap if possible. Here is an introduction to the delay pod creation feature: https://github.com/volcano-sh/volcano/blob/master/docs/design/delay-pod-creation.md

Thanks for your feedback. I tested it and found the solution.

LY-today commented 10 months ago

@william-wang In this scenario, the problem goes away if I simply disable the enqueue action. Could disabling it introduce other problems? The extra apiserver pressure and slower scheduling are acceptable. Are there any other effects?
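As a rough illustration of the change being discussed, the sketch below compares two values for the comma-separated `actions` line of the scheduler configuration, one with `enqueue` and one without. The two action lists and the `hasAction` helper are assumptions made for this example and are not taken from the reporter's cluster or from Volcano's code.

```go
package main

import (
	"fmt"
	"strings"
)

// Two illustrative values for the `actions` line of the scheduler
// configuration. Both lists are assumed examples.
const (
	withEnqueue    = "enqueue, allocate, backfill"
	withoutEnqueue = "allocate, backfill"
)

// hasAction is a hypothetical helper that reports whether name appears
// in a comma-separated list of scheduler actions.
func hasAction(actions, name string) bool {
	for _, a := range strings.Split(actions, ",") {
		if strings.TrimSpace(a) == name {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasAction(withEnqueue, "enqueue"))    // prints true
	fmt.Println(hasAction(withoutEnqueue, "enqueue")) // prints false
}
```

Disabling the action, as described in this comment, corresponds to moving from the first list to the second.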

lowang-bh commented 10 months ago

> The extra apiserver pressure and slower scheduling are acceptable. Are there any other effects?

Before release-1.6, if there was no enqueue action, the podgroup would not be enqueued and the job would not be scheduled.

After that version, scheduling without the enqueue action is supported. FYI: 91981bf48

william-wang commented 10 months ago

> @william-wang In this scenario, the problem goes away if I simply disable the enqueue action. Could disabling it introduce other problems? The extra apiserver pressure and slower scheduling are acceptable. Are there any other effects?

@LY-today There are no other effects of running without enqueue.

LY-today commented 10 months ago

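> > The extra apiserver pressure and slower scheduling are acceptable. Are there any other effects?
>
> Before release-1.6, if there was no enqueue action, the podgroup would not be enqueued and the job would not be scheduled.
>
> After that version, scheduling without the enqueue action is supported. FYI: 91981bf48

Thank you for your feedback.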

LY-today commented 10 months ago

> > @william-wang In this scenario, the problem goes away if I simply disable the enqueue action. Could disabling it introduce other problems? The extra apiserver pressure and slower scheduling are acceptable. Are there any other effects?
>
> @LY-today There are no other effects of running without enqueue.

Thank you for your feedback.

Monokaix commented 2 months ago

/close

volcano-sh-bot commented 2 months ago

@Monokaix: Closing this issue.

In response to [this](https://github.com/volcano-sh/volcano/issues/3198#issuecomment-2232224303):

> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.