volcano-sh / volcano

A Cloud Native Batch System (Project under CNCF)
https://volcano.sh
Apache License 2.0
3.91k stars 901 forks

Delete or optimize namespace fairness #2762

Open wangyang0616 opened 1 year ago

wangyang0616 commented 1 year ago

What would you like to be added:

  1. If namespace fair scheduling is not used in real workloads, can this feature be removed?

    Impact of removal:

    • When a queue contains multiple namespaces (ns), submitting many vcjobs in one ns may leave the vcjobs of the other ns in that queue with fewer resources
    • When a queue contains a single ns, resources are scheduled according to the queue's fairness strategy, so existing behavior is unaffected

    Advantages:

    • Removing one layer of sorting logic before scheduling can improve scheduling efficiency
    • It resolves the conflict between namespace fairness and vcjob priority scheduling
  2. If namespace fairness is widely used in real workloads and cannot be removed, is there a more suitable implementation that preserves ns fairness while resolving its conflict with vcjob priority scheduling?
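To make the conflict between namespace fairness and vcjob priority concrete, here is a minimal sketch in Go. It is not Volcano's scheduler code: the `Job` type, the `nsUsage` map, and both ordering functions are hypothetical simplifications that only assume the namespace comparison runs before the priority comparison.

```go
package main

import (
	"fmt"
	"sort"
)

// Job is a hypothetical, simplified stand-in for a vcjob.
type Job struct {
	Name      string
	Namespace string
	Priority  int // higher value means more important
}

// orderWithNamespaceFairness sorts by namespace usage first (least-used
// namespace first) and compares priority only when usage is equal.
// nsUsage is an assumed per-namespace share of already allocated resources.
func orderWithNamespaceFairness(jobs []Job, nsUsage map[string]float64) []Job {
	out := append([]Job(nil), jobs...) // copy so the input is not mutated
	sort.SliceStable(out, func(i, j int) bool {
		ui, uj := nsUsage[out[i].Namespace], nsUsage[out[j].Namespace]
		if ui != uj {
			return ui < uj
		}
		return out[i].Priority > out[j].Priority
	})
	return out
}

// orderByPriority ignores namespaces and sorts by priority only.
func orderByPriority(jobs []Job) []Job {
	out := append([]Job(nil), jobs...)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Priority > out[j].Priority
	})
	return out
}

func main() {
	jobs := []Job{
		{Name: "urgent", Namespace: "ns-a", Priority: 100},
		{Name: "routine", Namespace: "ns-b", Priority: 1},
	}
	// ns-a already holds most of the queue's resources.
	usage := map[string]float64{"ns-a": 0.8, "ns-b": 0.1}

	fmt.Println(orderWithNamespaceFairness(jobs, usage)[0].Name) // routine
	fmt.Println(orderByPriority(jobs)[0].Name)                   // urgent
}
```

With the namespace layer in place, the low-priority job in the lightly used ns is ordered ahead of the high-priority job in the heavily used ns. Without it, priority alone decides.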

Why is this needed:

Volcano adds the concept of a queue at the resource management level and manages all vcjobs through queues. Queues and namespaces have a many-to-many relationship: one queue can hold workloads submitted from multiple ns, and workloads in one ns can be submitted to multiple queues.

Volcano currently supports the following three types of fair scheduling:

  1. Fairness between jobs prevents other vcjobs from starving because one vcjob has submitted too many tasks
  2. Fairness between namespaces prevents the vcjobs of other ns from starving because too many vcjobs were submitted in one ns
  3. Fairness between queues (DRF) calculates each queue's resource usage ratio based on queue weight, capacity, etc., and orders the queues so that tasks in different queues share cluster resources fairly.
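As a rough illustration of the queue-level ordering in item 3, the sketch below computes a DRF-style dominant share per queue, divides it by the queue weight, and orders queues by the result. The `Queue` and `Resources` types, the function names, and the exact formula are assumptions for illustration only. Volcano's real plugins also take capacity and other inputs into account.

```go
package main

import (
	"fmt"
	"sort"
)

// Resources maps a resource name (e.g. "cpu", "memory") to an amount.
type Resources map[string]float64

// Queue is a hypothetical, simplified view of a scheduling queue.
type Queue struct {
	Name      string
	Weight    float64
	Allocated Resources
}

// dominantShare returns the largest allocated/total ratio across resources.
func dominantShare(allocated, total Resources) float64 {
	share := 0.0
	for name, capacity := range total {
		if capacity <= 0 {
			continue // skip resources the cluster does not provide
		}
		if s := allocated[name] / capacity; s > share {
			share = s
		}
	}
	return share
}

// weightedShare scales the dominant share by the queue weight, so a queue
// with a higher weight may hold more before it is considered "ahead".
func weightedShare(q Queue, total Resources) float64 {
	w := q.Weight
	if w <= 0 {
		w = 1 // assumption: treat a missing weight as 1
	}
	return dominantShare(q.Allocated, total) / w
}

// orderQueues returns queues sorted so the most under-served comes first.
func orderQueues(queues []Queue, total Resources) []Queue {
	out := append([]Queue(nil), queues...)
	sort.SliceStable(out, func(i, j int) bool {
		return weightedShare(out[i], total) < weightedShare(out[j], total)
	})
	return out
}

func main() {
	total := Resources{"cpu": 100, "memory": 400}
	queues := []Queue{
		{Name: "queue-a", Weight: 1, Allocated: Resources{"cpu": 20, "memory": 40}},
		{Name: "queue-b", Weight: 2, Allocated: Resources{"cpu": 10, "memory": 120}},
	}
	for _, q := range orderQueues(queues, total) {
		fmt.Println(q.Name)
	}
}
```

Here queue-a has a dominant share of 0.2 (cpu) and queue-b has 0.3 (memory). After dividing by the weights 1 and 2, queue-b is ordered first despite its larger raw share.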

Types 1 and 3 are dominant in practice and are commonly used in real workloads. Is type 2 used in real business scenarios?

In addition, the implementation of type 2 introduces some unnecessary sorting and functional problems, for example #2747.

stale[bot] commented 11 months ago

Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).