koordinator-sh / koordinator

A QoS-based scheduling system brings optimal layout and status to workloads such as microservices, web services, big data jobs, AI jobs, etc.
https://koordinator.sh
Apache License 2.0

[BUG] The parent queue exceeds its maximum #2117

Open hiwangzhihui opened 2 months ago

hiwangzhihui commented 2 months ago

What happened:

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

  1. Create the namespaces:

         kubectl create ns namespace1
         kubectl create ns namespace2

  2. Create the queues:

         apiVersion: scheduling.sigs.k8s.io/v1alpha1
         kind: ElasticQuota
         metadata:
           name: root
           labels:
             quota.scheduling.koordinator.sh/is-parent: "true"
             quota.scheduling.koordinator.sh/allow-lent-resource: "false"
         spec:
           max:
             cpu: 2
             memory: 2Gi
           min:
             cpu: 2
             memory: 2Gi
         ---
         apiVersion: scheduling.sigs.k8s.io/v1alpha1
         kind: ElasticQuota
         metadata:
           name: a
           namespace: namespace1
           labels:
             quota.scheduling.koordinator.sh/parent: "root"
             quota.scheduling.koordinator.sh/is-parent: "false"
             quota.scheduling.koordinator.sh/allow-lent-resource: "true"
           annotations:
             quota.scheduling.koordinator.sh/shared-weight: '{"cpu":"1","memory":"1Gi"}'
         spec:
           max:
             cpu: 2
             memory: 2Gi
           min:
             cpu: 1
             memory: 1Gi
         ---
         apiVersion: scheduling.sigs.k8s.io/v1alpha1
         kind: ElasticQuota
         metadata:
           name: b
           namespace: namespace2
           labels:
             quota.scheduling.koordinator.sh/parent: "root"
             quota.scheduling.koordinator.sh/is-parent: "false"
             quota.scheduling.koordinator.sh/allow-lent-resource: "true"
           annotations:
             quota.scheduling.koordinator.sh/shared-weight: '{"cpu":"1","memory":"1Gi"}'
         spec:
           max:
             cpu: 2
             memory: 2Gi
           min:
             cpu: 1
             memory: 1Gi

  3. Submit two pods to queue "a":

         apiVersion: v1
         kind: Pod
         metadata:
           name: pod-a-1
           namespace: namespace1
           labels:
             quota.scheduling.koordinator.sh/name: "a"
             koordinator.sh/qosClass: BE
         spec:
           schedulerName: koord-scheduler
           priorityClassName: koord-batch
           containers: # container list is missing from the report
         ---
         apiVersion: v1
         kind: Pod
         metadata:
           name: pod-a-2
           namespace: namespace1
           labels:
             quota.scheduling.koordinator.sh/name: "a"
             koordinator.sh/qosClass: BE
         spec:
           schedulerName: koord-scheduler
           priorityClassName: koord-batch
           containers: # container list is missing from the report

  4. Submit two pods to queue "b":

         apiVersion: v1
         kind: Pod
         metadata:
           name: pod-b-1
           namespace: namespace2
           labels:
             quota.scheduling.koordinator.sh/name: "b"
             koordinator.sh/qosClass: LS
         spec:
           priorityClassName: koord-prod
           schedulerName: koord-scheduler
           containers:
           - command:
             - sleep
             - 365d
             image: nginx
             imagePullPolicy: IfNotPresent
             name: curlimage
             resources:
               limits:
                 cpu: 1
                 memory: 1Gi
               requests:
                 cpu: 1
                 memory: 1Gi
             terminationMessagePath: /dev/termination-log
             terminationMessagePolicy: File
           restartPolicy: Always
         ---
         apiVersion: v1
         kind: Pod
         metadata:
           name: pod-b-2
           namespace: namespace2
           labels:
             quota.scheduling.koordinator.sh/name: "b"
             koordinator.sh/qosClass: LS
         spec:
           priorityClassName: koord-prod
           schedulerName: koord-scheduler
           containers: # container list is missing from the report

Anything else we need to know?:

Environment:

saintube commented 1 month ago

@hiwangzhihui To recursively check the parent tree, please set `enableCheckParentQuota` to `true` in the `pluginArgs`.
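For anyone landing here later, a minimal sketch of where that setting might go, assuming the scheduler profile is named `koord-scheduler` (as in the pods above) and the plugin is registered as `ElasticQuota`. The `apiVersion` values are an assumption and depend on your Koordinator and Kubernetes versions, so compare against the scheduler configuration your cluster already uses instead of copying this verbatim:

```yaml
# Sketch only: merge into your existing koord-scheduler configuration.
# apiVersion values are assumptions; match them to your installed version.
apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: koord-scheduler
    pluginConfig:
      - name: ElasticQuota
        args:
          apiVersion: kubescheduler.config.k8s.io/v1beta2
          kind: ElasticQuotaArgs
          # Check quota usage against every ancestor quota,
          # not only the pod's own quota group.
          enableCheckParentQuota: true
```

The scheduler most likely needs a restart to pick up the changed configuration. With the check enabled, the expectation is that pods in "a" and "b" are also admitted against the `max` of `root`.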