volcano-sh / volcano

A Cloud Native Batch System (Project under CNCF)
https://volcano.sh
Apache License 2.0

Allocated status of queue is wrong, when deploy multi volcano scheduler with StatefulSet #3137

Open bfcx opened 1 year ago

bfcx commented 1 year ago

What happened: We deployed 3 Volcano schedulers with a StatefulSet, following this document: https://github.com/volcano-sh/volcano/blob/master/docs/design/deploy-multi-volcano-schedulers-without-using-selector.md

kubectl get pod -n volcano-system|grep volcano-scheduler
volcano-scheduler-0                    1/1     Running     0          20h
volcano-scheduler-1                    1/1     Running     0          20h
volcano-scheduler-2                    1/1     Running     0          20h

We submitted 5 jobs to the same queue. Each job has 2 tasks, and each task requests 1 CPU, so the allocated CPU of the queue should be 10. However, the queue status shows a different value.
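The expected figure follows directly from the numbers in the report. As a minimal sketch of that arithmetic (the variable names are illustrative, not taken from Volcano):

```python
# Expected queue allocation for the scenario in this report.
jobs = 5             # jobs submitted to the same queue
tasks_per_job = 2    # tasks in each job
cpu_per_task = 1     # CPUs requested by each task

expected_allocated_cpu = jobs * tasks_per_job * cpu_per_task
print(expected_allocated_cpu)  # prints 10
```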

kubectl get vj -n ns-root|grep Running
job-169536577254725355276-root   Running     2              2          14m
job-169536577471915548993-root   Running     2              2          14m
job-169536577618095753595-root   Running     2              2          14m
job-169536577780397649277-root   Running     2              2          14m
job-169536577946553449829-root   Running     2              2          14m

kubectl get queue pdcpu -oyaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"scheduling.volcano.sh/v1beta1","kind":"Queue","metadata":{"annotations":{},"name":"pdcpu"},"spec":{"reclaimable":false,"weight":1}}
  creationTimestamp: "2023-09-21T09:18:35Z"
  generation: 3
  name: pdcpu
  resourceVersion: "266775645"
  uid: 5f9a7e8c-91dc-4481-b8ad-927b11806d6c
spec:
  reclaimable: false
  weight: 1
status:
  allocated:
    cpu: "4"
    memory: 4Gi
    nvidia.com/gpu: "0"
  reservation: {}
  running: 5
  state: Open

What you expected to happen: We queried the queue status several times with this command:

kubectl get queue pdcpu -oyaml

The expected result:

Every execution should report allocated: cpu: "10"

The actual result is:

1st execution: allocated: cpu: "2"
2nd execution: allocated: cpu: "4"
3rd execution: allocated: cpu: "4"

However, the running count is correct on every execution: running: 5
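The three readings (2, 4, 4) happen to add up to the expected 10. One possible reading, which is an assumption and is not confirmed anywhere in this thread, is that each scheduler replica only accounts for the jobs it handles and overwrites the queue status with its own partial sum. The sketch below illustrates that hypothesis; the split of jobs across replicas is hypothetical.

```python
# ASSUMPTION (not confirmed in this issue): each scheduler replica counts only
# the jobs it handles and overwrites the queue's allocated CPU with that sum.
CPU_PER_JOB = 2  # 2 tasks x 1 CPU each, as described in the report

# Hypothetical split of the 5 running jobs across the 3 replicas.
jobs_per_scheduler = {
    "volcano-scheduler-0": 1,
    "volcano-scheduler-1": 2,
    "volcano-scheduler-2": 2,
}

# Value each replica would write if it only counted its own jobs.
partial_writes = {
    name: count * CPU_PER_JOB for name, count in jobs_per_scheduler.items()
}

# A reader sees whichever replica wrote last, never the full total.
print(partial_writes)
print(sum(partial_writes.values()))
```

Under this assumption a reader would see 2 or 4 depending on which replica wrote last, while the partial values together still account for all 10 CPUs.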

How to reproduce it (as minimally and precisely as possible):

1. Set up the schedulers with the following config:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: volcano-scheduler
  namespace: volcano-system
  labels:
    app: volcano-scheduler
spec:
  replicas: 3
  selector:
    matchLabels:
      app: volcano-scheduler
  serviceName: "volcano-scheduler"
  template:
    metadata:
      labels:
        app: volcano-scheduler
    spec:
      serviceAccount: volcano-scheduler
      containers:
      - name: volcano-scheduler
        args:
        - --logtostderr
        - --scheduler-conf=/volcano.scheduler/volcano-scheduler.conf
        - --enable-healthz=true
        - --enable-metrics=true
        - --leader-elect=false
        - -v=3
        - 2>&1
        image: volcanosh/vc-scheduler:v1.8.0
        imagePullPolicy: IfNotPresent
        env:
        - name: MULTI_SCHEDULER_ENABLE
          value: "true"
        - name: SCHEDULER_NUM
          value: "3"
        - name: SCHEDULER_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        volumeMounts:
        - name: scheduler-config
          mountPath: /volcano.scheduler
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      serviceAccount: volcano-scheduler
      volumes:
      - name: scheduler-config
        configMap:
          name: volcano-scheduler-configmap
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      tolerations:
      - effect: NoSchedule
        key: node.kubernetes.io/unschedulable
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
---
apiVersion: v1
kind: Service
metadata:
  name: volcano-scheduler
  labels:
    app: volcano-scheduler
spec:
  ports:
  - port: 80
    name: volcano-scheduler
  clusterIP: None
  selector:
    app: volcano-scheduler

2. Set up a queue with the following config:

apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"scheduling.volcano.sh/v1beta1","kind":"Queue","metadata":{"annotations":{},"name":"pdcpu"},"spec":{"reclaimable":false,"weight":1}}
  creationTimestamp: "2023-09-21T09:18:35Z"
  generation: 3
  name: pdcpu
  resourceVersion: "266780079"
  uid: 5f9a7e8c-91dc-4481-b8ad-927b11806d6c
spec:
  reclaimable: false
  weight: 1

3. Submit several jobs that request the same amount of CPU to the same queue.
4. Get the queue status with kubectl get queue.
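The last step can be scripted so that the fluctuating values are easy to capture. This is a minimal sketch, assuming kubectl is on the PATH and pointed at the affected cluster; the helper names read_allocated_cpu and sample_allocated_cpu are hypothetical, and the run parameter exists only so the command runner can be swapped out.

```python
import subprocess
import time


def read_allocated_cpu(queue, run=subprocess.check_output):
    """Return status.allocated.cpu of a Volcano queue, as reported by kubectl."""
    cmd = ["kubectl", "get", "queue", queue,
           "-o", "jsonpath={.status.allocated.cpu}"]
    return run(cmd).decode().strip()


def sample_allocated_cpu(queue, attempts=3, delay_s=1.0,
                         run=subprocess.check_output):
    """Read the allocated CPU several times, pausing between reads."""
    samples = []
    for i in range(attempts):
        samples.append(read_allocated_cpu(queue, run))
        if i < attempts - 1:
            time.sleep(delay_s)
    return samples


# Example (requires cluster access):
#   samples = sample_allocated_cpu("pdcpu")
#   if len(set(samples)) > 1:
#       print("inconsistent allocated CPU:", samples)
```

If the samples differ between reads while the set of running jobs is unchanged, the behaviour described in this issue has been reproduced.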

Anything else we need to know?:

Environment:

Monokaix commented 5 months ago

The doc you mentioned currently has some problems. If you want to use multiple schedulers, see https://github.com/volcano-sh/volcano/blob/master/docs/design/multi-volcano-schedulers.md :)