Open bfcx opened 1 year ago
What happened: We deployed three Volcano schedulers with a StatefulSet, following this document: https://github.com/volcano-sh/volcano/blob/master/docs/design/deploy-multi-volcano-schedulers-without-using-selector.md
kubectl get pod -n volcano-system|grep volcano-scheduler
volcano-scheduler-0   1/1   Running   0   20h
volcano-scheduler-1   1/1   Running   0   20h
volcano-scheduler-2   1/1   Running   0   20h
We submitted 5 jobs to the same queue. Each job has 2 tasks and each task requests 1 CPU, so the allocated CPU of the queue should be 10. However, the queue status reports a different value.
kubectl get vj -n ns-root|grep Running
job-169536577254725355276-root   Running   2   2   14m
job-169536577471915548993-root   Running   2   2   14m
job-169536577618095753595-root   Running   2   2   14m
job-169536577780397649277-root   Running   2   2   14m
job-169536577946553449829-root   Running   2   2   14m
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"scheduling.volcano.sh/v1beta1","kind":"Queue","metadata":{"annotations":{},"name":"pdcpu"},"spec":{"reclaimable":false,"weight":1}}
  creationTimestamp: "2023-09-21T09:18:35Z"
  generation: 3
  name: pdcpu
  resourceVersion: "266775645"
  uid: 5f9a7e8c-91dc-4481-b8ad-927b11806d6c
spec:
  reclaimable: false
  weight: 1
status:
  allocated:
    cpu: "4"
    memory: 4Gi
    nvidia.com/gpu: "0"
  reservation: {}
  running: 5
  state: Open
What you expected to happen: Query the queue status several times with this command:
kubectl get queue pdcpu -oyaml
The expected result:
Every execution should output:
allocated:
  cpu: "10"
The actual result is:
Output of the 1st execution:
allocated:
  cpu: "2"
Output of the 2nd execution:
allocated:
  cpu: "4"
Output of the 3rd execution:
allocated:
  cpu: "4"
However, the running count is correct on every execution:
running: 5
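The arithmetic behind the expected value can be checked with a short sketch. The helper names `parse_cpu` and `expected_queue_cpu` are hypothetical and only illustrate the calculation; the numbers are the ones reported above.

```python
from fractions import Fraction


def parse_cpu(quantity: str) -> Fraction:
    """Convert a Kubernetes CPU quantity such as "4" or "500m" to cores."""
    if quantity.endswith("m"):
        return Fraction(int(quantity[:-1]), 1000)
    return Fraction(quantity)


def expected_queue_cpu(jobs: int, tasks_per_job: int, cpu_per_task: str) -> Fraction:
    """Total CPU the queue should report when every task is running."""
    return jobs * tasks_per_job * parse_cpu(cpu_per_task)


# 5 jobs, 2 tasks each, 1 CPU per task, as described in this report.
expected = expected_queue_cpu(jobs=5, tasks_per_job=2, cpu_per_task="1")

# allocated.cpu values seen in the three executions of kubectl get queue.
observed = [parse_cpu(value) for value in ("2", "4", "4")]

print("expected:", expected)
for index, value in enumerate(observed, start=1):
    print(f"execution {index}: allocated={value}, missing={expected - value}")
print("sum of observed:", sum(observed))
```

The three observed values happen to add up to the expected total. That would fit a situation where each scheduler replica writes only the allocation of the jobs it handles, but this is an unverified guess and not something confirmed by the logs.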
How to reproduce it (as minimally and precisely as possible):
1. Set up the schedulers with this config:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: volcano-scheduler
  namespace: volcano-system
  labels:
    app: volcano-scheduler
spec:
  replicas: 3
  selector:
    matchLabels:
      app: volcano-scheduler
  serviceName: "volcano-scheduler"
  template:
    metadata:
      labels:
        app: volcano-scheduler
    spec:
      serviceAccount: volcano-scheduler
      containers:
        - name: volcano-scheduler
          args:
            - --logtostderr
            - --scheduler-conf=/volcano.scheduler/volcano-scheduler.conf
            - --enable-healthz=true
            - --enable-metrics=true
            - --leader-elect=false
            - -v=3
            - 2>&1
          image: volcanosh/vc-scheduler:v1.8.0
          imagePullPolicy: IfNotPresent
          env:
            - name: MULTI_SCHEDULER_ENABLE
              value: "true"
            - name: SCHEDULER_NUM
              value: "3"
            - name: SCHEDULER_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          volumeMounts:
            - name: scheduler-config
              mountPath: /volcano.scheduler
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      serviceAccount: volcano-scheduler
      volumes:
        - name: scheduler-config
          configMap:
            name: volcano-scheduler-configmap
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      tolerations:
        - effect: NoSchedule
          key: node.kubernetes.io/unschedulable
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
        - effect: NoSchedule
          key: node-role.kubernetes.io/control-plane
---
apiVersion: v1
kind: Service
metadata:
  name: volcano-scheduler
  labels:
    app: volcano-scheduler
spec:
  ports:
    - port: 80
      name: volcano-scheduler
  clusterIP: None
  selector:
    app: volcano-scheduler
2. Set up a queue with this config:
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"scheduling.volcano.sh/v1beta1","kind":"Queue","metadata":{"annotations":{},"name":"pdcpu"},"spec":{"reclaimable":false,"weight":1}}
  creationTimestamp: "2023-09-21T09:18:35Z"
  generation: 3
  name: pdcpu
  resourceVersion: "266780079"
  uid: 5f9a7e8c-91dc-4481-b8ad-927b11806d6c
spec:
  reclaimable: false
  weight: 1
3. Submit several jobs that request the same amount of CPU to the same queue.
4. Get the queue status with kubectl get queue.
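The last step can be automated so that the fluctuation is easier to capture. The following is a minimal sketch, assuming `kubectl` is on the PATH and already points at the affected cluster. The function names (`get_queue_json`, `allocated_cpu`, `poll_allocated_cpu`) are hypothetical helpers, not part of Volcano; the only external command used is `kubectl get queue <name> -o json`.

```python
import json
import subprocess
import time
from typing import Callable, List


def get_queue_json(name: str) -> str:
    """Return the Queue object as JSON text by calling kubectl."""
    result = subprocess.run(
        ["kubectl", "get", "queue", name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def allocated_cpu(queue_json: str) -> str:
    """Extract status.allocated.cpu, or "0" when the field is absent."""
    status = json.loads(queue_json).get("status", {})
    return status.get("allocated", {}).get("cpu", "0")


def poll_allocated_cpu(
    name: str,
    attempts: int = 3,
    interval: float = 1.0,
    fetch: Callable[[str], str] = get_queue_json,
) -> List[str]:
    """Read the allocated CPU of a queue several times in a row."""
    samples = []
    for attempt in range(attempts):
        samples.append(allocated_cpu(fetch(name)))
        if attempt < attempts - 1:
            time.sleep(interval)  # wait between reads, not after the last one
    return samples


# Example against a live cluster (not executed here):
#   samples = poll_allocated_cpu("pdcpu", attempts=5)
#   print(samples, "consistent" if len(set(samples)) == 1 else "inconsistent")
```

The `fetch` parameter is injectable so that the parsing logic can be exercised without a cluster. With the setup in this report, a consistent result would be a list where every entry is "10".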
Anything else we need to know?:
Environment:
kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-04-14T08:49:13Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-04-14T08:43:11Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}
uname -a
Linux d2-hpc-master-01 4.18.16-1.el7.elrepo.x86_64 #1 SMP Sat Oct 20 12:52:50 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
The document you referenced currently has some problems. If you want to run multiple schedulers, see https://github.com/volcano-sh/volcano/blob/master/docs/design/multi-volcano-schedulers.md instead :)