ls-2018 opened this issue 7 months ago
I look forward to your reply, and I would be happy to work with you to solve this problem.
Can you run `kubectl describe pod pod-example1 -n default` and give me the message about why pod-example1 is unschedulable?
In your example, PodGroup A has scheduleTimeoutSeconds set to 10, so in theory PodGroup A will time out after 10 seconds. However, in our current implementation, the PodGroup timeout only means the maximum wait time since the first pod reaches the Permit stage; it is not persisted as PodGroup/Pod status in the apiserver, and it does not block the pod scheduling process. So could you give me more detail about why the pod is unschedulable?
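For context, here is a minimal Go sketch of the "maximum wait since the first pod reaches Permit" semantics described above. The `gang` record and `waitTimeout` helper are illustrative assumptions, not the actual koordinator implementation:

```go
package main

import (
	"fmt"
	"time"
)

// gang is a hypothetical in-memory record for one PodGroup; the real
// plugin keeps richer state. It only illustrates that the timeout is
// anchored to the FIRST pod reaching the Permit stage and is never
// written back to the apiserver.
type gang struct {
	scheduleTimeout time.Duration // from PodGroup spec.scheduleTimeoutSeconds
	firstPermitTime time.Time     // zero until the first pod reaches Permit
}

// waitTimeout returns how long a pod arriving at Permit now may wait.
// Later pods only get the remaining budget of the shared timeout.
func (g *gang) waitTimeout(now time.Time) time.Duration {
	if g.firstPermitTime.IsZero() {
		g.firstPermitTime = now
		return g.scheduleTimeout
	}
	remaining := g.scheduleTimeout - now.Sub(g.firstPermitTime)
	if remaining < 0 {
		return 0 // budget exhausted: waiting pods get rejected
	}
	return remaining
}

func main() {
	g := &gang{scheduleTimeout: 10 * time.Second}
	now := time.Now()
	fmt.Println(g.waitTimeout(now))                      // 10s for the first pod
	fmt.Println(g.waitTimeout(now.Add(4 * time.Second))) // 6s for a later pod
}
```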
Sorry, there is an error in the YAML I provided. I will fix it later and provide more information.
➜ /Users/acejilam/Desktop/koordinator/bug/case git:(cn) ✗ 🐥 k delete -f .
podgroup.scheduling.sigs.k8s.io "a" deleted
pod "pod-example1" deleted
podgroup.scheduling.sigs.k8s.io "b" deleted
pod "pod-example2" deleted
➜ /Users/acejilam/Desktop/koordinator/bug/case git:(cn) ✗ 🐥 k apply -f 1.yaml
podgroup.scheduling.sigs.k8s.io/a created
pod/pod-example1 created
➜ /Users/acejilam/Desktop/koordinator/bug/case git:(cn) ✗ 🐥 date
Thu Mar 7 13:38:29 CST 2024
➜ /Users/acejilam/Desktop/koordinator/bug/case git:(cn) ✗ 🐥 sleep 2000 && k apply -f 1.yaml && date
^Z
[1] + 79592 suspended sleep 2000
➜ /Users/acejilam/Desktop/koordinator/bug/case git:(cn) ✗ 🐥 date
Thu Mar 7 14:06:39 CST 2024
➜ /Users/acejilam/Desktop/koordinator/bug/case git:(cn) ✗ 🐥 k apply -f 1.yaml && date
podgroup.scheduling.sigs.k8s.io/a unchanged
pod/pod-example1 unchanged
Thu Mar 7 14:06:46 CST 2024
➜ /Users/acejilam/Desktop/koordinator/bug/case git:(cn) ✗ 🐥 k apply -f 2.yaml && date
podgroup.scheduling.sigs.k8s.io/b created
pod/pod-example2 created
Thu Mar 7 14:06:55 CST 2024
➜ /Users/acejilam/Desktop/koordinator/bug/case git:(cn) ✗ 🐥 k describe pod pod-example1
Name: pod-example1
Namespace: default
Priority: 0
Service Account: default
Node: <none>
Labels: pod-group.scheduling.sigs.k8s.io=a
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Containers:
curlimage:
Image: busybox
Port: <none>
Host Port: <none>
Command:
sleep
365d
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-c9nvx (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
kube-api-access-c9nvx:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 13m koord-scheduler rejected due to timeout after waiting 15m0s at plugin Coscheduling
Warning FailedScheduling 13m koord-scheduler running PreFilter plugin "Coscheduling": %!!(MISSING)w(<nil>)
Warning FailedScheduling 8m16s koord-scheduler running PreFilter plugin "Coscheduling": %!!(MISSING)w(<nil>)
➜ /Users/acejilam/Desktop/koordinator/bug/case git:(cn) ✗ 🐥
As long as you sleep for a while in between, you can reproduce it.
/cc @ZiMengSheng
Can you give me the scheduler log about why pod-example1's Coscheduling PreFilter failed? The current PreFilter failure message is a little confusing due to a known kube-scheduler bug.
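As an aside, the `%!!(MISSING)w(<nil>)` text in the events is a plain fmt artifact: `%w` is only understood by fmt.Errorf, and re-formatting the degraded string without operands doubles the escape. A small illustrative snippet (not the scheduler's actual code path), assuming a nil underlying error:

```go
package main

import "fmt"

func main() {
	var err error // nil, as when PreFilter returns no wrapped error
	// %w outside fmt.Errorf degrades to a bad-verb escape.
	msg := fmt.Sprintf("running PreFilter plugin %q: %w", "Coscheduling", err)
	fmt.Println(msg) // running PreFilter plugin "Coscheduling": %!w(<nil>)

	// Formatting that string again with no operands yields the doubled
	// escape seen in the pod events.
	fmt.Println(fmt.Sprintf(msg))
	// running PreFilter plugin "Coscheduling": %!!(MISSING)w(<nil>)
}
```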
I made a test and got the point. PodGroup default/a has a total member number of 10 and a min number of 1.
With totalChildrenNum's help, when the last pod arrives and makes all values in childrenScheduleRoundMap equal to scheduleCycle, the Gang's scheduleCycle is incremented by 1, which means a new schedule cycle.
In our example, pod-example1 gets rejected due to a timeout while waiting for PodGroup B. pod-example1's schedule round is advanced to 1 after PreFilter. The next time pod-example1 enters a scheduling cycle, the gang's scheduleCycle will not be incremented, because the number of children whose schedule round equals the gang's scheduleCycle is one, which is less than totalChildrenNum; thus PreFilter fails.
A new schedule cycle will never arrive until you submit enough children of PodGroup A. So can you just submit all the children?
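To make the failure mode concrete, here is a simplified, self-contained Go sketch of the cycle bookkeeping described above. The names (`gangState`, `onPodPreFilter`) and the exact checks are illustrative, not the actual koordinator source:

```go
package main

import "fmt"

type gangState struct {
	scheduleCycle            int            // the gang's current cycle
	totalChildrenNum         int            // expected total members (10 here)
	childrenScheduleRoundMap map[string]int // per-pod schedule round
}

// onPodPreFilter fails when the pod has already consumed the current
// cycle, mirroring the repeated PreFilter failures described above.
func (g *gangState) onPodPreFilter(podName string) error {
	if g.childrenScheduleRoundMap[podName] > g.scheduleCycle {
		return fmt.Errorf("pod %s already tried in cycle %d", podName, g.scheduleCycle)
	}
	// Mark this child as having consumed the current cycle.
	g.childrenScheduleRoundMap[podName] = g.scheduleCycle + 1
	// The cycle only advances once EVERY expected child has consumed it,
	// so with one pod submitted out of ten it can never advance.
	caughtUp := 0
	for _, round := range g.childrenScheduleRoundMap {
		if round == g.scheduleCycle+1 {
			caughtUp++
		}
	}
	if caughtUp == g.totalChildrenNum {
		g.scheduleCycle++
	}
	return nil
}

func main() {
	g := &gangState{totalChildrenNum: 10, childrenScheduleRoundMap: map[string]int{}}
	fmt.Println(g.onPodPreFilter("pod-example1")) // <nil>: first attempt enters the cycle
	fmt.Println(g.onPodPreFilter("pod-example1")) // error: the cycle never advanced
}
```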
@ZiMengSheng In group a, I specified minMember to be 1. If I still need to increase the number of Pods, this is not consistent with my expectation.
OK, your opinion is right and welcome. There are some inconsistencies in the design; we need to fix them in the code and the design doc. Do you have the time and interest to fix it?
I'd love to fix it, but I don't have a specific idea of how best to fix it. We'd also like to hear from the community.
Welcome to contribute! Just do it!
@ls-2018 any updates? ;)
What happened:
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
1.yaml
2.yaml
Anything else we need to know?:
Environment:
Kubernetes version (use `kubectl version`):