Closed: qw2208 closed this issue 4 years ago.
/cc @k82cn Just a kindly ping :)
Hi Team,
I'm hitting the same issue as well, and my scenario is:
I have a pod group that contains 4 pods, each requesting 1 CPU and 1 GB of memory. Although my cluster only has 3.6 CPU, these pods are scheduled to the nodes successfully; then the kubelet process on one node prints an error message like the one above, and one pod fails to create.
However, if I request 4 pods of 1.5 CPU and 1 GB each, gang scheduling works and all the pods are rejected from scheduling.
Reminder: in my scenario I reserve 0.5 CPU per node for system use when initializing the k8s cluster, which means my VM has 8 CPU but only 7.5 CPU is allocatable (it seems @qw2208 also has this configuration). Is this the root cause?
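For context, the 8 CPU vs 7.5 CPU gap mentioned above is the difference between node capacity and node allocatable, which comes from the kubelet's reservation. Below is a minimal sketch of such a reservation, assuming it is set through a KubeletConfiguration file; the values are illustrative, not taken from this cluster.

```yaml
# Illustrative only: reserve 0.5 CPU for system daemons so that an
# 8-CPU VM reports 7.5 CPU allocatable.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
systemReserved:
  cpu: "500m"          # held back for OS/system daemons
---
# Resulting node status (illustrative values for an 8-CPU VM):
#   status:
#     capacity:
#       cpu: "8"
#     allocatable:
#       cpu: "7500m"   # capacity minus the 0.5 CPU reservation
```

If the scheduler counts capacity rather than allocatable, pods that look like they fit can still be rejected by the kubelet, which would match the behavior described above.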
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
Hi,
/kind bug
I've got a problem related to gang scheduling in kube-batch. When the CPU requests are close to the node limits, gang scheduling seems to fail: one of the pods is not started and is reported as an error by the kubelet, while the other pods are running. The pods already running on the node that ran out of CPU were scheduled hours ago.
See the following experiment as an example. I've created a podgroup and a batch of pods with the following configs:
What I expect is that all the pods stay pending and fail to be scheduled. However, all the pods got scheduled, but one of them gave an OutOfcpu error:
The node status is as follows.
Another observation: it seems that if the gap is large, kube-batch keeps all the pods pending, which meets the expectation.
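For reference, here is a minimal sketch of the kind of PodGroup and pod spec described above. The original configs are not reproduced here; the apiVersion, the scheduling.k8s.io/group-name annotation, and all names, images, and resource sizes are assumptions based on the kube-batch examples of that time.

```yaml
# Sketch only: names, image, and resource sizes are illustrative, and the
# PodGroup apiVersion may differ between kube-batch releases.
apiVersion: scheduling.incubator.k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: gang-demo
spec:
  minMember: 4                  # the whole gang must be placeable before any pod is bound
---
apiVersion: v1
kind: Pod
metadata:
  name: gang-demo-0             # repeat for gang-demo-1..3
  annotations:
    scheduling.k8s.io/group-name: gang-demo
spec:
  schedulerName: kube-batch     # hand the pod to kube-batch instead of the default scheduler
  containers:
  - name: main
    image: busybox
    command: ["sleep", "3600"]
    resources:
      requests:
        cpu: "1"                # sized so the sum of requests is just above allocatable CPU
        memory: 1Gi
      limits:
        cpu: "1"
        memory: 1Gi
```

With requests sized like this, the expectation in the report is that kube-batch keeps the whole gang pending rather than binding pods that the kubelet later rejects with OutOfcpu.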