Closed: armandocerna closed this issue 3 years ago.
We got the same problem: unable to create more pods ("Insufficient cpu") while all nodes are at ~5-10% CPU load and 60-70% CPU limit (per kubectl describe node). Restarting the master node seems to let the pods schedule successfully.
A restart helped us as well. Is this something that's going to be fixed soon?
@armandocerna There are no sig labels on this issue. Please add a sig label by:
(1) mentioning a sig: @kubernetes/sig-<team-name>-misc
(2) specifying the label manually: /sig <label>
Note: method (1) will trigger a notification to the team. You can find the team list here.
We are having the same problem and cannot restart the master since we are in GKE.
/sig scheduling
How do you restart the nodes? I'm using Google Cloud Platform... Would I SSH into the compute instances and restart them?
I had the same problem today (on GKE). I restarted my nodes from the console interface; when they came back online I re-deployed everything and it works now.
UPDATE: it only solved the issue for a short period of time. After a little while, my pods started going into an "Unknown" state and getting re-created. Still haven't solved it.
@flaviamissi I've found the high cpu instances to work better.
Thanks @chrissound, I have another cluster with high cpu instances and indeed they haven't shown any issues.
Not sure if related, but I recently got an email from GCP pointing to these docs: https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/. GKE will start doing this after the 1.7.6 upgrade.
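For context, those docs cover the kubelet's reserved-resource settings, which shrink a node's Allocatable (Allocatable = Capacity - kube-reserved - system-reserved - eviction thresholds), and Allocatable is what the scheduler actually budgets against. A minimal sketch of such a kubelet configuration; the amounts below are made-up examples, not GKE's actual defaults:

```yaml
# KubeletConfiguration fragment; the reserved amounts are illustrative only
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
kubeReserved:          # set aside for Kubernetes daemons (kubelet, runtime)
  cpu: 100m
  memory: 256Mi
systemReserved:        # set aside for OS daemons (sshd, journald, ...)
  cpu: 100m
  memory: 256Mi
evictionHard:          # kept free to trigger eviction before true exhaustion
  memory.available: 100Mi
```

With something like this in place, `kubectl describe node` shows an Allocatable noticeably smaller than Capacity, which can explain "Insufficient cpu" even when raw CPU usage looks low.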
Kube-dashboard says that only 0.05 CPU units are occupied. Why can't the pod be scheduled?
I just removed the resource limits and requests specs; that works for now.
Another pod can't be scheduled at this time again...
```yaml
...
spec:
  containers:
  - name: default-http-backend
    image: gcr.io/xxx/default-http-backend:latest
    ports:
    - containerPort: 8000
      name: http
    resources:
      requests:
        cpu: 10m
```
Requesting a very small amount of CPU made this work this time.
Same issue on GKE. I just added a fresh new (micro) instance, but it won't schedule even the smallest pod on it. E.g.:

```
Requests:
  cpu:     1Mi
  memory:  64Mi
```

...

```
26m  1s  95  default-scheduler  Warning  FailedScheduling  No nodes are available that match all of the following predicates:: Insufficient cpu (5).
```
Even though the fresh node has enough CPU available:

```
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  320m (34%)    100m (10%)  262Mi (44%)      414Mi (69%)
```
Other nodes are pretty packed at ~95% cpu allocated on each node, though even there it should schedule a 1m cpu pod.
We had this same issue: we could schedule pods with CPU resources initially, but over time the master(?) seemed to fall out of sync and wouldn't schedule any pods with CPU resources, despite kubectl describe nodes reporting plenty of capacity. At that point, restarting the master temporarily restored functionality until it thought the nodes were full again.
As part of CoreOS updates, we bumped our etcd version to 3.2.9 and we are no longer seeing the problem: pods schedule happily when describe nodes indicates sufficient capacity, and when pods cannot schedule, describe nodes indicates that none of the nodes have the capacity to handle the pod as requested. YMMV...
https://github.com/kubernetes/kubernetes/issues/45237#issuecomment-305654535 indicates that an analogous problem with memory resources was also solved by bumping etcd versions...
I might be seeing this as well (on GKE). I have a deployment with:

```yaml
[...]
    resources:
      requests:
        cpu: 1
        memory: 3G
[...]
    resources:
      requests:
        cpu: 9G
        memory: 52G
[...]
```

trying to deploy to a cluster that has 3 nodes with 15.89 CPU allocatable and 57.65 GB memory allocatable, but I'm getting Insufficient cpu (3), Insufficient memory (6) for scheduling.
Doing things like bumping the second container down to:

```yaml
      requests:
        cpu: 4G
        memory: 22G
```

results in the same scheduling issue.
Hi, this issue was submitted in 2016. Any idea when it will be fixed? I have OpenShift Origin 3.7 and this is killing me...
Nothing?
Having a similar issue: there are almost 4 whole CPUs available within the cluster, a new pod is requesting 500m (half a core), and the scheduler reports insufficient CPU on all nodes. 😱 Working with GKE, Kubernetes master version 1.9.2-gke.1.
If anyone could fork this repo and change the default allocation size for the CPU, I would be eternally grateful. Seems like the simplest solution really (seeing as it's been nearly 2 years with this issue). I suppose we would have to manage the cluster ourselves when it comes to GCP, but oh well...
Same issue here. GKE. 1.9.2-gke.1.
Same issue, running 1.9.6-gke.1
+1
This issue is coming up on two years now; we're seeing this and it's holding us back from going to production.
Seeing this too on k8s bare-metal
Note that in es-master.yaml and es-client.yaml, you can change replicas: 3 to replicas: 1.
Did anyone resolve this issue?
This seems like a really old issue; are there any new logs for it?
This isn't really an 'issue' and I think we should close this question. See here for a further explanation: https://stackoverflow.com/a/45585916/1663462
The main issue is probably that it's not very intuitive why one gets the error message, even though it is the intended and correct behavior.
Why doesn't it trigger a scale-up then, @chrissound? Do I have to make sure that any pods I run on a given node request less than what the system pods have already taken up? Seems very inefficient.
Having the same issue here. I have many nodes with just 3% CPU utilization but for some reason new pods are not being allocated to them.
What's the version? Any logs? The original issue happened in 1.3.x, a really old one.
I'd like to close this old issue, and please feel free to open a new one with more info, e.g. log, version :)
@k82cn
```
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  31s (x637 over 3h)  default-scheduler  No nodes are available that match all of the predicates: Insufficient cpu (7), Insufficient memory (3), PodToleratesNodeTaints (5)
```
My resources on one of my nodes (they are all pretty much the same):

```
Capacity:
  cpu:     32
  memory:  65690484Ki
  pods:    110
Allocatable:
  cpu:     32
  memory:  65588084Ki
  pods:    110
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits    Memory Requests  Memory Limits
  ------------  ----------    ---------------  -------------
  26817m (83%)  30617m (95%)  12784Mi (19%)    32756Mi (51%)
```
It doesn't make sense to use the pod limit, which should kill my pod in case of a memory leak or abnormal CPU usage, to gate pod scheduling. If I have 100 pods, each with a limit, it is very unlikely that they will all run at their peak limit at the same time. Kubernetes should account for the resources actually used on my node.
So either I have to create pods with veeery small requests that will not be able to handle any burst of traffic, for example, or I will waste a huge amount of resources because of it.
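For what it's worth, the scheduler budgets against requests, not limits, so the overcommit described here is already possible by setting a low request together with a higher limit. A sketch of such a container spec; the values are illustrative only:

```yaml
# Container resources fragment: the scheduler reserves only the request,
# while the limit caps bursts at runtime (example values, not a recommendation)
resources:
  requests:
    cpu: 50m          # counted by the scheduler when placing the pod
    memory: 128Mi
  limits:
    cpu: 500m         # runtime ceiling; not counted for scheduling
    memory: 512Mi
```

Nodes then fill up based on the sum of requests, and limits only constrain what a running container may consume.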
Also seeing this issue. We're on GKE.
My problem was caused by CPU quota limits under IAM & admin -> Quotas -> Compute Engine API (CPU). I had to request more resources for my environment; since the quota increase, my pods scale up easily.
I had a similar issue where pods failed scheduling due to insufficient CPU while all nodes were available. I found the issue was resource allocation: resources were allocated in a manner that exceeded the actual resources available to the cluster. After reallocating resources properly, things worked fine.
Today I was hit by this issue in Kubernetes 1.10 in my local environment.
The pod was just stuck in Pending state. After describing the pod, I was able to see the following problem:

```
Warning  FailedScheduling  21s (x8 over 1m)  default-scheduler  0/1 nodes are available: 1 Insufficient cpu.
```

The interesting thing is that the CPU request I gave is only .003 and it is a very lightweight service. It never failed before.
@nsidhaye So, it turns out my problem was: we had 30 cores to use, and every deployment had a default request/limit set to 1 core request, 2 cores limit. Even though the apps weren't consuming more than 50m CPU, each was "locking" a whole core, meaning we hit the limit at 30 apps pretty quickly.
We had to redeploy most of our apps based on real resource consumption. Using Prometheus/Grafana, we checked the average CPU (and memory) consumption for each pod, calculated how much it should request, and updated those values.
If you do a kubectl describe nodes, you should see how much has already been requested on each node, and that should point you in the right direction for fixing your issue.
The same issue.

```
Events:
  Type     Reason            Age               From               Message
  ----     ------            ----              ----               -------
  Warning  FailedScheduling  38s (x8 over 1m)  default-scheduler  0/25 nodes are available: 1 Insufficient pods, 1 PodToleratesNodeTaints, 23 Insufficient cpu.
```

We run pods with the same limit and request value for CPU (400m) / memory (1Gi). Every node has 32 CPU cores and 128 GB of memory; excluding the system resource share, we would expect to run at least 60×24 = 1440 pods, but only 906 pods run in total. When picking one node to verify, it really is not at full resource usage (only 43%):
```
Name:               10.129.22.23
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/hostname=10.129.22.23
Annotations:        node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp:  Mon, 09 Jul 2018 17:17:28 +0800
Taints:             <none>
Unschedulable:      false
Conditions:
  Type            Status  LastHeartbeatTime                LastTransitionTime               Reason                      Message
  ----            ------  -----------------                ------------------               ------                      -------
  OutOfDisk       False   Thu, 12 Jul 2018 09:02:41 +0800  Mon, 09 Jul 2018 17:17:28 +0800  KubeletHasSufficientDisk    kubelet has sufficient disk space available
  MemoryPressure  False   Thu, 12 Jul 2018 09:02:41 +0800  Mon, 09 Jul 2018 17:17:28 +0800  KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure    False   Thu, 12 Jul 2018 09:02:41 +0800  Mon, 09 Jul 2018 17:17:28 +0800  KubeletHasNoDiskPressure    kubelet has no disk pressure
  Ready           True    Thu, 12 Jul 2018 09:02:41 +0800  Mon, 09 Jul 2018 17:17:59 +0800  KubeletReady                kubelet is posting ready status
Addresses:
  InternalIP:  10.129.22.23
  Hostname:    10.129.22.23
Capacity:
  cpu:     32
  memory:  131911884Ki
  pods:    110
Allocatable:
  cpu:     32
  memory:  131809484Ki
  pods:    110
System Info:
  Machine ID:                 956ff818ca37414b8c43a31332f739bb
  System UUID:                6b6cbe2a-5582-11e8-03ce-38adbed82897
  Boot ID:                    ff7f7735-628a-4245-90cc-3fac5e82c436
  Kernel Version:             4.17.4-1.el7.elrepo.x86_64
  OS Image:                   CentOS Linux 7 (Core)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://18.3.0
  Kubelet Version:            v1.9.3
  Kube-Proxy Version:         v1.9.3
ExternalID:                   10.129.22.23
Non-terminated Pods:          (36 in total)
  Namespace                        Name                           CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------                        ----                           ------------  ----------  ---------------  -------------
  default                          iperf-server-controller-n6zqf  0 (0%)        0 (0%)      0 (0%)           0 (0%)
  default                          kube-proxy-10.129.22.23        0 (0%)        0 (0%)      0 (0%)           0 (0%)
  e2e-tests-clusterproject0-8kjzl  nginx-pod-170                  400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject0-8kjzl  nginx-pod-191                  400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject0-8kjzl  nginx-pod-213                  400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject0-8kjzl  nginx-pod-226                  400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject0-8kjzl  nginx-pod-252                  400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject0-8kjzl  nginx-pod-280                  400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject0-8kjzl  nginx-pod-292                  400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject1-p24vs  nginx-pod-1                    400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject1-p24vs  nginx-pod-103                  400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject1-p24vs  nginx-pod-112                  400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject1-p24vs  nginx-pod-28                   400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject1-p24vs  nginx-pod-47                   400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject1-p24vs  nginx-pod-55                   400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject1-p24vs  nginx-pod-71                   400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject1-p24vs  nginx-pod-81                   400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject1-p24vs  nginx-pod-90                   400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject2-qdpm5  nginx-pod-100                  400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject2-qdpm5  nginx-pod-117                  400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject2-qdpm5  nginx-pod-133                  400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject2-qdpm5  nginx-pod-142                  400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject2-qdpm5  nginx-pod-158                  400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject2-qdpm5  nginx-pod-163                  400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject2-qdpm5  nginx-pod-178                  400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject2-qdpm5  nginx-pod-203                  400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject2-qdpm5  nginx-pod-225                  400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject2-qdpm5  nginx-pod-244                  400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject2-qdpm5  nginx-pod-25                   400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject2-qdpm5  nginx-pod-255                  400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject2-qdpm5  nginx-pod-260                  400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject2-qdpm5  nginx-pod-58                   400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject2-qdpm5  nginx-pod-80                   400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  e2e-tests-clusterproject2-qdpm5  nginx-pod-9                    400m (1%)     400m (1%)   1Gi (0%)         1Gi (0%)
  kube-system                      calico-node-9rnh6              250m (0%)     0 (0%)      0 (0%)           0 (0%)
  kube-system                      node-exporter-mdtfh            1 (3%)        1 (3%)      2Gi (1%)         2Gi (1%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits    Memory Requests  Memory Limits
  ------------  ----------    ---------------  -------------
  14050m (43%)  13800m (43%)  34Gi (27%)       34Gi (27%)
```
Workaround: restart kube-scheduler, then everything goes OK...
I am seeing this issue with Azure Kubernetes Service at the moment, and I have plenty of CPU as far as I can tell.
Kubernetes version 1.9.9.
The issue presented itself when I had to delete the pod after correcting a config error with credentials; I'm testing to see if it happens again when deploying from scratch.
```
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  26m (x8 over 27m)  default-scheduler  0/1 nodes are available: 1 Insufficient cpu.
```
```
Non-terminated Pods:  (12 in total)
  Namespace    Name                                                             CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------    ----                                                             ------------  ----------  ---------------  -------------
  awx          awx-675dc9fdd5-cxk9z                                             4 (50%)       0 (0%)      6Gi (21%)        0 (0%)
  awx          tiller-deploy-656cdc5db6-rxhzr                                   0 (0%)        0 (0%)      0 (0%)           0 (0%)
  awx          unrealistic-buffoon-nginx-ingress-controller-7f9f7ccddb-l9v4d    0 (0%)        0 (0%)      0 (0%)           0 (0%)
  awx          unrealistic-buffoon-nginx-ingress-default-backend-bf7cf6d7pb7fq  0 (0%)        0 (0%)      0 (0%)           0 (0%)
  awx          yodeling-goose-cert-manager-8dcbcdc6b-krtqz                      0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system  heapster-97b7d74b5-xbv24                                         138m (1%)     138m (1%)   294Mi (1%)       294Mi (1%)
  kube-system  kube-dns-v20-7d874cb9b6-v2pct                                    110m (1%)     0 (0%)      120Mi (0%)       220Mi (0%)
  kube-system  kube-dns-v20-7d874cb9b6-w6vpp                                    110m (1%)     0 (0%)      120Mi (0%)       220Mi (0%)
  kube-system  kube-proxy-qznhf                                                 100m (1%)     0 (0%)      0 (0%)           0 (0%)
  kube-system  kube-svc-redirect-hr4jr                                          10m (0%)      0 (0%)      34Mi (0%)        0 (0%)
  kube-system  kubernetes-dashboard-7bb7584f55-vdnwp                            100m (1%)     100m (1%)   50Mi (0%)        300Mi (1%)
  kube-system  tunnelfront-74f5d59895-hwfs9                                     10m (0%)      0 (0%)      64Mi (0%)        0 (0%)
```
Please let me know if I can provide any more logs if needed
Still having this issue... on GKE.
Same issue on GKE....
Even including kube-system elements, one of my nodes only uses 5% of its CPU:

```
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  100m (5%)     0 (0%)      0 (0%)           0 (0%)
```
I had this same issue. GKE has a default LimitRange with the default CPU request set to 100m; this can be checked by running kubectl get limitrange -o=yaml in the default (or your) namespace.
This default is applied to every container. So, for instance, if you have a 4-core node, and assuming that each pod created has 2 containers, it will allow only around ~20 pods to be created; at least that was what I understood about it.
The workaround is to change the default by editing/removing the LimitRange and deleting old pods so they are recreated with the new defaults, or to explicitly set a different request/limit in your pod config.
Some reading material:
https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/#specify-a-cpu-request-and-a-cpu-limit
https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/cpu-default-namespace/#create-a-limitrange-and-a-pod
https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#how-pods-with-resource-limits-are-run
https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-resource-requests-and-limits
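For reference, overriding the namespace default with your own LimitRange might look like the sketch below; the object name and values are illustrative, not GKE's settings:

```yaml
# LimitRange sketch: containers that omit resources get these per-container
# defaults applied at admission time (example values only)
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-defaults        # hypothetical name
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: 50m              # used when a container specifies no CPU request
    default:
      cpu: 200m             # used when a container specifies no CPU limit
```

Applying this with kubectl apply in the affected namespace only affects newly created pods, which is why existing pods have to be recreated to pick up the new defaults.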
JCMais, thanks! It worked for me (I removed the LimitRange).
Any news on this issue? Encountered this on AWS EKS when a rolling update is performed. Some pods stay in Pending state after rolling update.
@stmaute - you need to lower your pods' CPU requests, depending on the machine they're running on.
Hi @OlafSzmidt, I have defined resource requests and limits for my deployments. I have also set maxUnavailable=1 and maxSurge=0. I would expect the rolling update to work because there are enough CPU resources available.
I'm also having this issue. Are there no solutions or steps to debug this yet?
@KristianWindsor I suggest you describe the node you believe should fit your request and check its state. In my case, my mistake was thinking that "requests == usage": my nodes were using almost none of their resources, but all the requests had already been made. I believe that could be your case too.
I am using GKE, and I've noticed that I receive this error when reaching the CPU quota limits as described in @mbutan's comment. Have a look at that limit (the default for us at this time is 8).
I'm seeing the exact same problem with Kubernetes 1.10.11. I wonder if I'm missing any of the settings that @mbutan mentioned (I could not find anything like that in the AWS settings; maybe it's a GKE thing). My cluster runs on AWS (deployed using kops), pretty much a clean install. Restarting the kube-schedulers works for me as well, but it is a temporary solution, as it quickly gets back to its "problematic" mode. I checked the LimitRange as well, and it is only set on the default namespace, which I'm not using anyhow.
I think the reason is that no node has enough unrequested CPU to satisfy the pod's CPU request. Use kubectl describe nodes {NodeName} to check each node's CPU requests. If the pod's requested CPU added to the node's current CPU requests comes to more than 100%, kube-scheduler will emit an "Insufficient CPU" event and the pod will not be scheduled.
Kubernetes version (use kubectl version):
Environment (uname -a): Master: 3.13.0-95-generic, Minion: 4.4.0-38-generic
What happened: When scheduling pods with a low resource request for CPU (15m), we receive the message "Insufficient CPU" across all nodes attempting to schedule the pod. We are using multi-container pods, and running kubectl describe nodes shows nodes with available resources to schedule the pods. However, k8s refuses to schedule across all nodes.
kubectl_output.txt
What you expected to happen:
How to reproduce it (as minimally and precisely as possible): Below is a sample manifest that we can use to reproduce the output.
manifest.txt
We end up scheduling pods up until about 10-14 pods and then we run into this problem. See graph below.
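Since the attached manifest.txt is not reproduced above, here is a hypothetical minimal manifest of the kind described (a multi-container pod with low 15m CPU requests); the pod name, container names, and images are made up for illustration:

```yaml
# Illustrative repro sketch, not the attached manifest.txt
apiVersion: v1
kind: Pod
metadata:
  name: low-cpu-test        # hypothetical name
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: 15m            # the small request mentioned in the report
  - name: sidecar
    image: busybox
    command: ["sleep", "infinity"]
    resources:
      requests:
        cpu: 15m
```

Creating many copies of such a pod and watching kubectl describe pod for FailedScheduling events would be one way to observe the behavior described.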