For some reason the cluster-autoscaler (CA) is not scaling up; I always see longUnregistered values between 1 and 4. Here are the logs and info:
kubectl get -n kube-system configmap cluster-autoscaler-status -o yaml
apiVersion: v1
data:
  status: |+
    Cluster-autoscaler status at 2023-11-24 10:25:10.316990961 +0000 UTC:
    Cluster-wide:
      Health:    Healthy (ready=6 unready=0 notStarted=0 longNotStarted=0 registered=6 longUnregistered=1)
                 LastProbeTime:      2023-11-24 10:25:10.061622308 +0000 UTC m=+4514.931904993
                 LastTransitionTime: 2023-11-24 09:11:02.53461218 +0000 UTC m=+67.404894764
      ScaleUp:   InProgress (ready=6 registered=6)
                 LastProbeTime:      2023-11-24 10:25:10.061622308 +0000 UTC m=+4514.931904993
                 LastTransitionTime: 2023-11-24 10:04:34.898806071 +0000 UTC m=+3279.769088711
      ScaleDown: NoCandidates (candidates=0)
                 LastProbeTime:      2023-11-24 10:25:10.061622308 +0000 UTC m=+4514.931904993
                 LastTransitionTime: 2023-11-24 09:11:02.53461218 +0000 UTC m=+67.404894764

    NodeGroups:
      Name:      nodes.production-cluster.k8s.local
      Health:    Healthy (ready=1 unready=0 notStarted=0 longNotStarted=0 registered=1 longUnregistered=1 cloudProviderTarget=5 (minSize=3, maxSize=12))
                 LastProbeTime:      2023-11-24 10:25:10.061622308 +0000 UTC m=+4514.931904993
                 LastTransitionTime: 2023-11-24 09:26:11.644902539 +0000 UTC m=+976.515185174
      ScaleUp:   InProgress (ready=1 cloudProviderTarget=5)
                 LastProbeTime:      2023-11-24 10:25:10.061622308 +0000 UTC m=+4514.931904993
                 LastTransitionTime: 2023-11-24 10:14:09.688949574 +0000 UTC m=+3854.559232154
      ScaleDown: NoCandidates (candidates=0)
                 LastProbeTime:      2023-11-24 10:25:10.061622308 +0000 UTC m=+4514.931904993
                 LastTransitionTime: 2023-11-24 09:11:02.53461218 +0000 UTC m=+67.404894764

      Name:      monitoring.production-cluster.k8s.local
      Health:    Healthy (ready=2 unready=0 notStarted=0 longNotStarted=0 registered=2 longUnregistered=0 cloudProviderTarget=2 (minSize=1, maxSize=3))
                 LastProbeTime:      2023-11-24 10:25:10.061622308 +0000 UTC m=+4514.931904993
                 LastTransitionTime: 2023-11-24 09:11:02.53461218 +0000 UTC m=+67.404894764
      ScaleUp:   Backoff (ready=2 cloudProviderTarget=2)
                 LastProbeTime:      2023-11-24 10:25:10.061622308 +0000 UTC m=+4514.931904993
                 LastTransitionTime: 2023-11-24 10:19:39.998068853 +0000 UTC m=+4184.868351539
      ScaleDown: NoCandidates (candidates=0)
                 LastProbeTime:      2023-11-24 10:25:10.061622308 +0000 UTC m=+4514.931904993
                 LastTransitionTime: 2023-11-24 09:11:02.53461218 +0000 UTC m=+67.404894764
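For reference, the counters I keep checking can be pulled out of that status blob with plain grep. The snippet below inlines the two Health lines from the configmap so it runs stand-alone; the commented kubectl line is how the file would be filled from the live cluster (status.txt is just a scratch file name):

```shell
# On the live cluster the file would come from:
#   kubectl get -n kube-system configmap cluster-autoscaler-status \
#     -o jsonpath='{.data.status}' > status.txt
# Inlined here so the snippet is self-contained:
cat > status.txt <<'EOF'
Name:      nodes.production-cluster.k8s.local
Health:    Healthy (ready=1 unready=0 notStarted=0 longNotStarted=0 registered=1 longUnregistered=1 cloudProviderTarget=5 (minSize=3, maxSize=12))
Name:      monitoring.production-cluster.k8s.local
Health:    Healthy (ready=2 unready=0 notStarted=0 longNotStarted=0 registered=2 longUnregistered=0 cloudProviderTarget=2 (minSize=1, maxSize=3))
EOF
# Emit one counter per line: node group name, longUnregistered, cloudProviderTarget.
grep -oE 'Name: +[^ ]+|longUnregistered=[0-9]+|cloudProviderTarget=[0-9]+' status.txt
```

For the nodes group this shows longUnregistered=1 against cloudProviderTarget=5, i.e. the cloud provider was asked for 5 instances but at least one has been missing from the API server long enough to count as long-unregistered.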
Pod logs for the autoscaler:
E1124 10:28:04.696511 1 utils.go:60] pod.Status.StartTime is nil for pod happy-bunny-toolbox-1700787600-p9gzh. Should not reach here.
I1124 10:28:04.696557 1 filter_out_schedulable.go:125] Pod exiled-octopus-subscription-6dff8bd74-5mdbl marked as unschedulable can be scheduled on upcoming node template-node-for-nodes.production-cluster.k8s.local-5412385053388806738-1. Ignoring in scale up.
I1124 10:28:04.696577 1 filter_out_schedulable.go:125] Pod happy-bunny-toolbox-backup-1700787600-p9gzh marked as unschedulable can be scheduled on upcoming node template-node-for-nodes.production-cluster.k8s.local-5412385053388806738-0. Ignoring in scale up.
I1124 10:34:03.491670 1 event.go:281] Event(v1.ObjectReference{Kind:"Pod", Namespace:"myservice", Name:"myservice-7b694dbd4b-ns4nr", UID:"15a3ca9c-4059-4b91-a92d-607fb31a4fb2", APIVersion:"v1", ResourceVersion:"741944579", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added): 1 in backoff after failed scale-up, 1 max node group size reached
Another thing I noticed: pods which can't be deployed have this message:
Pod ingress-nginx-controller-b85b758cc-k87vw marked as unschedulable can be scheduled on upcoming node template-node-for-nodes.production-cluster.k8s.local-6801475017901952134-1. Ignoring in scale up.
and when I check the logs again a minute later, the node name has changed:
Pod ingress-nginx-controller-b85b758cc-k87vw marked as unschedulable can be scheduled on upcoming node template-node-for-nodes.production-cluster.k8s.local-5617487959455756375-0. Ignoring in scale up.
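To double-check that the "upcoming node" really is churning between those two log lines (and I'm not misreading timestamps), the template-node names can be extracted with sed. The two log lines are inlined so this runs stand-alone; ca.log is just a scratch file name, and on the cluster I'd feed it from the autoscaler pod logs instead:

```shell
# The two log lines above; on the cluster something like
#   kubectl logs -n kube-system deploy/cluster-autoscaler | grep k87vw > ca.log
cat > ca.log <<'EOF'
Pod ingress-nginx-controller-b85b758cc-k87vw marked as unschedulable can be scheduled on upcoming node template-node-for-nodes.production-cluster.k8s.local-6801475017901952134-1. Ignoring in scale up.
Pod ingress-nginx-controller-b85b758cc-k87vw marked as unschedulable can be scheduled on upcoming node template-node-for-nodes.production-cluster.k8s.local-5617487959455756375-0. Ignoring in scale up.
EOF
# Extract just the template-node names and de-duplicate them;
# two distinct names means the "upcoming node" changed between log lines.
sed -nE 's/.*upcoming node (template-node-for-[^ ]+)\. Ignoring.*/\1/p' ca.log | sort -u
```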
Edit: I am also seeing these in the logs:
Node group nodes.production-cluster.k8s.local is not ready for scaleup - backoff
Kind:"Pod", ... 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 in backoff after failed scale-up
...
What can I do?