[AWS] 'NotTriggerScaleUp' pod didn't trigger scale-up

okgolove commented 5 years ago

This seems to be a common problem, but I wasn't able to find a solution. I have an ASG and the cluster autoscaler running.

I deployed a StatefulSet with my application and then got these errors:

auto-scaler-aws-cluster-autoscaler-5f78688568-6q4hn aws-cluster-autoscaler I1002 11:03:26.503432       1 utils.go:196] Pod myapp-server-0 can't be scheduled on nodes.prod.example.com, predicate failed: GeneralPredicates predicate mismatch, cannot put production/sportradar-uof-server-0 on template-node-for-nodes.prod.example.com-7085033211816415489, reason: node(s) didn't match node selector
auto-scaler-aws-cluster-autoscaler-5f78688568-6q4hn aws-cluster-autoscaler I1002 11:03:26.503455       1 scale_up.go:371] No pod can fit to nodes.prod.example.com
auto-scaler-aws-cluster-autoscaler-5f78688568-6q4hn aws-cluster-autoscaler I1002 11:03:26.503465       1 scale_up.go:376] No expansion options
auto-scaler-aws-cluster-autoscaler-5f78688568-6q4hn aws-cluster-autoscaler I1002 11:03:26.504202       1 factory.go:33] Event(v1.ObjectReference{Kind:"Pod", Namespace:"production", Name:"myapp-server-0", UID:"7a8d1128-c62b-11e8-9f1f-066d51fa4852", APIVersion:"v1", ResourceVersion:"6206696", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added)

ASG status:


Cluster-autoscaler status at 2018-10-02 11:13:03.741699636 +0000 UTC:
Cluster-wide:
  Health:      Healthy (ready=8 unready=0 notStarted=0 longNotStarted=0 registered=8 longUnregistered=0)
               LastProbeTime:      2018-10-02 11:13:03.611386199 +0000 UTC m=+2086.611779690
               LastTransitionTime: 2018-10-02 10:38:44.809890287 +0000 UTC m=+27.810283749
  ScaleUp:     NoActivity (ready=8 registered=8)
               LastProbeTime:      2018-10-02 11:13:03.611386199 +0000 UTC m=+2086.611779690
               LastTransitionTime: 2018-10-02 10:38:44.809890287 +0000 UTC m=+27.810283749
  ScaleDown:   NoCandidates (candidates=0)
               LastProbeTime:      2018-10-02 11:13:03.611386199 +0000 UTC m=+2086.611779690
               LastTransitionTime: 2018-10-02 10:38:44.809890287 +0000 UTC m=+27.810283749

NodeGroups:
  Name:        nodes.prod.example.com
  Health:      Healthy (ready=5 unready=0 notStarted=0 longNotStarted=0 registered=5 longUnregistered=0 cloudProviderTarget=5 (minSize=0, maxSize=10))
               LastProbeTime:      2018-10-02 11:13:03.611386199 +0000 UTC m=+2086.611779690
               LastTransitionTime: 2018-10-02 10:38:44.809890287 +0000 UTC m=+27.810283749
  ScaleUp:     NoActivity (ready=5 cloudProviderTarget=5)
               LastProbeTime:      2018-10-02 11:13:03.611386199 +0000 UTC m=+2086.611779690
               LastTransitionTime: 2018-10-02 10:38:44.809890287 +0000 UTC m=+27.810283749
  ScaleDown:   NoCandidates (candidates=0)
               LastProbeTime:      2018-10-02 11:13:03.611386199 +0000 UTC m=+2086.611779690
               LastTransitionTime: 2018-10-02 10:38:44.809890287 +0000 UTC m=+27.810283749

MaciekPytel commented 5 years ago

Hi, based on the log you posted (node(s) didn't match node selector), the pending pod has a nodeSelector or nodeAffinity. CA believes that a new node would not have the labels needed to match the pod's requirements, so the pod would not schedule even if the node were added.
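To illustrate the check behind that log message: a nodeSelector matches a node only if every key/value pair in the selector is present in the node's labels. The snippet below is a minimal sketch of that rule, not CA's actual code; the function name and the example labels are hypothetical.

```python
def selector_matches(node_selector, node_labels):
    """Return True if every key/value pair in node_selector is present in node_labels."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Hypothetical labels on the template node built for the node group.
template_labels = {"kubernetes.io/role": "node", "env": "prod"}

# Hypothetical nodeSelector on the pending pod.
pod_selector = {"dedicated": "myapp"}

# The template node has no "dedicated" label, so the pod would not fit.
print(selector_matches(pod_selector, template_labels))  # False
```

An empty nodeSelector matches any node, which is why pods without one are not affected.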

CA assumes that every node in a given ASG is exactly identical and that all node labels are added automatically upon node creation. The problem you see can be the result of manually changing the set of labels on a node, or just a mismatch between the pod and node config in your environment.
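One way to spot manually changed labels is to compare the labels of all nodes in a group and report anything that is not shared by every node. This is only a sketch under assumptions: the input is a plain dict of node name to labels (for example, collected from `kubectl get nodes --show-labels`), the function name is hypothetical, and per-node labels such as `kubernetes.io/hostname` are ignored because they differ by design.

```python
def label_drift(nodes, ignore=("kubernetes.io/hostname",)):
    """Map node name -> labels on that node that are not shared by every node.

    nodes:  dict of node name -> dict of label key -> label value
    ignore: label keys expected to differ between nodes
    """
    if not nodes:
        return {}
    # Drop labels that are expected to be unique per node.
    filtered = {
        name: {k: v for k, v in labels.items() if k not in ignore}
        for name, labels in nodes.items()
    }
    # (key, value) pairs present on every node in the group.
    common = set.intersection(*(set(labels.items()) for labels in filtered.values()))
    drift = {}
    for name, labels in filtered.items():
        extra = dict(set(labels.items()) - common)
        if extra:
            drift[name] = extra
    return drift


# Hypothetical node group where one node was labeled by hand.
nodes = {
    "node-a": {"env": "prod", "kubernetes.io/hostname": "node-a"},
    "node-b": {"env": "prod", "kubernetes.io/hostname": "node-b", "dedicated": "myapp"},
}
print(label_drift(nodes))  # {'node-b': {'dedicated': 'myapp'}}
```

An empty result means the nodes look identical as far as labels go. Other labels that legitimately vary, such as zone labels in a multi-AZ group, can be added to `ignore`.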

okgolove commented 5 years ago

@MaciekPytel thank you. It looks like I missed a nodeSelector in my Helm values. Sorry for my inattention. I hope this issue will be useful for other people. Closing for now.

kubernetes / autoscaler

[AWS] 'NotTriggerScaleUp' pod didn't trigger scale-up #1291