sassoftware / viya4-deployment

This project contains Ansible code that creates a baseline in an existing Kubernetes environment for use with the SAS Viya Platform, generates the manifest for an order, and then can also deploy that order into the Kubernetes environment specified.
Apache License 2.0

Pods are crashing #142

Closed venu-ibex-9 closed 3 years ago

venu-ibex-9 commented 3 years ago

I am getting the issue below; can anyone help me with this?

```
I0906 11:22:39.366127 1 scale_up.go:288] Pod sas-cas-control-86b98f77c5-58zs2 can't be scheduled on sas-test-eks-default20210904073839227900000022, predicate checking error: node(s) didn't match Pod's node affinity; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity; debugInfo=
I0906 11:22:39.366160 1 scale_up.go:288] Pod sas-report-execution-cb79b77f5-wc5g6 can't be scheduled on sas-test-eks-default20210904073839227900000022, predicate checking error: node(s) didn't match Pod's node affinity; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity; debugInfo=
I0906 11:22:39.366191 1 scale_up.go:288] Pod sas-analytics-services-648bcdd75c-vjjkh can't be scheduled on sas-test-eks-default20210904073839227900000022, predicate checking error: node(s) didn't match Pod's node affinity; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity; debugInfo=
I0906 11:22:39.366221 1 scale_up.go:288] Pod sas-environment-manager-app-777c954694-677xh can't be scheduled on sas-test-eks-default20210904073839227900000022, predicate checking error: node(s) didn't match Pod's node affinity; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity; debugInfo=
```
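For context, the "node(s) didn't match Pod's node affinity" predicate error means the pods carry a required node-affinity rule that no existing (or newly scaled-up) node satisfies. A minimal sketch of what such a rule looks like, assuming the `workload.sas.com/class` label key that viya4-deployment conventionally applies to node pools (the pod name and label value here are illustrative, not taken from the logs):

```yaml
# Hypothetical pod with a hard node-affinity requirement. If no node in the
# cluster carries the matching label, the scheduler reports exactly the
# NodeAffinity predicate failure shown in the log above.
apiVersion: v1
kind: Pod
metadata:
  name: affinity-example
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: workload.sas.com/class   # assumed label key
                operator: In
                values:
                  - stateless                  # assumed label value
  containers:
    - name: app
      image: nginx
```

Comparing the affinity stanza in the failing pods' specs against the labels actually present on your nodes is usually the fastest way to confirm this.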

I am also getting the issue below. I am assuming it is due to a low instance size; can you please confirm?

```
e group min size reached
I0906 11:22:39.372131 1 pre_filtering_processor.go:66] Skipping ip-192-168-97-161.ec2.internal - node group min size reached
I0906 11:22:39.372172 1 scale_down.go:423] Node ip-192-168-37-22.ec2.internal is not suitable for removal - memory utilization too big (0.896766)
I0906 11:22:39.373612 1 scale_down.go:423] Node ip-192-168-59-139.ec2.internal is not suitable for removal - memory utilization too big (0.908980)
I0906 11:22:39.373660 1 scale_down.go:423] Node ip-192-168-103-249.ec2.internal is not suitable for removal - memory utilization too big (0.932484)
I0906 11:22:39.373680 1 scale_down.go:423] Node ip-192-168-64-129.ec2.internal is not suitable for removal - memory utilization too big (0.999073)
I0906 11:22:39.373711 1 scale_down.go:423] Node ip-192-168-7-5.ec2.internal is not suitable for removal - memory utilization too big (0.949596)
I0906 11:22:39.373742 1 scale_down.go:423] Node ip-192-168-7-192.ec2.internal is not suitable for removal - memory utilization too big (0.931517)
I0906 11:22:39.373770 1 scale_down.go:423] Node ip-192-168-61-114.ec2.internal is not suitable for removal - memory utilization too big (0.925616)
I0906 11:22:39.373801 1 scale_down.go:423] Node ip-192-168-76-100.ec2.internal is not suitable for removal - memory utilization too big (0.954067)
I0906 11:22:39.373817 1 scale_down.go:488] Scale-down calculation: ignoring 2 nodes unremovable in the last 5m0s
I0906 11:22:39.373861 1 static_autoscaler.go:503] Scale down status: unneededOnly=false lastScaleUpTime=2021-09-04 17:54:54.722674816 +0000 UTC m=+21279.121544295 lastScaleDownDeleteTime=2021-09-04 12:00:18.617194618 +0000 UTC m=+3.016064074 lastScaleDownFailTime=2021-09-04 12:00:18.617194701 +0000 UTC m=+3.016064161 scaleDownForbidden=false isDeleteInProgress=false scaleDownInCooldown=false
I0906 11:22:39.373892 1 static_autoscaler.go:516] Starting scale down
I0906 11:22:39.373961 1 scale_down.go:868] No candidates for scale down
```

enderm commented 3 years ago

You are not giving much information here, and I also do not see a connection to this particular GitHub project. It looks like the deployment process itself was successful, so the errors you are seeing are post-deployment. This looks more like a question for SAS Tech Support.

Going forward, there are numerous guides out there on how to write good github issues, e.g. https://github.com/codeforamerica/howto/blob/master/Good-GitHub-Issues.md

When in doubt, use common sense:

I have often found that going through the steps of describing an issue so that others can understand it will actually lead me to a solution, or to obvious follow-up questions that I can investigate and that eventually lead to an answer.

That said, the error messages you are seeing point to a possible issue with the taints on your node pools, and also a possible issue with node pool sizing. Those are good areas to look into further.
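As a starting point for that investigation, you can dump node labels with `kubectl get nodes --show-labels` and taints with `kubectl describe node <name>`, then compare them against what the deployment expects. As a rough sketch, assuming the `workload.sas.com/class` taint/label scheme that viya4-deployment conventionally sets up for its node pools (the node name below is copied from the log purely for illustration), a correctly prepared CAS node would carry something like:

```yaml
# Hypothetical abridged Node object: the label satisfies the pods' node
# affinity, and the matching taint keeps unrelated workloads off the pool.
apiVersion: v1
kind: Node
metadata:
  name: ip-192-168-37-22.ec2.internal
  labels:
    workload.sas.com/class: cas   # assumed label key/value
spec:
  taints:
    - key: workload.sas.com/class # assumed taint key
      value: cas
      effect: NoSchedule
```

If the labels or taints are missing or misspelled on the nodes, the scheduler will refuse to place the SAS pods there, which matches the NodeAffinity errors in the first log.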

venu-ibex-9 commented 3 years ago

Sure, got it. Next time I will follow the GitHub guide linked above when raising an issue. We are reaching out to Tech Support.