Closed palmerabollo closed 5 years ago
It's a known issue with pod affinity / anti-affinity: https://github.com/kubernetes/autoscaler/issues/257#issuecomment-364449232. The details are on the issue I linked, but in general pod affinity and (especially) anti-affinity don't work well with CA. It can cause CA to add nodes only one by one, as you observe, and it completely breaks CA performance on large clusters. It's not easy to fix, because it's caused by pod affinity being implemented in a way that is conceptually incompatible with how the autoscaler works. To fix it we'd need a significant refactor of either the scheduler or the autoscaler, neither of which is likely to happen soon.
Thanks @MaciekPytel. What I don't understand is why it works well on AWS. Shouldn't that logic be shared among all cloud implementations?
/assign
I think that cluster-autoscaler (CA) 1.3.x in Azure has problems dealing with affinity rules.
I use the following deployment to deploy a "pause" pod with two rules:
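(The original manifest is not reproduced here; a minimal sketch of what such a deployment might look like follows. The label keys, pod name, and image are assumptions — the two rules are presumed to be a node affinity pinning pods to the agent pool and a pod anti-affinity spreading replicas across nodes.)

```yaml
# Hypothetical reconstruction -- the original manifest is not shown.
# Rule 1: node affinity pinning pods to the "genmlow" agent pool.
# Rule 2: pod anti-affinity keeping replicas on separate nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pause
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pause
  template:
    metadata:
      labels:
        app: pause
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: agentpool
                operator: In
                values:
                - genmlow
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: pause
            topologyKey: kubernetes.io/hostname
      containers:
      - name: pause
        image: k8s.gcr.io/pause:3.1
```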
The agentpool "genmlow" uses Standard_DS2_v2 machines (8 GB of memory) in a virtual machine scale set.
When I scale the number of replicas to 10 (`kubectl scale deployment pause --replicas=10`), I see that the cluster autoscaler (version 1.3.9, k8s 1.11.8) creates only one node per iteration, as if it were ignoring the affinity rules. See the cluster-autoscaler logs, where nodes go from 0->1->2->...->N. However, it only behaves this way when the pod has no resource requests. If I add the following requests:
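(The exact requests block is not shown; based on the 5Gi figure mentioned in the next paragraph, it presumably looks something like this.)

```yaml
# Assumed shape of the container requests block -- the original
# snippet is not shown; 5Gi is taken from the text.
resources:
  requests:
    memory: 5Gi
```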
Everything works as expected: the cluster autoscaler creates the 10 virtual machines in a single batch (1->10). I guess that is because this time the autoscaler knows it cannot fit two pods on a single node (5Gi + 5Gi > 8GB), even though it is still ignoring the affinity rules.
It looks like a bug to me. The same setup on AWS (cluster autoscaler 1.2.x instead of 1.3.x being the only difference) works fine, and the CA creates the 10 virtual machines regardless of whether you specify the container memory requests.