kubernetes / autoscaler

Autoscaling components for Kubernetes
Apache License 2.0

Important: Autoscaler is not stable on load testing #800

Closed thaniyarasu closed 6 years ago

thaniyarasu commented 6 years ago

I have created a fresh cluster (for testing purposes) with the AWS provider. My cluster info: kops (1.9.0), kubernetes (1.10.1), autoscaler (1.2.1), helm (v2.9.0-rc3). I installed the autoscaler using the "Auto-Discovery Setup" by following https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml. I have enabled the proper IAM role, policy, ASG tags, and settings. After about 5 minutes the autoscaler seemed to start working, adding/removing nodes.
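For context, the auto-discovery setup from that example essentially boils down to tagging the node ASGs and passing the matching discovery flag to the cluster-autoscaler container, roughly like this (<YOUR CLUSTER NAME> is a placeholder):

command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  # node groups are discovered via ASG tags instead of being listed explicitly;
  # the ASGs must carry both the "enabled" tag and the cluster-name tag
  - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<YOUR CLUSTER NAME>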

The next day, I installed 3 Helm packages, each containing 12 deployments (each deployment containing only one container), an ingress, and an HPA (min=1, max=3). So the autoscaler started adding more nodes to support the deployments and began handling the traffic load. All meaningful, everything was fine. BUT then I started a load test by calling each of the three application URLs. After a minute, within 60 seconds, my containers were no longer serving traffic, so I opened the dashboard. I was shocked, because most of the running containers were rebooting or being reallocated onto other nodes and so were not reachable. After 2 minutes every container started rebooting, showing statuses like "Creating Container", etc.

So I tried to delete 2 of the Helm charts, which made things worse than before: this time all the containers from the "kube-system" namespace were rebooting. I saw on the dashboard that a lot of kube-dns and cluster-autoscaler containers in the "kube-system" namespace would not start. After 5 minutes the Kubernetes API was not reachable.

So I stopped the load test, left, and came back after 2 hours; then I saw that the dashboard, API, and all services were working fine again. My point here is that the autoscaler is not stable when more traffic reaches the cluster. Sometimes "helm upgrade chart-name" also makes the autoscaler unstable.

aleksandra-malinowska commented 6 years ago

I was shocked, because most of the running containers were rebooting or being reallocated onto other nodes and so were not reachable. After 2 minutes every container started rebooting, showing statuses like "Creating Container", etc.

Can you share requests, limits and actual resource usage of your containers during this load test?

So I tried to delete 2 of the Helm charts, which made things worse than before: this time all the containers from the "kube-system" namespace were rebooting. I saw on the dashboard that a lot of kube-dns and cluster-autoscaler containers in the "kube-system" namespace would not start.

Can you check what caused all of those containers to reboot at the same time? Those components either aren't autoscaled at all, or have their own autoscaling solution (kube-dns-autoscaler for kube-dns, and the addon-resizer container for heapster/metrics-server). Additionally, Cluster Autoscaler by default doesn't remove nodes on which kube-system pods are running.
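(For reference, that scale-down default corresponds to the --skip-nodes-with-system-pods flag on the cluster-autoscaler container; the sketch below only illustrates the default, it is not something that needs to be set explicitly.)

command:
  - ./cluster-autoscaler
  # default is true: nodes running kube-system pods (other than DaemonSet/mirror pods)
  # are never considered for scale-down
  - --skip-nodes-with-system-pods=true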

Were pods running on master node (kube-apiserver, kube-controller-manager) also restarting? Was Cluster Autoscaler running there, or somewhere else in the cluster?

thaniyarasu commented 6 years ago

I didn't specify resource requests or limits on the deployments. I have enabled the autoscaler only for the nodes. I have installed the monitoring-standalone and kubernetes-dashboard addons with kubectl create -f https://raw.githubusercontent.com/kubernetes/kops/master/addons/monitoring-standalone/v1.7.0.yaml and kubectl create -f https://raw.githubusercontent.com/kubernetes/kops/master/addons/kubernetes-dashboard/v1.8.1.yaml

The cluster autoscaler doesn't remove any nodes during load testing, but it creates new nodes (this is meaningful). Yes, pods running on the master nodes were also restarting. The cluster autoscaler is not running on the master nodes.

thaniyarasu commented 6 years ago

I think I have to consider pod priority and preemption: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/. I am going through it now.

aleksandra-malinowska commented 6 years ago

I didn't specify resource requests or limits on the deployments.

Setting requests corresponding to realistic resource usage is strongly recommended. In fact, the scheduler always assumes some minimal request per container (to avoid placing an unlimited number of pods without requests on a single node), but if you were load testing your app, it's likely the node became overloaded as the pods' resource usage spiked. You can try looking at the Kubelet logs for restarts/errors to check if this was the case.
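For example, a minimal sketch of per-container requests and limits (the values below are placeholders, not a recommendation; they should reflect the usage you actually observe under load):

resources:
  requests:
    cpu: 200m        # placeholder: typical usage under load
    memory: 256Mi
  limits:
    cpu: 500m        # placeholder: cap so one spiking pod can't starve the node
    memory: 512Mi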

The next day, I installed 3 Helm packages, each containing 12 deployments (each deployment containing only one container), an ingress, and an HPA (min=1, max=3). So the autoscaler started adding more nodes to support the deployments and began handling the traffic load.

The cluster autoscaler doesn't remove any nodes during load testing, but it creates new nodes (this is meaningful).

Cluster Autoscaler should only add nodes if there are pods that can't be scheduled on existing nodes.

Another round of questions:

what metric did you use for HPA?

if the pods had no requests, what was the reason they couldn't be scheduled on the nodes already present in the cluster? Are you using anti-affinity?

I think I have to consider pod priority and preemption: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/. I am going through it now.

Not sure how you're using this feature, but just to clarify:

If the system pods are getting preempted, I'd make sure they have a PriorityClass with higher value than any other workload in the system.
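For illustration, such a PriorityClass could look roughly like the sketch below (the name and value are made up; on Kubernetes 1.10 the scheduling API is still alpha and has to be enabled, while newer clusters use scheduling.k8s.io/v1 and ship with built-in system-cluster-critical/system-node-critical classes). Pods opt in via priorityClassName in their spec.

apiVersion: scheduling.k8s.io/v1alpha1   # alpha on 1.10; scheduling.k8s.io/v1 on newer clusters
kind: PriorityClass
metadata:
  name: system-addons                    # hypothetical name
value: 1000000                           # higher than anything assigned to app workloads
globalDefault: false
description: "For cluster add-ons that should not be preempted by application pods."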

Yes, pods running on the master nodes were also restarting.

Can you check the logs of affected components?

thaniyarasu commented 6 years ago

Sorry for my late reply.

Setting requests corresponding to realistic resource usage is strongly recommended

Yes, I have to set resource limits here.

what metric did you use for HPA?

I use HPA like this:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: app-api-test
spec:
  scaleTargetRef:
    kind: Deployment
    name: app-api-test
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 80

if the pods had no requests, what was the reason they couldn't be scheduled on the nodes already present in the cluster? Are you using anti-affinity?

I use a node selector and leave the other options at their defaults, like this:

    spec:
      nodeSelector:
        kubernetes.io/role: node

Can you check the logs of affected components?

Let me first replicate the issue by installing a single chart; then I will check the logs.

Now my question: why do the pods under the "kube-system" namespace get restarted? Is there any way to force all pods under the "kube-system" namespace to be deployed only onto the master nodes? I have 3 master nodes.

Thanks for your help.

aleksandra-malinowska commented 6 years ago

I use a node selector and leave the other options at their defaults, like this

If the node selector matches all of your non-master nodes, and requests weren't the reason pods couldn't be scheduled on existing nodes, I'm not sure why Cluster Autoscaler would scale up the cluster. If you manage to reproduce this, can you please share its log?

Now my question: why do the pods under the "kube-system" namespace get restarted?

That's also my question. Logs from affected components, Kubelets and kube-controller-manager can all help.

Is there any way to force all pods under the "kube-system" namespace to be deployed only onto the master nodes?

There are kube-system pods that must run on every node (kube-proxy). For others, like kube-dns, running them within the cluster helps provide redundancy. For heapster/metrics-server, it keeps these components, with their high resource usage, from slowing down the master. To run Cluster Autoscaler on a master node, you can add a node selector and a toleration to the pod spec.
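For example, a sketch of what that could look like in the cluster-autoscaler Deployment's pod template, assuming the masters carry the usual node-role.kubernetes.io/master label and taint (the exact label/taint names can differ depending on how the cluster was provisioned):

spec:
  nodeSelector:
    node-role.kubernetes.io/master: ""
  tolerations:
    - key: node-role.kubernetes.io/master
      effect: NoSchedule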

thaniyarasu commented 6 years ago

These changes did fix the issues:

1) adding resources (both requests and limits) to the deployments:

resources:
  requests:
    cpu: 512m
    memory: 512Mi
  limits:
    cpu: 512m
    memory: 512Mi

2) upgrading heapster to the latest images: k8s.gcr.io/heapster:v1.5.2 and k8s.gcr.io/addon-resizer:2.1

Thanks to all