terraform-aws-modules / terraform-aws-eks

Terraform module to create Amazon Elastic Kubernetes (EKS) resources πŸ‡ΊπŸ‡¦
https://registry.terraform.io/modules/terraform-aws-modules/eks/aws
Apache License 2.0
4.43k stars 4.06k forks source link

Add `autoscaling_enabled` to example files #578

Closed wperron closed 4 years ago

wperron commented 4 years ago

I have issues

The examples configurations do not currently include the autoscaling enabled parameter even though they are including other autoscaling-related parameters. This is misleading for first-time users as they are likely to use those examples as-is, and then will encounter an issue where their cluster won't be scaling as the number of pods increases.

I'm submitting a...

What's the proposed solution?

I suggest adding autoscaling_enabled = "true" to the example configurations to make them behave as one would expect to. I also suggest maybe linking the local.tf file in the README so that the default values are easier to find.

Environment details

max-rocket-internet commented 4 years ago

even though they are including other autoscaling-related parameters

Where are these related parameters included?

FYI there's a link in the main README: Autoscaling: How to enable worker node autoscaling.

wperron commented 4 years ago

@max-rocket-internet I'm talking specifically about the example configurations in examples/

for instance, in the basic/main.tf there's this worker group configuration which includes asg_desired_capacity :

  worker_groups = [
    {
      name                          = "worker-group-1"
      instance_type                 = "t2.small"
      additional_userdata           = "echo foo bar"
      asg_desired_capacity          = 2
      additional_security_group_ids = [aws_security_group.worker_group_mgmt_one.id]
    },
    {
      name                          = "worker-group-2"
      instance_type                 = "t2.medium"
      additional_userdata           = "echo foo bar"
      additional_security_group_ids = [aws_security_group.worker_group_mgmt_two.id]
      asg_desired_capacity          = 1
    },
  ]

That will create the Autoscaling Group with the desired configuration in AWS, but it will initialize add the k8s.io/cluster-autoscaler/disabled = true annotation instead of k8s.io/cluster-autoscaler/enabled = true which means the cluster won't be able to autoscale even if the autoscaler pod is installed. Adding autoscaling_enabled to the example configurations would just make it clearer that there's an extra parameter needed for the autoscaling to work.

Concerning the link in the README, I was suggesting adding a link to the local.tf. When I was trying to figure out why my cluster wasn't scaling, that's where I ended up and saw the different default values, so I think it could be useful for other people.

The link you mentioned is also super useful but its not what I was referring to πŸ˜‰

max-rocket-internet commented 4 years ago

Sorry but I still don't follow πŸ˜…

there's this worker group configuration which includes asg_desired_capacity

That's a setting of the AWS ASG but it is not related to k8s autoscaling. Autoscaling from an ASG and normal k8s autoscaling are separate things. There's no implication that the examples allow autoscaling in k8s.

max-rocket-internet commented 4 years ago

When I was trying to figure out why my cluster wasn't scaling, that's where I ended up and saw the different default values, so I think it could be useful for other people.

I'd rather not set autoscaling_enabled in the examples as it might imply to people that there's autoscaling out of the box when there isn't. I think this is what you figured? You must install the cluster-autoscaler and this is covered in the doc.

So rather than setting autoscaling_enabled in the examples, is there some other way that we could make it clearer?

wperron commented 4 years ago

Sorry my mistake; yes I meant the ASG autoscaling. For me seeing an autoscaling_enabled = true would do it, as it implies that it can also be set to false, though I can also see somebody not making that connexion. I guess we could leave the existing examples, and one or two this parameter is explicitly set to true and/or false.

What do you think?

max-rocket-internet commented 4 years ago

Sorry my mistake; yes I meant the ASG autoscaling

Ah all good. If you're not super familiar with how autoscaling works in k8s on AWS then I can see it might be confusing.

To be clear: it is not possible or advisable to scale using the ASG itself. ASG autoscaling needs scaling policies (that are not part of this module) and also the ASG doesn't properly drain nodes when terminating them. So it's not really possible to use ASG based autoscaling in a reliable way. Using the cluster-autoscaler is the best way.

I think the doc and the note here is clear about what this parameter does. It adds and tag to the ASG and a policy to the instance role. Nothing more.

If you want to create a PR to add a note in the doc that emphasises the distinction, feel free πŸ™‚

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

barryib commented 4 years ago

Resolved by https://github.com/terraform-aws-modules/terraform-aws-eks/pull/716 and https://github.com/terraform-aws-modules/terraform-aws-eks/pull/710

github-actions[bot] commented 1 year ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.