input-output-hk / bitte

Nix Ops for Terraform, Consul, Vault, Nomad
Apache License 2.0

Nomad autoscaling #6

Open manveru opened 3 years ago

manveru commented 3 years ago

To make better use of our new clusters, we should autoscale the AWS Auto Scaling groups. Right now their sizes are hardcoded in the Terraform core workspaces.

My proposed approach is to run an instance of the Nomad Autoscaler daemon on the monitoring server, since it already has access to metrics across the cluster. I haven't had time to look into this yet, so I can't say what kind of configuration it will require, but the core instances will most likely need additional IAM privileges to control the number of instances in the groups. Those privileges are still defined in the per-cluster iam.nix file (which we should probably pull into bitte proper).

See https://github.com/hashicorp/nomad-autoscaler
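For illustration, the additional IAM privileges mentioned above might look roughly like the policy document built below. This is only a sketch: the helper name `autoscaler_policy`, the example ARN, and the exact set of actions are assumptions, not taken from bitte or from its iam.nix. The action names are real EC2 Auto Scaling IAM actions, but the list the autoscaler actually needs should be checked against the nomad-autoscaler AWS ASG plugin documentation.

```python
import json


def autoscaler_policy(asg_arns):
    """Build a hypothetical IAM policy document for resizing the given ASGs.

    The action list is an assumption for illustration; verify it against
    the nomad-autoscaler AWS ASG target plugin documentation before use.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Describe* calls do not support resource-level permissions,
                # so they are granted on all resources.
                "Effect": "Allow",
                "Action": [
                    "autoscaling:DescribeAutoScalingGroups",
                    "autoscaling:DescribeScalingActivities",
                ],
                "Resource": "*",
            },
            {
                # Mutating calls are limited to the listed groups.
                "Effect": "Allow",
                "Action": [
                    "autoscaling:UpdateAutoScalingGroup",
                    "autoscaling:CreateOrUpdateTags",
                    "autoscaling:TerminateInstanceInAutoScalingGroup",
                ],
                "Resource": list(asg_arns),
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder ARN: region, account ID, and group name are made up.
    example_arn = (
        "arn:aws:autoscaling:eu-central-1:123456789012:"
        "autoScalingGroup:*:autoScalingGroupName/example-client"
    )
    print(json.dumps(autoscaler_policy([example_arn]), indent=2))
```

Restricting the mutating actions to specific group ARNs keeps the core instances from being able to resize unrelated groups in the same account.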

jonringer commented 3 years ago

I think most of this has been satisfied

cc @nrdxp

nrdxp commented 3 years ago

Looks like a lot of these old issues are pretty stale :sweat_smile:

I checked what should already be done. Moving the instance to monitoring may actually be a good idea, since it can pull the data locally instead of over the network, but I'm not sure that would change anything substantially. I dunno if anyone has gotten around to fixing the AMI since the recent ZFS breakage, though.

Right now we are only using CPU and memory to determine scaling actions, so there is still potential to improve this aspect. And I would probably add another point: