Open Rutori opened 2 years ago
It seems more like a Nomad problem https://github.com/hashicorp/nomad/issues/11530 but it could also be happening because the autoscaler applies several scaling actions at once - I wasn't able to reproduce the bug by calling the scaling API endpoint manually. I should also mention that my setup has some connection issues with the Nomad API, and I see in the logs that sometimes a scaling action gets nak'd and retried again and again; when it finally goes through, that's usually when allocations start to duplicate.
Hi @Rutori 👋
The Autoscaler just calls the Nomad API, so it's kind of strange that you're seeing different behaviours. I haven't been able to reproduce it locally - do you have any logs from the Autoscaler that you could share?
While scaling, the autoscaler places allocations even after an error and doesn't account for instances that are already starting, which can lead to overflowing clients with pending instances
Reproduction steps
Expected Result
Nomad accounts for already placed allocations, including those that have already started, and does not scale when a deployment is already in progress.
Actual Result
Even though the autoscaler correctly raises an error that there is already a deployment, the new allocations are still placed. The count returned by a strategy plugin is compared only to started allocations, so the autoscaler may place more allocations than the configured maximum.
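To make the off-by-pending behaviour concrete, here is a minimal sketch (the function and variable names are my own assumptions, not the Autoscaler's actual code) of how comparing the target count only against running allocations overshoots the configured maximum, while counting pending allocations as well does not:

```go
package main

import "fmt"

// desiredDelta is a hypothetical stand-in for the strategy plugin's
// comparison: how many allocations to add to reach the target count.
func desiredDelta(target, observed int) int {
	return target - observed
}

func main() {
	const maxCount = 5
	running, pending := 3, 2 // 2 allocations already placed but not yet running

	// Counting only running allocations: the scaler believes 2 are
	// missing and places them, ending with 7 placed against a max of 5.
	delta := desiredDelta(maxCount, running)
	fmt.Println("delta:", delta, "total placed:", running+pending+delta)

	// Counting running + pending allocations avoids the overshoot.
	safeDelta := desiredDelta(maxCount, running+pending)
	fmt.Println("delta:", safeDelta, "total placed:", running+pending+safeDelta)
}
```

Under these assumptions, the retried/nak'd scaling actions described above would each re-place the "missing" allocations because the pending ones from the previous attempt are invisible to the comparison.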