Closed: jrasell closed this issue 4 years ago
@commarla I believe this may be the cause of the issue!
Oh yeah 🎉 I am going to try it right now
I built this branch and tried it on my prod. Right after the restart, my faulty job began to scale out 🎉
Jan 03 09:32:20 admin-10-32-152-182 sherpa[20897]: {"level":"debug","job":"my-service","group":"my-service-service-spot","time":"2020-01-03T09:32:20.125500002+01:00","message":"triggering autoscaling job group evaluation"}
Jan 03 09:32:20 admin-10-32-152-182 sherpa[20897]: {"level":"info","job":"my-service","group":"my-service-service-spot","mem-value-percentage":37.79296875,"cpu-value-percentage":106.07861583333333,"time":"2020-01-03T09:32:20.125510519+01:00","message":"Nomad resource utilisation calculation"}
Jan 03 09:32:20 admin-10-32-152-182 sherpa[20897]: {"level":"debug","job":"my-service","time":"2020-01-03T09:32:20.125566755+01:00","message":"scaling evaluation completed, handling scaling request based on Nomad checks"}
Jan 03 09:32:20 admin-10-32-152-182 sherpa[20897]: {"level":"debug","job":"my-service","group":"my-service-service-spot","scaling-req":{"direction":"out","count":1,"group":"my-service-service-spot"},"time":"2020-01-03T09:32:20.125577160+01:00","message":"added group scaling request"}
Jan 03 09:32:20 admin-10-32-152-182 sherpa[20897]: {"level":"debug","status":"running","job":"my-service","time":"2020-01-03T09:32:20.174140012+01:00","message":"received deployment update message to handle"}
Thanks a lot!
@commarla that is great news, and sorry for the problems this caused. I'll get this merged and perform a bug-fix release.
Don't be sorry @jrasell, I am glad you found the bug so fast.
When iterating a job with multiple groups, where at least one group has a scaling policy and at least one does not, the iteration incorrectly exited the loop early rather than continuing to the next group. As a result, scaling evaluations did not take place correctly, and jobs that needed to scale were unable to do so.
closes #120