jrasell / sherpa

Sherpa is a highly available, fast, and flexible horizontal job scaler for HashiCorp Nomad. It is capable of running in a number of different modes to suit different requirements, and can scale based on Nomad resource metrics or external sources.
Mozilla Public License 2.0
163 stars 8 forks

Fix incorrect searching of allocs causing missed allocs in jobs. #121

jrasell closed 4 years ago

jrasell commented 4 years ago

When iterating over a job that has multiple groups, one with and at least one without a scaling policy, the iteration incorrectly ended the loop early rather than continuing on to the next group. As a result, scaling evaluations did not take place correctly, and jobs that needed to scale were not able to do so.
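
To illustrate the class of bug described above, here is a minimal Go sketch. It is not the actual Sherpa code: the `Group` type, the `HasPolicy` field, both function names, and the group name `my-service-service` are hypothetical, and whether the original loop exited via `break` or an early `return` is an assumption. It only shows how ending the loop on the first group without a policy hides every group that follows, while `continue` skips just that one group.

```go
package main

import "fmt"

// Group is a hypothetical stand-in for a Nomad task group.
// Sherpa's real types and field names are not shown in this thread.
type Group struct {
	Name      string
	HasPolicy bool // true when a scaling policy exists for the group
}

// evaluatedBuggy mimics the faulty behaviour: the first group without a
// scaling policy ends the loop, so any later groups are never evaluated.
func evaluatedBuggy(groups []Group) []string {
	var out []string
	for _, g := range groups {
		if !g.HasPolicy {
			break // assumption: the loop was exited early at this point
		}
		out = append(out, g.Name)
	}
	return out
}

// evaluatedFixed skips only the group without a policy and keeps iterating.
func evaluatedFixed(groups []Group) []string {
	var out []string
	for _, g := range groups {
		if !g.HasPolicy {
			continue
		}
		out = append(out, g.Name)
	}
	return out
}

func main() {
	groups := []Group{
		{Name: "my-service-service", HasPolicy: false},     // hypothetical group
		{Name: "my-service-service-spot", HasPolicy: true}, // name as it appears in the logs
	}
	fmt.Println("buggy:", evaluatedBuggy(groups))
	fmt.Println("fixed:", evaluatedFixed(groups))
}
```

With the group without a policy listed first, the buggy variant evaluates nothing, while the fixed variant still evaluates `my-service-service-spot`.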

closes #120

jrasell commented 4 years ago

@commarla I believe this may be the cause of the issue!

commarla commented 4 years ago

Oh yeah 🎉 I am going to try it right now

commarla commented 4 years ago

I have built this branch and tried it in my prod environment. Right after the restart, my faulty job began to scale out 🎉

Jan 03 09:32:20 admin-10-32-152-182 sherpa[20897]: {"level":"debug","job":"my-service","group":"my-service-service-spot","time":"2020-01-03T09:32:20.125500002+01:00","message":"triggering autoscaling job group evaluation"}
Jan 03 09:32:20 admin-10-32-152-182 sherpa[20897]: {"level":"info","job":"my-service","group":"my-service-service-spot","mem-value-percentage":37.79296875,"cpu-value-percentage":106.07861583333333,"time":"2020-01-03T09:32:20.125510519+01:00","message":"Nomad resource utilisation calculation"}
Jan 03 09:32:20 admin-10-32-152-182 sherpa[20897]: {"level":"debug","job":"my-service","time":"2020-01-03T09:32:20.125566755+01:00","message":"scaling evaluation completed, handling scaling request based on Nomad checks"}
Jan 03 09:32:20 admin-10-32-152-182 sherpa[20897]: {"level":"debug","job":"my-service","group":"my-service-service-spot","scaling-req":{"direction":"out","count":1,"group":"my-service-service-spot"},"time":"2020-01-03T09:32:20.125577160+01:00","message":"added group scaling request"}
Jan 03 09:32:20 admin-10-32-152-182 sherpa[20897]: {"level":"debug","status":"running","job":"my-service","time":"2020-01-03T09:32:20.174140012+01:00","message":"received deployment update message to handle"}

Thanks a lot!

jrasell commented 4 years ago

@commarla that is great news and sorry for the problems this caused. I'll get this merged and perform a bug fix release.

commarla commented 4 years ago

Don't be sorry @jrasell, I am glad you found the bug so fast.