jrasell / sherpa

Sherpa is a highly available, fast, and flexible horizontal job scaler for HashiCorp Nomad. It can run in a number of different modes to suit different requirements, and can scale based on Nomad resource metrics or external sources.
Mozilla Public License 2.0

Scale in instead of out #84

Closed: commarla closed this issue 4 years ago

commarla commented 4 years ago

Describe the bug: Sherpa triggers a scale in when I expect a scale out.

To reproduce: My config is the following (I use Nomad job meta):

"Meta": {
  "sherpa_max_count": "15",
  "sherpa_cooldown": "180",
  "timestamp": "2019-10-30T13:23:10Z",
  "sherpa_scale_out_memory_percentage_threshold": "70",
  "sherpa_scale_out_cpu_percentage_threshold": "70",
  "sherpa_enabled": "1",
  "sherpa_min_count": "1",
  "sherpa_scale_out_count": "1",
  "sherpa_scale_in_cpu_percentage_threshold": "30",
  "sherpa_scale_in_memory_percentage_threshold": "30",
  "sherpa_scale_in_count": "1"
},

The log shows:

Oct 31 08:56:31 admin-10-32-152-182 docker[23243]: {"level":"debug","job":"my-app","group":"my-app-main-spot","mem-usage-percentage":26,"cpu-usage-percentage":120,"time":"2019-10-31T07:56:31.222253696Z","message":"resource utilisation calculation"}
Oct 31 08:56:31 admin-10-32-152-182 docker[23243]: {"level":"debug","job":"my-app","scaling-req":{"direction":"in","count":1,"group":"my-app-main-spot"},"time":"2019-10-31T07:56:31.222325293Z","message":"added group scaling request"}
Oct 31 08:56:31 admin-10-32-152-182 docker[23243]: {"level":"info","job":"my-app","id":"35eafbc8-9946-4a1c-bcfe-6f1ec7394528","evaluation-id":"06d82281-1ba2-59ab-c9d5-10bd47fa527c","time":"2019-10-31T07:56:31.262742873Z","message":"successfully triggered autoscaling of job"}

With CPU usage at 120%, I should get a scale out, not a scale in. Is this a conflict with my memory usage, which is under 30%?

Expected behavior: A scale out

Environment:

commarla commented 4 years ago

Maybe it comes from this https://github.com/jrasell/sherpa/blob/master/pkg/autoscale/autoscale.go#L76

If the scale-in case matches first, the scale-out case may never be tested. I think it is better to test "out" before "in" in the switch statement, to ensure the availability of the service.
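To illustrate the ordering, here is a minimal sketch, not Sherpa's actual code. The `Policy` type, its field names, and the `decide` function are hypothetical, and the sketch assumes that either resource crossing a threshold is enough to trigger a decision and that the comparisons are strict. With the scale-out case listed first, the values from the log above (CPU 120%, memory 26%) would produce a scale out.

```go
package main

// Direction is the scaling decision for a job group.
type Direction string

const (
	DirectionNone Direction = "none"
	DirectionIn   Direction = "in"
	DirectionOut  Direction = "out"
)

// Policy is a hypothetical type that mirrors the sherpa_* percentage
// thresholds from the job meta; it is not Sherpa's own policy struct.
type Policy struct {
	ScaleOutCPU int
	ScaleOutMem int
	ScaleInCPU  int
	ScaleInMem  int
}

// decide checks the scale-out thresholds before the scale-in ones, so a
// group that is hot on one resource and idle on the other scales out.
// Assumption: one resource crossing its threshold is sufficient.
func decide(cpu, mem int, p Policy) Direction {
	switch {
	case cpu > p.ScaleOutCPU || mem > p.ScaleOutMem:
		return DirectionOut
	case cpu < p.ScaleInCPU || mem < p.ScaleInMem:
		return DirectionIn
	default:
		return DirectionNone
	}
}
```

For example, `decide(120, 26, Policy{ScaleOutCPU: 70, ScaleOutMem: 70, ScaleInCPU: 30, ScaleInMem: 30})` returns `DirectionOut`, because the first case already matches on CPU and the scale-in case is never reached.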

jrasell commented 4 years ago

@commarla I think you're exactly right; I had this on my mind a few days ago to look into and figure out whether it was a problem. I'll get right on to this; thanks for the detailed report!

The simple solution is to put the scale-out checks ahead of the scale-in checks, which would catch this situation. I think it's important, though, for operators to understand when jobs have large differences in resource consumption, so a slightly more complex solution would be warranted.

commarla commented 4 years ago

@jrasell thanks for the quick answer. I have built my own version to fix my use case. I understand you need to think about it to cover other cases. You can close my PR if you want.