Scale in instead of out

commarla commented 4 years ago

Describe the bug: I am seeing strange behaviour, a scale in instead of a scale out.

To reproduce: My config is the following (I use Nomad meta):

"Meta": {
    "sherpa_max_count": "15",
    "sherpa_cooldown": "180",
    "timestamp": "2019-10-30T13:23:10Z",
    "sherpa_scale_out_memory_percentage_threshold": "70",
    "sherpa_scale_out_cpu_percentage_threshold": "70",
    "sherpa_enabled": "1",
    "sherpa_min_count": "1",
    "sherpa_scale_out_count": "1",
    "sherpa_scale_in_cpu_percentage_threshold": "30",
    "sherpa_scale_in_memory_percentage_threshold": "30",
    "sherpa_scale_in_count": "1"
},

In the log I have

Oct 31 08:56:31 admin-10-32-152-182 docker[23243]: {"level":"debug","job":"my-app","group":"my-app-main-spot","mem-usage-percentage":26,"cpu-usage-percentage":120,"time":"2019-10-31T07:56:31.222253696Z","message":"resource utilisation calculation"}
Oct 31 08:56:31 admin-10-32-152-182 docker[23243]: {"level":"debug","job":"my-app","scaling-req":{"direction":"in","count":1,"group":"my-app-main-spot"},"time":"2019-10-31T07:56:31.222325293Z","message":"added group scaling request"}
Oct 31 08:56:31 admin-10-32-152-182 docker[23243]: {"level":"info","job":"my-app","id":"35eafbc8-9946-4a1c-bcfe-6f1ec7394528","evaluation-id":"06d82281-1ba2-59ab-c9d5-10bd47fa527c","time":"2019-10-31T07:56:31.262742873Z","message":"successfully triggered autoscaling of job"}

With CPU usage at 120% I should get a scale out, not a scale in. Is this a conflict with my memory usage, which is under 30%?

Expected behavior: A scale out

Environment:

Sherpa server information (retrieve with sherpa system info):

/usr/bin # sherpa system info
Nomad Address                http://xxxxxxxx.eu-central-1.elb.amazonaws.com:4646
Policy Engine                Nomad Job Group Meta
Storage Backend              Consul
Internal AutoScaling Engine  true
Strict Policy Checking       true

Sherpa CLI version (retrieve with sherpa --version): docker image jrasell/sherpa:0.2.1

/usr/bin # sherpa --version
sherpa version v0.2.1
    Date:   2019-10-31 08:07:37.880961463 +0000 UTC
    Commit: be871e9
    Branch: v0.2.1
    State:  v0.2.1

Server Operating System/Architecture: Docker 19.03.2, Debian stretch 9.11, Linux sherpa 4.19.0-0.bpo.6-amd64 SMP Debian 4.19.67-2+deb10u1~bpo9+1 (2019-09-30) x86_64 Linux

Sherpa server configuration parameters:

SHERPA_AUTOSCALER_ENABLED=true
SHERPA_AUTOSCALER_EVALUATION_INTERVAL=60
SHERPA_AUTOSCALER_NUM_THREADS=3
SHERPA_BIND_ADDR=0.0.0.0
SHERPA_BIND_PORT=8000
SHERPA_CLUSTER_ADVERTISE_ADDR=http://127.0.0.1:8000
SHERPA_CLUSTER_NAME=prod-main-admin
SHERPA_LOG_FORMAT=auto
SHERPA_LOG_LEVEL=debug
SHERPA_LOG_USE_COLOR=true
SHERPA_POLICY_ENGINE_API_ENABLED=false
SHERPA_POLICY_ENGINE_NOMAD_META_ENABLED=true
SHERPA_POLICY_ENGINE_STRICT_CHECKING_ENABLED=true
SHERPA_STORAGE_CONSUL_ENABLED=true
SHERPA_STORAGE_CONSUL_PATH=sherpa/
SHERPA_TELEMETRY_STATSD_ADDRESS=
SHERPA_TELEMETRY_STATSITE_ADDRESS=
SHERPA_TLS_CERT_KEY_PATH=
SHERPA_TLS_CERT_PATH=
SHERPA_UI=false

Nomad client configuration parameters (if any): There is nothing specific in my Nomad config.
Consul client configuration parameters (if any): There is nothing specific in my Consul config.

commarla commented 4 years ago

Maybe it comes from this https://github.com/jrasell/sherpa/blob/master/pkg/autoscale/autoscale.go#L76

If the scale-in case matches first, the scale-out case is never tested. I think it is better to test "out" before "in" in the switch statement, to protect the availability of the service.
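To illustrate the ordering I mean, here is a minimal sketch. It is not the code from autoscale.go: the `policy` type, the `decide` function, and the strict `>` / `<` comparisons are assumptions made up for this example. Only the threshold values and the 120% / 26% reading come from the config and log above.

```go
package main

import "fmt"

// policy mirrors the thresholds from the job meta in this issue.
// The type and field names are hypothetical, not Sherpa's own.
type policy struct {
	outCPU, outMem int // scale out when usage is above these percentages
	inCPU, inMem   int // scale in when usage is below these percentages
}

// decide checks the scale-out thresholds before the scale-in thresholds,
// so one overloaded resource wins over one idle resource.
func decide(cpu, mem int, p policy) string {
	switch {
	case cpu > p.outCPU || mem > p.outMem:
		return "out"
	case cpu < p.inCPU || mem < p.inMem:
		return "in"
	default:
		return "none"
	}
}

func main() {
	p := policy{outCPU: 70, outMem: 70, inCPU: 30, inMem: 30}
	// Values from the log line above: CPU at 120%, memory at 26%.
	fmt.Println(decide(120, 26, p)) // prints "out"
}
```

With the two cases swapped, the same input returns "in" because memory at 26% is below the scale-in threshold of 30, which matches what I see in the log.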

jrasell commented 4 years ago

@commarla I think you're exactly right. I had this on my mind a few days ago, to look into it and figure out whether it was a problem. I'll get right onto this; thanks for the detailed report!

The simple solution is to put the scale-out checks ahead of the scale-in checks, which would catch this situation. I think it's important, though, for operators to understand when jobs have large differences in resource consumption, so a slightly more complex solution would be warranted.
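As a rough idea of what the slightly more complex version could look like, the sketch below evaluates each metric on its own and flags the case where CPU and memory point in opposite directions, so it can be surfaced to the operator. This is only an illustration under assumptions: `direction`, `evaluate`, and the hard-coded 30/70 thresholds are hypothetical and not part of Sherpa.

```go
package main

import "fmt"

// direction classifies one metric against its scale-in and scale-out thresholds.
func direction(usage, inThreshold, outThreshold int) string {
	switch {
	case usage > outThreshold:
		return "out"
	case usage < inThreshold:
		return "in"
	default:
		return "none"
	}
}

// evaluate returns the chosen direction and whether CPU and memory disagree.
// Scale out takes priority; the conflict flag lets the caller log a warning.
func evaluate(cpu, mem int) (string, bool) {
	c := direction(cpu, 30, 70)
	m := direction(mem, 30, 70)
	conflict := (c == "out" && m == "in") || (c == "in" && m == "out")
	if c == "out" || m == "out" {
		return "out", conflict
	}
	if c == "in" || m == "in" {
		return "in", conflict
	}
	return "none", conflict
}

func main() {
	// Values from the reported log line: CPU at 120%, memory at 26%.
	dir, conflict := evaluate(120, 26)
	if conflict {
		fmt.Println("warning: cpu and memory utilisation point in opposite directions")
	}
	fmt.Println("direction:", dir)
}
```

Whether a scale in should require both metrics to be under their thresholds, rather than either one, is a separate design choice that this sketch does not settle.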

commarla commented 4 years ago

@jrasell thanks for the quick answer. I have built my own version to fix my use case. I understand you need to think it through to cover other cases. You can close my PR if you want.

jrasell / sherpa

Scale in instead of out #84