jrasell / sherpa

Sherpa is a highly available, fast, and flexible horizontal job scaler for HashiCorp Nomad. It can run in a number of different modes to suit different requirements, and can scale based on Nomad resource metrics or external sources.
Mozilla Public License 2.0

External scaling runtime error when group name differs from job name #127

hobochili closed this issue 4 years ago

hobochili commented 4 years ago

Describe the bug

Sherpa autoscale panics upon job evaluation when external scaling is configured for a group whose name differs from the job's name.

To reproduce

Run a job whose group is named differently from the job itself, such as:

job "foo" {
  region = "global"

  datacenters = ["dc1"]

  type = "service"

  group "bar" {
    count = 1

    restart {
      attempts = 2
      interval = "30m"
      delay = "15s"
      mode = "fail"
    }

    task "tail" {
      driver = "raw_exec"

      config {
        command = "/usr/bin/tail"
        args = ["-f", "/dev/null"]
      }

      resources {
        cpu    = 100
        memory = 64
      }
    }
  }
}

Create a scaling policy for the group with external checks, such as:

{
  "Enabled": true,
  "MaxCount": 16,
  "MinCount": 1,
  "ScaleOutCount": 1,
  "ScaleInCount": 1,
  "ExternalChecks": {
    "prometheus_memory_in": {
      "Enabled": true,
      "Provider": "prometheus",
      "Query": "sum(nomad_client_allocs_memory_usage{task_group='cache'})/sum(nomad_client_allocs_memory_allocated{task_group='cache'})*100",
      "ComparisonOperator": "less-than",
      "ComparisonValue": 30,
      "Action": "scale-in"
    },
    "prometheus_memory_out": {
      "Enabled": true,
      "Provider": "prometheus",
      "Query": "sum(nomad_client_allocs_memory_usage{task_group='cache'})/sum(nomad_client_allocs_memory_allocated{task_group='cache'})*100",
      "ComparisonOperator": "greater-than",
      "ComparisonValue": 80,
      "Action": "scale-out"
    }
  }
}
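To make the intent of the two checks concrete, here is a minimal Go sketch of how a policy like this could be evaluated against a metric value. The `ExternalCheck` struct and the `evaluate` function are hypothetical names made up for illustration, and the comparison semantics are only inferred from the field names in the JSON; this is not Sherpa's actual implementation.

```go
package main

import "fmt"

// ExternalCheck mirrors the fields of one entry under "ExternalChecks".
// Hypothetical type for illustration only, not Sherpa's own struct.
type ExternalCheck struct {
	Enabled            bool
	Provider           string
	Query              string
	ComparisonOperator string
	ComparisonValue    float64
	Action             string
}

// evaluate returns the check's Action when the metric value satisfies the
// comparison, and an empty string when the check is disabled or not met.
// Assumption: "less-than" and "greater-than" are strict comparisons.
func evaluate(c ExternalCheck, value float64) string {
	if !c.Enabled {
		return ""
	}
	switch c.ComparisonOperator {
	case "less-than":
		if value < c.ComparisonValue {
			return c.Action
		}
	case "greater-than":
		if value > c.ComparisonValue {
			return c.Action
		}
	}
	return ""
}

func main() {
	in := ExternalCheck{Enabled: true, ComparisonOperator: "less-than", ComparisonValue: 30, Action: "scale-in"}
	out := ExternalCheck{Enabled: true, ComparisonOperator: "greater-than", ComparisonValue: 80, Action: "scale-out"}

	// Sample memory-usage percentages standing in for the Prometheus query result.
	for _, v := range []float64{12.5, 55, 91} {
		fmt.Printf("memory=%v%% in=%q out=%q\n", v, evaluate(in, v), evaluate(out, v))
	}
}
```

Under these assumptions, a value between 30 and 80 triggers neither check, so the group count would stay as it is.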

Expected behavior

No runtime error.

Environment

$ sherpa system info
Nomad Address                http://127.0.0.1:4646
Policy Engine                Sherpa API
Storage Backend              Consul
Internal AutoScaling Engine  true
Strict Policy Checking       true

Additional context

Pardon the JSON logs.

{
     "time":"2020-01-09T16:22:26.788224800Z",
     "message":"worker with func exits from a panic: runtime error: invalid memory address or nil pointer dereference"
}
{
     "time":"2020-01-09T16:22:26.788402500Z",
     "message":"worker with func exits from panic: goroutine 72 [running]:
github.com/jrasell/sherpa/vendor/github.com/panjf2000/ants/v2.(*goWorkerWithFunc).run.func1.1(0xc0000ff230)
    /home/travis/gopath/src/github.com/jrasell/sherpa/vendor/github.com/panjf2000/ants/v2/worker_func.go:58 +0x123
panic(0xa5ec80, 0x10b0ff0)
    /home/travis/.gimme/versions/go1.12.linux.amd64/src/runtime/panic.go:522 +0x1b5
github.com/jrasell/sherpa/pkg/autoscale.(*autoscaleEvaluation).choseCorrectDecision(0xc000181680, 0xc0002eff00, 0x6, 0xc000283838, 0xc00011c568)
    /home/travis/gopath/src/github.com/jrasell/sherpa/pkg/autoscale/decision.go:146 +0x232
github.com/jrasell/sherpa/pkg/autoscale.(*autoscaleEvaluation).calculateExternalScalingDecision(0xc000181680, 0xc0002eff00, 0x6, 0xc0001c1f20, 0xa)
    /home/travis/gopath/src/github.com/jrasell/sherpa/pkg/autoscale/decision.go:90 +0x344
github.com/jrasell/sherpa/pkg/autoscale.(*autoscaleEvaluation).evaluateJob(0xc000181680)
    /home/travis/gopath/src/github.com/jrasell/sherpa/pkg/autoscale/autoscale.go:88 +0x603
github.com/jrasell/sherpa/pkg/autoscale.(*AutoScale).workerPoolFunc.func1(0x9f6260, 0xc0000ff200)
    /home/travis/gopath/src/github.com/jrasell/sherpa/pkg/autoscale/handler.go:258 +0x2c3
github.com/jrasell/sherpa/vendor/github.com/panjf2000/ants/v2.(*goWorkerWithFunc).run.func1(0xc0000ff230)
    /home/travis/gopath/src/github.com/jrasell/sherpa/vendor/github.com/panjf2000/ants/v2/worker_func.go:69 +0xb3
created by github.com/jrasell/sherpa/vendor/github.com/panjf2000/ants/v2.(*goWorkerWithFunc).run
    /home/travis/gopath/src/github.com/jrasell/sherpa/vendor/github.com/panjf2000/ants/v2/worker_func.go:49 +0x4d
"}
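
The trace shows a nil pointer dereference inside choseCorrectDecision, but the source of decision.go is not included here, so the actual cause is not visible. As a purely hypothetical illustration of how a job/group name mismatch could produce this class of panic in Go, the sketch below looks up a pointer in a map keyed by group name using the job name instead. The `decision` type and the `pickUnsafe` and `pickSafe` functions are invented for this example and are not taken from Sherpa.

```go
package main

import "fmt"

// decision is a hypothetical stand-in for a per-group scaling decision.
type decision struct {
	direction string
	count     int
}

// pickUnsafe reads a field from the map entry without checking that the
// key exists. A missing key yields a nil *decision, so the field access
// panics with a nil pointer dereference.
func pickUnsafe(decisions map[string]*decision, key string) string {
	d := decisions[key]
	return d.direction
}

// pickSafe uses the comma-ok form and a nil check before dereferencing.
func pickSafe(decisions map[string]*decision, key string) (string, bool) {
	d, ok := decisions[key]
	if !ok || d == nil {
		return "", false
	}
	return d.direction, true
}

func main() {
	// Decisions are stored under the group name "bar".
	decisions := map[string]*decision{"bar": {direction: "scale-out", count: 1}}

	// Looking up by the job name "foo" finds nothing.
	if _, ok := pickSafe(decisions, "foo"); !ok {
		fmt.Println(`no decision stored for "foo"`)
	}

	// The unchecked lookup panics; recover only to print the error.
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r)
		}
	}()
	pickUnsafe(decisions, "foo")
}
```

When job and group share a name, the two keys coincide and the unchecked lookup happens to succeed, which would be consistent with the bug only appearing when the names differ. Whether this is what happens at decision.go:146 is an assumption.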