hashicorp / nomad-autoscaler

Nomad Autoscaler brings autoscaling to your Nomad workloads.
Mozilla Public License 2.0
424 stars 84 forks source link

Nomad-Autoscaler stops evaluating metrics altogether #410

Open jokramer123 opened 3 years ago

jokramer123 commented 3 years ago

Hello,

We are currently using the autoscaler as a Nomad job that runs inside a namespace and has a scaling policy that references an AWS autoscaling group for a target. The autoscaler job is using an IAM role and retrieves a Nomad ACL token from Vault. Eventually, it seems like the nomad-autoscaler stops communicating with the metric source (Datadog).

nomad enterprise v1.0.2. autoscaler v0.3.0

I notice this happens after the autoscaler has performed a number of scaling activities on the autoscaling group, eventually there are a number of nomad nodes that are ineligible (since their work was done and the autoscaler drain stopped them and terminated them).

In the log the autoscaler seems to be stuck evaluating these nodes and never begins querying for the metric to scale up again:

2021-03-02T18:22:07.270Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is ineligible"
2021-03-02T18:23:02.943Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:23:07.236Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is ineligible"
2021-03-02T18:24:02.943Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:24:07.259Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is ineligible"
2021-03-02T18:25:02.944Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:25:07.262Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is ineligible"
2021-03-02T18:26:02.943Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:26:07.227Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is ineligible"
2021-03-02T18:27:02.943Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:27:07.236Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is ineligible"
2021-03-02T18:28:02.944Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:28:07.243Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is ineligible"
2021-03-02T18:29:02.944Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:29:07.267Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is ineligible"
2021-03-02T18:30:02.943Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:30:07.274Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is ineligible"
2021-03-02T18:31:02.944Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:31:07.257Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is ineligible"
2021-03-02T18:32:02.943Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:33:02.945Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection

Over the last 30 minutes the autoscaler has effectively failed to scale up even though their are valid jobs queued and not able to be allocated: image

Even if I stopped the autoscaler and respawn it, it re-enters this state and cannot conduct scaling activities.

Any help would be appreciated.

Here is my nomad job which runs the autoscaler:

job "${asg_name}-autoscaler" {

  constraint {
    attribute = "$${node.class}"
    value = "${namespace}"
  }

  namespace = "${namespace}"
  datacenters = ["redacted"]

  group "autoscaler" {
    count = 1

    task "autoscaler" {
      template {
      data = <<EOH
      NOMAD_TOKEN="{{with secret "nomad/creds/${namespace}-autoscaler-${asg_name}"}}{{.Data.secret_id}}{{end}}"
      EOH

      destination = "secrets/file.env"
      env         = true
      }

      vault {
        policies = [
          "${namespace}-nomad-client-policy"
        ]
      }

      driver = "docker"

      config {
        image   = "hashicorp/nomad-autoscaler:0.2.1"
        command = "nomad-autoscaler"
        args    = [
          "agent",
          "-config",
          "$${NOMAD_TASK_DIR}/config.hcl",
          "-log-level",
          "debug"
        ]
        port_map {
          http = 8080
        }
      }

# ---------------------------------------------------------------------------------------------------------------------
# Autoscaler config
# This template stanza is for the autoscaler config
# ---------------------------------------------------------------------------------------------------------------------

      template {
        data = <<EOF

policy {
  dir = "local/policies/"
  default_cooldown = "2m"
}

plugin_dir = "/plugins"

nomad {
  address = "$NOMAD_ADDR"
}
apm "nomad" {
  driver = "nomad-apm"
  config  = {
    address = "$NOMAD_ADDR"
  }
}

target "aws-asg" {
  driver = "aws-asg"
  config = {
    aws_region = "${region}"
  }
}

apm "datadog"{
   driver = "datadog"
   config = {
 {{ with secret "kv/data/shared"}}
     dd_api_key = "{{.Data.data.DATADOG_API_KEY}}"
     dd_app_key = "{{.Data.data.DATADOG_APP_KEY}}"
 {{ end }}
   }
}

strategy "target-value" {
  driver = "target-value"
}
          EOF

        destination = "local/config.hcl"
      }

# ---------------------------------------------------------------------------------------------------------------------
# Autoscaler policy
# ---------------------------------------------------------------------------------------------------------------------

template {
        data = <<EOF

scaling "cluster_policy" {
  enabled = true
  min     = 0
  max     = 5

  policy {
    cooldown            = "4m"
    evaluation_interval = "1m"

    check "batch_job_queue_size" {
      source = "datadog"
      query = "sum:nomad.nomad.job_summary.queued{namespace:${namespace}} by {namespace}"
      query_window = "4m"
      strategy "target-value" {
        target = 0
      }
    }

    target "aws-asg" {
      dry-run             = "false"
      aws_asg_name        = "${asg_name}"
      node_class          = "${namespace}"
      node_purge          = "true"
    }
  }
}
          EOF
        destination = "local/policies/batch.hcl"
      }

    resources {
      cpu    = 300
      memory = 512

      network {
        mbits = 10
        port "http" {}
      }
    }

    service {
      name = "autoscaler"
      port = "http"

      check {
        type     = "http"
        path     = "/v1/health"
        interval = "5s"
        timeout  = "4s"
        }
      }
    }
  }
}

Here is a full log of the autoscaler just getting stuck and not doing anything:

2021-03-02T16:57:04.234Z [INFO]  agent: Starting Nomad Autoscaler agent
2021-03-02T16:57:04.235Z [INFO]  agent: Nomad Autoscaler agent configuration:
2021-03-02T16:57:04.235Z [INFO]  agent:
2021-03-02T16:57:04.235Z [INFO]  agent:         Bind Addrs: 127.0.0.1
2021-03-02T16:57:04.235Z [INFO]  agent:          Log Level: debug
2021-03-02T16:57:04.235Z [INFO]  agent:            Plugins: /plugins
2021-03-02T16:57:04.235Z [INFO]  agent:           Policies: local/policies/
2021-03-02T16:57:04.235Z [INFO]  agent:            Version: v0.3.0 (04986be)
2021-03-02T16:57:04.235Z [INFO]  agent:
2021-03-02T16:57:04.235Z [INFO]  agent: Nomad Autoscaler agent started! Log data will stream in below:
2021-03-02T16:57:04.236Z [INFO]  agent.http_server: server now listening for connections: address=127.0.0.1:8080
2021-03-02T16:57:08.160Z [INFO]  agent.plugin_manager: successfully launched and dispensed plugin: plugin_name=aws-asg
2021-03-02T16:57:08.163Z [INFO]  agent.plugin_manager: successfully launched and dispensed plugin: plugin_name=nomad-apm
2021-03-02T16:57:08.163Z [INFO]  agent.plugin_manager: successfully launched and dispensed plugin: plugin_name=nomad
2021-03-02T16:57:08.164Z [INFO]  agent.plugin_manager: successfully launched and dispensed plugin: plugin_name=datadog
2021-03-02T16:57:08.165Z [INFO]  agent.plugin_manager: successfully launched and dispensed plugin: plugin_name=target-value
2021-03-02T16:57:08.165Z [INFO]  agent.plugin_manager: successfully launched and dispensed plugin: plugin_name=nomad-target
2021-03-02T16:57:08.166Z [DEBUG] file_policy_source: starting file policy source ID monitor
2021-03-02T16:57:08.169Z [DEBUG] file_policy_source: starting file policy monitor: file=local/policies/batch.hcl name=cluster_policy policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1
2021-03-02T16:57:08.172Z [INFO]  policy_eval.worker: starting worker: id=8869ecd1-ec83-3a36-a86b-e49bc6ae86d9 queue=horizontal
2021-03-02T16:57:08.172Z [DEBUG] policy_eval.broker: dequeue eval: queue=horizontal
2021-03-02T16:57:08.184Z [DEBUG] policy_eval.broker: waiting for eval: queue=horizontal
2021-03-02T16:57:08.184Z [INFO]  policy_eval.worker: starting worker: id=5fbbb0f1-3e25-7824-2c25-8faa39febbf5 queue=horizontal
2021-03-02T16:57:08.185Z [DEBUG] policy_eval.broker: dequeue eval: queue=horizontal
2021-03-02T16:57:08.185Z [DEBUG] policy_eval.broker: waiting for eval: queue=horizontal
2021-03-02T16:57:08.186Z [INFO]  policy_eval.worker: starting worker: id=0713b1d2-4c51-68d7-f2da-a683a2d3c5e8 queue=horizontal
2021-03-02T16:57:08.186Z [DEBUG] policy_eval.broker: dequeue eval: queue=horizontal
2021-03-02T16:57:08.187Z [DEBUG] policy_eval.broker: waiting for eval: queue=horizontal
2021-03-02T16:57:08.187Z [INFO]  policy_eval.worker: starting worker: id=d9532c53-6cd8-4d8f-c746-d8a25d2bf587 queue=horizontal
2021-03-02T16:57:08.187Z [DEBUG] policy_eval.broker: dequeue eval: queue=horizontal
2021-03-02T16:57:08.187Z [DEBUG] policy_eval.broker: waiting for eval: queue=horizontal
2021-03-02T16:57:08.187Z [INFO]  policy_eval.worker: starting worker: id=a9d42538-3f46-0b12-f47d-920641f1074a queue=horizontal
2021-03-02T16:57:08.188Z [DEBUG] policy_eval.broker: dequeue eval: queue=horizontal
2021-03-02T16:57:08.188Z [DEBUG] policy_eval.broker: waiting for eval: queue=horizontal
2021-03-02T16:57:08.189Z [INFO]  policy_eval.worker: starting worker: id=e3055e1f-0a0a-895f-20c1-ee46c7ec41a6 queue=horizontal
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: dequeue eval: queue=horizontal
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: waiting for eval: queue=horizontal
2021-03-02T16:57:08.189Z [INFO]  policy_eval.worker: starting worker: id=d1c3a094-0c55-3a08-8be1-028422cabc39 queue=horizontal
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: dequeue eval: queue=horizontal
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: waiting for eval: queue=horizontal
2021-03-02T16:57:08.189Z [INFO]  policy_eval.worker: starting worker: id=79f03f60-2210-e325-873b-20a747b079b6 queue=horizontal
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: dequeue eval: queue=horizontal
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: waiting for eval: queue=horizontal
2021-03-02T16:57:08.189Z [INFO]  policy_eval.worker: starting worker: id=71bca267-2b76-6fdf-d002-fae36294fb7d queue=horizontal
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: dequeue eval: queue=horizontal
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: waiting for eval: queue=horizontal
2021-03-02T16:57:08.189Z [INFO]  policy_eval.worker: starting worker: id=39693838-c21a-5e65-dccb-95e172943887 queue=horizontal
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: dequeue eval: queue=horizontal
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: waiting for eval: queue=horizontal
2021-03-02T16:57:08.189Z [INFO]  policy_eval.worker: starting worker: id=6e59f0d6-3580-d660-b2e9-f0e925e47d7d queue=cluster
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T16:57:08.189Z [INFO]  policy_eval.worker: starting worker: id=98fc2b73-a33e-9e36-1439-f24a5439b549 queue=cluster
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T16:57:08.189Z [INFO]  policy_eval.worker: starting worker: id=5509cdb3-69bd-5896-d276-f9701e1f7e3c queue=cluster
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T16:57:08.189Z [INFO]  policy_eval.worker: starting worker: id=615bf174-763b-bacc-c753-53053486af51 queue=cluster
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T16:57:08.189Z [INFO]  policy_eval.worker: starting worker: id=ad570f97-69dc-17aa-3451-8fa430b5821a queue=cluster
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T16:57:08.189Z [INFO]  policy_eval.worker: starting worker: id=39b90b2a-01c8-db67-e0a2-a76b33b949f0 queue=cluster
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T16:57:08.189Z [INFO]  policy_eval.worker: starting worker: id=bfa47795-94d6-8dcc-fb22-fe975f45c520 queue=cluster
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T16:57:08.189Z [INFO]  policy_eval.worker: starting worker: id=ae285192-b129-4bca-2342-3d459ffab003 queue=cluster
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T16:57:08.189Z [INFO]  policy_eval.worker: starting worker: id=cffa85b6-c129-d97e-d228-9098b4f11bd7 queue=cluster
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T16:57:08.189Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T16:57:08.190Z [INFO]  policy_eval.worker: starting worker: id=0e6feab4-992c-29ad-c242-b47b38e511a3 queue=cluster
2021-03-02T16:57:08.190Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T16:57:08.192Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T16:57:08.192Z [DEBUG] nomad_policy_source: starting policy blocking query watcher
2021-03-02T16:58:08.166Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T16:58:12.363Z [DEBUG] policy_eval.broker: enqueue eval: eval_id=f8c0e7a7-687d-12da-84cd-1b315b3bfdea policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T16:58:12.363Z [DEBUG] policy_eval.broker: eval enqueued: eval_id=f8c0e7a7-687d-12da-84cd-1b315b3bfdea policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T16:58:12.363Z [DEBUG] policy_eval.broker: eval dequeued: queue=cluster eval_id=f8c0e7a7-687d-12da-84cd-1b315b3bfdea policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 token=46e337a0-a2bb-9c32-f069-9fa2372a713b
2021-03-02T16:58:12.367Z [DEBUG] policy_eval.worker: received policy for evaluation: id=6e59f0d6-3580-d660-b2e9-f0e925e47d7d policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T16:58:12.367Z [DEBUG] policy_eval.worker: fetching current count: id=6e59f0d6-3580-d660-b2e9-f0e925e47d7d policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T16:58:12.949Z [DEBUG] policy_eval.worker.check_handler: received policy check for evaluation: check=batch_job_queue_size id=6e59f0d6-3580-d660-b2e9-f0e925e47d7d policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T16:58:12.949Z [DEBUG] policy_eval.worker.check_handler: querying source: check=batch_job_queue_size id=6e59f0d6-3580-d660-b2e9-f0e925e47d7d policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg query="sum:nomad.nomad.job_summary.queued{namespace:business-unit-pcdx-test} by {namespace}" source=datadog
2021-03-02T16:58:13.365Z [DEBUG] policy_eval.worker.check_handler: calculating new count: check=batch_job_queue_size id=6e59f0d6-3580-d660-b2e9-f0e925e47d7d policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg count=0
2021-03-02T16:58:13.365Z [INFO]  policy_eval.worker: scaling target: id=6e59f0d6-3580-d660-b2e9-f0e925e47d7d policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg from=0 to=1 reason="scaling up because factor is 1.000000" meta=map[nomad_policy_id:cabe198b-80a1-4fbc-0432-5c39f44c18b1]
2021-03-02T16:58:33.890Z [INFO]  internal_plugin.aws-asg: successfully performed and verified scaling out: action=scale_out asg_name=nomadprod-batch-processing-asg desired_count=1
2021-03-02T16:58:33.890Z [INFO]  policy_eval.worker: successfully submitted scaling action to target: id=6e59f0d6-3580-d660-b2e9-f0e925e47d7d policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg desired_count=1
2021-03-02T16:58:33.890Z [INFO]  policy_eval.worker: policy evaluation complete: id=6e59f0d6-3580-d660-b2e9-f0e925e47d7d policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T16:58:33.891Z [DEBUG] policy_manager.policy_handler: scaling policy has been placed into cooldown: policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 cooldown=4m0s
2021-03-02T16:58:33.891Z [DEBUG] policy_eval.broker: ack eval: eval_id=f8c0e7a7-687d-12da-84cd-1b315b3bfdea token=46e337a0-a2bb-9c32-f069-9fa2372a713b eval_id=f8c0e7a7-687d-12da-84cd-1b315b3bfdea token=46e337a0-a2bb-9c32-f069-9fa2372a713b
2021-03-02T16:58:33.891Z [DEBUG] policy_eval.broker: eval ack'd: policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1
2021-03-02T16:58:33.891Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T16:58:33.891Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T16:59:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:00:08.172Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:01:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:02:08.166Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:02:34.532Z [DEBUG] policy_manager.policy_handler: scaling policy has been placed into cooldown: policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 cooldown=25.107960069s
2021-03-02T17:03:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:03:08.754Z [DEBUG] policy_eval.broker: enqueue eval: eval_id=03e0b84b-aab7-c229-1aa3-aa117e5fceda policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:03:08.754Z [DEBUG] policy_eval.broker: eval enqueued: eval_id=03e0b84b-aab7-c229-1aa3-aa117e5fceda policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:03:08.755Z [DEBUG] policy_eval.broker: eval dequeued: queue=cluster eval_id=03e0b84b-aab7-c229-1aa3-aa117e5fceda policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 token=43a70332-3aec-16be-a9ba-2b6f68154059
2021-03-02T17:03:08.755Z [DEBUG] policy_eval.worker: received policy for evaluation: id=98fc2b73-a33e-9e36-1439-f24a5439b549 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:03:08.755Z [DEBUG] policy_eval.worker: fetching current count: id=98fc2b73-a33e-9e36-1439-f24a5439b549 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:03:09.335Z [DEBUG] policy_eval.worker.check_handler: received policy check for evaluation: check=batch_job_queue_size id=98fc2b73-a33e-9e36-1439-f24a5439b549 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:03:09.336Z [DEBUG] policy_eval.worker.check_handler: querying source: check=batch_job_queue_size id=98fc2b73-a33e-9e36-1439-f24a5439b549 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg query="sum:nomad.nomad.job_summary.queued{namespace:business-unit-pcdx-test} by {namespace}" source=datadog
2021-03-02T17:03:09.755Z [DEBUG] policy_eval.worker.check_handler: calculating new count: check=batch_job_queue_size id=98fc2b73-a33e-9e36-1439-f24a5439b549 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg count=1
2021-03-02T17:03:09.756Z [INFO]  policy_eval.worker: scaling target: id=98fc2b73-a33e-9e36-1439-f24a5439b549 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg from=1 to=0 reason="scaling down because factor is 0.000000" meta=map[nomad_policy_id:cabe198b-80a1-4fbc-0432-5c39f44c18b1]
2021-03-02T17:03:10.104Z [DEBUG] internal_plugin.aws-asg: identified Nomad node for removal: node_id=6f3b3eea-5e58-a7ca-cb7d-363f56f0e751
2021-03-02T17:03:10.436Z [DEBUG] internal_plugin.aws-asg: identified remote provider ID for node: node_id=6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 remote_id=i-0e9c84e9ce071881b
2021-03-02T17:03:10.437Z [INFO]  internal_plugin.aws-asg: triggering drain on node: node_id=6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 deadline=1h0m0s
2021-03-02T17:03:37.019Z [DEBUG] internal_plugin.aws-asg: received node drain message: node_id=6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 msg="Alloc "d51f9064-dae6-ce2a-b03d-5f95b0936d6f" status pending -> running"
2021-03-02T17:04:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:04:08.474Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:05:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:05:08.454Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:06:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:06:08.451Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:07:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:07:08.493Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:08:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:08:08.465Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:08:08.755Z [DEBUG] policy_eval.broker: nack eval: eval_id=03e0b84b-aab7-c229-1aa3-aa117e5fceda token=43a70332-3aec-16be-a9ba-2b6f68154059
2021-03-02T17:08:08.756Z [WARN]  policy_eval.broker: eval delivery limit reached: eval_id=03e0b84b-aab7-c229-1aa3-aa117e5fceda policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 token=43a70332-3aec-16be-a9ba-2b6f68154059 count=1 limit=1
2021-03-02T17:09:08.166Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:09:08.490Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:10:08.170Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:10:08.459Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:11:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:11:08.449Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:12:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:12:08.483Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:13:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:13:08.455Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:14:08.168Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:14:08.459Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:15:08.172Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:15:08.500Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:16:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:16:08.461Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:17:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:17:08.485Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:18:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:18:08.516Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:19:08.168Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:19:08.456Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:20:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:20:08.457Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:21:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:21:08.449Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:22:08.166Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:22:08.477Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:23:08.166Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:23:08.452Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:24:08.166Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:24:08.450Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:25:08.166Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:25:08.462Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:26:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:26:08.451Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:27:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:27:08.449Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:28:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:28:08.460Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:29:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:29:08.458Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:30:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:30:08.506Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:31:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:31:08.485Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:32:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:32:08.480Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:33:08.167Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:33:08.477Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:34:08.166Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:34:08.462Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 is draining"
2021-03-02T17:34:08.561Z [DEBUG] internal_plugin.aws-asg: received node drain message: node_id=6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 msg="Alloc "d51f9064-dae6-ce2a-b03d-5f95b0936d6f" status running -> complete"
2021-03-02T17:34:08.587Z [INFO]  internal_plugin.aws-asg: received node drain message: node_id=6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 msg="Drain complete for node 6f3b3eea-5e58-a7ca-cb7d-363f56f0e751"
2021-03-02T17:34:08.843Z [DEBUG] internal_plugin.aws-asg: received node drain message: node_id=6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 msg="Alloc "79349114-9ea9-909d-c931-97371b231cdc" marked for migration"
2021-03-02T17:34:08.843Z [DEBUG] internal_plugin.aws-asg: received node drain message: node_id=6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 msg="Alloc "718e5b2d-6a33-a5bc-9b81-0c664ad94327" marked for migration"
2021-03-02T17:34:09.721Z [DEBUG] internal_plugin.aws-asg: received node drain message: node_id=6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 msg="Alloc "79349114-9ea9-909d-c931-97371b231cdc" status running -> complete"
2021-03-02T17:34:09.721Z [DEBUG] internal_plugin.aws-asg: received node drain message: node_id=6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 msg="Alloc "718e5b2d-6a33-a5bc-9b81-0c664ad94327" status running -> complete"
2021-03-02T17:34:09.721Z [INFO]  internal_plugin.aws-asg: received node drain message: node_id=6f3b3eea-5e58-a7ca-cb7d-363f56f0e751 msg="All allocations on node "6f3b3eea-5e58-a7ca-cb7d-363f56f0e751" have stopped"
2021-03-02T17:34:09.721Z [INFO]  internal_plugin.aws-asg: node drain complete: node_id=6f3b3eea-5e58-a7ca-cb7d-363f56f0e751
2021-03-02T17:34:09.721Z [INFO]  internal_plugin.aws-asg: pre scale-in tasks now complete
2021-03-02T17:34:09.953Z [DEBUG] internal_plugin.aws-asg: detaching instances from AutoScaling Group: action=scale_in asg_name=nomadprod-batch-processing-asg instances=[i-0e9c84e9ce071881b]
2021-03-02T17:34:20.427Z [INFO]  internal_plugin.aws-asg: successfully detached instances from AutoScaling Group: action=scale_in asg_name=nomadprod-batch-processing-asg instances=[i-0e9c84e9ce071881b]
2021-03-02T17:34:20.585Z [DEBUG] internal_plugin.aws-asg: terminating EC2 instances: action=scale_in asg_name=nomadprod-batch-processing-asg instances=[i-0e9c84e9ce071881b]
2021-03-02T17:35:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:35:08.896Z [DEBUG] policy_manager.policy_handler: scaling policy has been placed into cooldown: policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 cooldown=3m2.826607101s
2021-03-02T17:36:08.166Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:36:41.420Z [ERROR] policy_eval.worker: failed to evaluate policy: eval_id=03e0b84b-aab7-c229-1aa3-aa117e5fceda eval_token=43a70332-3aec-16be-a9ba-2b6f68154059 id=98fc2b73-a33e-9e36-1439-f24a5439b549 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster err="failed to scale target: failed to perform scaling action: failed to scale in AWS AutoScaling Group: failed to terminate EC2 instances: reached retry limit"
2021-03-02T17:36:41.420Z [DEBUG] policy_eval.broker: nack eval: eval_id=03e0b84b-aab7-c229-1aa3-aa117e5fceda token=43a70332-3aec-16be-a9ba-2b6f68154059
2021-03-02T17:36:41.421Z [WARN]  policy_eval.worker: failed to NACK policy evaluation: eval_id=03e0b84b-aab7-c229-1aa3-aa117e5fceda eval_token=43a70332-3aec-16be-a9ba-2b6f68154059 id=98fc2b73-a33e-9e36-1439-f24a5439b549 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster err="evaluation ID not found"
2021-03-02T17:36:41.421Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T17:36:41.421Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T17:37:08.166Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:38:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:38:12.459Z [DEBUG] policy_eval.broker: enqueue eval: eval_id=1a3c09fc-a5cf-a5a6-58f5-5f87394749ec policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:38:12.459Z [DEBUG] policy_eval.broker: eval enqueued: eval_id=1a3c09fc-a5cf-a5a6-58f5-5f87394749ec policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:38:12.459Z [DEBUG] policy_eval.broker: eval dequeued: queue=cluster eval_id=1a3c09fc-a5cf-a5a6-58f5-5f87394749ec policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 token=dcdaf15b-b5c5-b4f2-d42a-3d51fe640ec8
2021-03-02T17:38:12.461Z [DEBUG] policy_eval.worker: received policy for evaluation: id=5509cdb3-69bd-5896-d276-f9701e1f7e3c policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:38:12.461Z [DEBUG] policy_eval.worker: fetching current count: id=5509cdb3-69bd-5896-d276-f9701e1f7e3c policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:38:13.099Z [DEBUG] policy_eval.worker.check_handler: received policy check for evaluation: check=batch_job_queue_size id=5509cdb3-69bd-5896-d276-f9701e1f7e3c policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:38:13.100Z [DEBUG] policy_eval.worker.check_handler: querying source: check=batch_job_queue_size id=5509cdb3-69bd-5896-d276-f9701e1f7e3c policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg query="sum:nomad.nomad.job_summary.queued{namespace:business-unit-pcdx-test} by {namespace}" source=datadog
2021-03-02T17:38:13.530Z [DEBUG] policy_eval.worker.check_handler: calculating new count: check=batch_job_queue_size id=5509cdb3-69bd-5896-d276-f9701e1f7e3c policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg count=0
2021-03-02T17:38:13.530Z [DEBUG] policy_eval.worker.check_handler: nothing to do: check=batch_job_queue_size id=5509cdb3-69bd-5896-d276-f9701e1f7e3c policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:38:13.530Z [DEBUG] policy_eval.worker: no checks need to be executed: id=5509cdb3-69bd-5896-d276-f9701e1f7e3c policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:38:13.530Z [DEBUG] policy_eval.broker: ack eval: eval_id=1a3c09fc-a5cf-a5a6-58f5-5f87394749ec token=dcdaf15b-b5c5-b4f2-d42a-3d51fe640ec8 eval_id=1a3c09fc-a5cf-a5a6-58f5-5f87394749ec token=dcdaf15b-b5c5-b4f2-d42a-3d51fe640ec8
2021-03-02T17:38:13.530Z [DEBUG] policy_eval.broker: eval ack'd: policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1
2021-03-02T17:38:13.530Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T17:38:13.530Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T17:39:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:39:08.837Z [DEBUG] policy_eval.broker: enqueue eval: eval_id=f8bf0fc0-a372-2c4b-01f7-f9740bbd5bbb policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:39:08.837Z [DEBUG] policy_eval.broker: eval enqueued: eval_id=f8bf0fc0-a372-2c4b-01f7-f9740bbd5bbb policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:39:08.837Z [DEBUG] policy_eval.broker: eval dequeued: queue=cluster eval_id=f8bf0fc0-a372-2c4b-01f7-f9740bbd5bbb policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 token=eb755f76-97b8-b2b4-b1bb-bfc2199690a2
2021-03-02T17:39:08.837Z [DEBUG] policy_eval.worker: received policy for evaluation: id=615bf174-763b-bacc-c753-53053486af51 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:39:08.838Z [DEBUG] policy_eval.worker: fetching current count: id=615bf174-763b-bacc-c753-53053486af51 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:39:09.429Z [DEBUG] policy_eval.worker.check_handler: received policy check for evaluation: check=batch_job_queue_size id=615bf174-763b-bacc-c753-53053486af51 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:39:09.429Z [DEBUG] policy_eval.worker.check_handler: querying source: check=batch_job_queue_size id=615bf174-763b-bacc-c753-53053486af51 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg query="sum:nomad.nomad.job_summary.queued{namespace:business-unit-pcdx-test} by {namespace}" source=datadog
2021-03-02T17:39:09.603Z [DEBUG] policy_eval.worker.check_handler: calculating new count: check=batch_job_queue_size id=615bf174-763b-bacc-c753-53053486af51 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg count=0
2021-03-02T17:39:09.603Z [DEBUG] policy_eval.worker.check_handler: nothing to do: check=batch_job_queue_size id=615bf174-763b-bacc-c753-53053486af51 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:39:09.603Z [DEBUG] policy_eval.worker: no checks need to be executed: id=615bf174-763b-bacc-c753-53053486af51 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:39:09.603Z [DEBUG] policy_eval.broker: ack eval: eval_id=f8bf0fc0-a372-2c4b-01f7-f9740bbd5bbb token=eb755f76-97b8-b2b4-b1bb-bfc2199690a2 eval_id=f8bf0fc0-a372-2c4b-01f7-f9740bbd5bbb token=eb755f76-97b8-b2b4-b1bb-bfc2199690a2
2021-03-02T17:39:09.603Z [DEBUG] policy_eval.broker: eval ack'd: policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1
2021-03-02T17:39:09.603Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T17:39:09.603Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T17:40:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:40:08.811Z [DEBUG] policy_eval.broker: enqueue eval: eval_id=00540f4d-9210-30d9-8fc6-77d62407ef3f policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:40:08.812Z [DEBUG] policy_eval.broker: eval enqueued: eval_id=00540f4d-9210-30d9-8fc6-77d62407ef3f policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:40:08.812Z [DEBUG] policy_eval.broker: eval dequeued: queue=cluster eval_id=00540f4d-9210-30d9-8fc6-77d62407ef3f policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 token=0db51a49-16f2-97c1-5387-34f8ea9fa84e
2021-03-02T17:40:08.812Z [DEBUG] policy_eval.worker: received policy for evaluation: id=ad570f97-69dc-17aa-3451-8fa430b5821a policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:40:08.812Z [DEBUG] policy_eval.worker: fetching current count: id=ad570f97-69dc-17aa-3451-8fa430b5821a policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:40:09.453Z [DEBUG] policy_eval.worker.check_handler: received policy check for evaluation: check=batch_job_queue_size id=ad570f97-69dc-17aa-3451-8fa430b5821a policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:40:09.453Z [DEBUG] policy_eval.worker.check_handler: querying source: check=batch_job_queue_size id=ad570f97-69dc-17aa-3451-8fa430b5821a policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg query="sum:nomad.nomad.job_summary.queued{namespace:business-unit-pcdx-test} by {namespace}" source=datadog
2021-03-02T17:40:09.625Z [DEBUG] policy_eval.worker.check_handler: calculating new count: check=batch_job_queue_size id=ad570f97-69dc-17aa-3451-8fa430b5821a policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg count=0
2021-03-02T17:40:09.625Z [DEBUG] policy_eval.worker.check_handler: nothing to do: check=batch_job_queue_size id=ad570f97-69dc-17aa-3451-8fa430b5821a policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:40:09.625Z [DEBUG] policy_eval.worker: no checks need to be executed: id=ad570f97-69dc-17aa-3451-8fa430b5821a policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:40:09.625Z [DEBUG] policy_eval.broker: ack eval: eval_id=00540f4d-9210-30d9-8fc6-77d62407ef3f token=0db51a49-16f2-97c1-5387-34f8ea9fa84e eval_id=00540f4d-9210-30d9-8fc6-77d62407ef3f token=0db51a49-16f2-97c1-5387-34f8ea9fa84e
2021-03-02T17:40:09.625Z [DEBUG] policy_eval.broker: eval ack'd: policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1
2021-03-02T17:40:09.625Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T17:40:09.625Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T17:41:08.166Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:41:08.811Z [DEBUG] policy_eval.broker: enqueue eval: eval_id=8089e254-c5d1-ac80-6c0d-dcbf66336838 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:41:08.811Z [DEBUG] policy_eval.broker: eval enqueued: eval_id=8089e254-c5d1-ac80-6c0d-dcbf66336838 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:41:08.811Z [DEBUG] policy_eval.broker: eval dequeued: queue=cluster eval_id=8089e254-c5d1-ac80-6c0d-dcbf66336838 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 token=6ff4fa26-5ee9-e9d9-bc58-ce4c0446d0e4
2021-03-02T17:41:08.811Z [DEBUG] policy_eval.worker: received policy for evaluation: id=39b90b2a-01c8-db67-e0a2-a76b33b949f0 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:41:08.811Z [DEBUG] policy_eval.worker: fetching current count: id=39b90b2a-01c8-db67-e0a2-a76b33b949f0 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:41:09.421Z [DEBUG] policy_eval.worker.check_handler: received policy check for evaluation: check=batch_job_queue_size id=39b90b2a-01c8-db67-e0a2-a76b33b949f0 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:41:09.421Z [DEBUG] policy_eval.worker.check_handler: querying source: check=batch_job_queue_size id=39b90b2a-01c8-db67-e0a2-a76b33b949f0 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg query="sum:nomad.nomad.job_summary.queued{namespace:business-unit-pcdx-test} by {namespace}" source=datadog
2021-03-02T17:41:09.599Z [DEBUG] policy_eval.worker.check_handler: calculating new count: check=batch_job_queue_size id=39b90b2a-01c8-db67-e0a2-a76b33b949f0 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg count=0
2021-03-02T17:41:09.599Z [DEBUG] policy_eval.worker.check_handler: nothing to do: check=batch_job_queue_size id=39b90b2a-01c8-db67-e0a2-a76b33b949f0 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:41:09.599Z [DEBUG] policy_eval.worker: no checks need to be executed: id=39b90b2a-01c8-db67-e0a2-a76b33b949f0 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:41:09.599Z [DEBUG] policy_eval.broker: ack eval: eval_id=8089e254-c5d1-ac80-6c0d-dcbf66336838 token=6ff4fa26-5ee9-e9d9-bc58-ce4c0446d0e4 eval_id=8089e254-c5d1-ac80-6c0d-dcbf66336838 token=6ff4fa26-5ee9-e9d9-bc58-ce4c0446d0e4
2021-03-02T17:41:09.599Z [DEBUG] policy_eval.broker: eval ack'd: policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1
2021-03-02T17:41:09.599Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T17:41:09.599Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T17:42:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:42:08.721Z [DEBUG] policy_eval.broker: enqueue eval: eval_id=de049b3a-6bcf-896b-c9b0-c1ffb59d5feb policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:42:08.721Z [DEBUG] policy_eval.broker: eval enqueued: eval_id=de049b3a-6bcf-896b-c9b0-c1ffb59d5feb policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:42:08.721Z [DEBUG] policy_eval.broker: eval dequeued: queue=cluster eval_id=de049b3a-6bcf-896b-c9b0-c1ffb59d5feb policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 token=36a99767-774e-f8ce-ee1e-6ab15184aee3
2021-03-02T17:42:08.721Z [DEBUG] policy_eval.worker: received policy for evaluation: id=bfa47795-94d6-8dcc-fb22-fe975f45c520 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:42:08.721Z [DEBUG] policy_eval.worker: fetching current count: id=bfa47795-94d6-8dcc-fb22-fe975f45c520 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:42:09.203Z [DEBUG] policy_eval.worker.check_handler: received policy check for evaluation: check=batch_job_queue_size id=bfa47795-94d6-8dcc-fb22-fe975f45c520 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:42:09.203Z [DEBUG] policy_eval.worker.check_handler: querying source: check=batch_job_queue_size id=bfa47795-94d6-8dcc-fb22-fe975f45c520 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg query="sum:nomad.nomad.job_summary.queued{namespace:business-unit-pcdx-test} by {namespace}" source=datadog
2021-03-02T17:42:10.429Z [DEBUG] policy_eval.worker.check_handler: calculating new count: check=batch_job_queue_size id=bfa47795-94d6-8dcc-fb22-fe975f45c520 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg count=0
2021-03-02T17:42:10.429Z [DEBUG] policy_eval.worker.check_handler: nothing to do: check=batch_job_queue_size id=bfa47795-94d6-8dcc-fb22-fe975f45c520 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:42:10.429Z [DEBUG] policy_eval.worker: no checks need to be executed: id=bfa47795-94d6-8dcc-fb22-fe975f45c520 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:42:10.429Z [DEBUG] policy_eval.broker: ack eval: eval_id=de049b3a-6bcf-896b-c9b0-c1ffb59d5feb token=36a99767-774e-f8ce-ee1e-6ab15184aee3 eval_id=de049b3a-6bcf-896b-c9b0-c1ffb59d5feb token=36a99767-774e-f8ce-ee1e-6ab15184aee3
2021-03-02T17:42:10.429Z [DEBUG] policy_eval.broker: eval ack'd: policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1
2021-03-02T17:42:10.429Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T17:42:10.429Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T17:43:08.166Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:43:08.798Z [DEBUG] policy_eval.broker: enqueue eval: eval_id=bcd2c52c-0920-a136-4b36-b7402f6699a7 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:43:08.798Z [DEBUG] policy_eval.broker: eval enqueued: eval_id=bcd2c52c-0920-a136-4b36-b7402f6699a7 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:43:08.798Z [DEBUG] policy_eval.broker: eval dequeued: queue=cluster eval_id=bcd2c52c-0920-a136-4b36-b7402f6699a7 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 token=6222ac2a-33eb-5963-76d6-35b7546960bc
2021-03-02T17:43:08.800Z [DEBUG] policy_eval.worker: received policy for evaluation: id=ae285192-b129-4bca-2342-3d459ffab003 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:43:08.801Z [DEBUG] policy_eval.worker: fetching current count: id=ae285192-b129-4bca-2342-3d459ffab003 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:43:09.362Z [DEBUG] policy_eval.worker.check_handler: received policy check for evaluation: check=batch_job_queue_size id=ae285192-b129-4bca-2342-3d459ffab003 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:43:09.362Z [DEBUG] policy_eval.worker.check_handler: querying source: check=batch_job_queue_size id=ae285192-b129-4bca-2342-3d459ffab003 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg query="sum:nomad.nomad.job_summary.queued{namespace:business-unit-pcdx-test} by {namespace}" source=datadog
2021-03-02T17:43:09.541Z [DEBUG] policy_eval.worker.check_handler: calculating new count: check=batch_job_queue_size id=ae285192-b129-4bca-2342-3d459ffab003 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg count=0
2021-03-02T17:43:09.541Z [DEBUG] policy_eval.worker.check_handler: nothing to do: check=batch_job_queue_size id=ae285192-b129-4bca-2342-3d459ffab003 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:43:09.541Z [DEBUG] policy_eval.worker: no checks need to be executed: id=ae285192-b129-4bca-2342-3d459ffab003 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:43:09.541Z [DEBUG] policy_eval.broker: ack eval: eval_id=bcd2c52c-0920-a136-4b36-b7402f6699a7 token=6222ac2a-33eb-5963-76d6-35b7546960bc eval_id=bcd2c52c-0920-a136-4b36-b7402f6699a7 token=6222ac2a-33eb-5963-76d6-35b7546960bc
2021-03-02T17:43:09.542Z [DEBUG] policy_eval.broker: eval ack'd: policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1
2021-03-02T17:43:09.542Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T17:43:09.542Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T17:44:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:44:08.764Z [DEBUG] policy_eval.broker: enqueue eval: eval_id=45c3c781-7849-e124-e641-c31e265aa0a8 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:44:08.764Z [DEBUG] policy_eval.broker: eval enqueued: eval_id=45c3c781-7849-e124-e641-c31e265aa0a8 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:44:08.764Z [DEBUG] policy_eval.broker: eval dequeued: queue=cluster eval_id=45c3c781-7849-e124-e641-c31e265aa0a8 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 token=58edad17-4fe6-92d7-4aec-158bdec5c988
2021-03-02T17:44:08.764Z [DEBUG] policy_eval.worker: received policy for evaluation: id=cffa85b6-c129-d97e-d228-9098b4f11bd7 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:44:08.764Z [DEBUG] policy_eval.worker: fetching current count: id=cffa85b6-c129-d97e-d228-9098b4f11bd7 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:44:09.334Z [DEBUG] policy_eval.worker.check_handler: received policy check for evaluation: check=batch_job_queue_size id=cffa85b6-c129-d97e-d228-9098b4f11bd7 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:44:09.334Z [DEBUG] policy_eval.worker.check_handler: querying source: check=batch_job_queue_size id=cffa85b6-c129-d97e-d228-9098b4f11bd7 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg query="sum:nomad.nomad.job_summary.queued{namespace:business-unit-pcdx-test} by {namespace}" source=datadog
2021-03-02T17:44:09.526Z [DEBUG] policy_eval.worker.check_handler: calculating new count: check=batch_job_queue_size id=cffa85b6-c129-d97e-d228-9098b4f11bd7 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg count=0
2021-03-02T17:44:09.526Z [DEBUG] policy_eval.worker.check_handler: nothing to do: check=batch_job_queue_size id=cffa85b6-c129-d97e-d228-9098b4f11bd7 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:44:09.527Z [DEBUG] policy_eval.worker: no checks need to be executed: id=cffa85b6-c129-d97e-d228-9098b4f11bd7 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:44:09.527Z [DEBUG] policy_eval.broker: ack eval: eval_id=45c3c781-7849-e124-e641-c31e265aa0a8 token=58edad17-4fe6-92d7-4aec-158bdec5c988 eval_id=45c3c781-7849-e124-e641-c31e265aa0a8 token=58edad17-4fe6-92d7-4aec-158bdec5c988
2021-03-02T17:44:09.528Z [DEBUG] policy_eval.broker: eval ack'd: policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1
2021-03-02T17:44:09.529Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T17:44:09.529Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T17:45:08.168Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:45:08.857Z [DEBUG] policy_eval.broker: enqueue eval: eval_id=8364a207-e784-1c47-476c-515e94905096 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:45:08.857Z [DEBUG] policy_eval.broker: eval enqueued: eval_id=8364a207-e784-1c47-476c-515e94905096 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:45:08.857Z [DEBUG] policy_eval.broker: eval dequeued: queue=cluster eval_id=8364a207-e784-1c47-476c-515e94905096 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 token=e2e8ebaa-43f8-4337-c8d0-eba695869676
2021-03-02T17:45:08.857Z [DEBUG] policy_eval.worker: received policy for evaluation: id=0e6feab4-992c-29ad-c242-b47b38e511a3 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:45:08.857Z [DEBUG] policy_eval.worker: fetching current count: id=0e6feab4-992c-29ad-c242-b47b38e511a3 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:45:09.488Z [DEBUG] policy_eval.worker.check_handler: received policy check for evaluation: check=batch_job_queue_size id=0e6feab4-992c-29ad-c242-b47b38e511a3 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:45:09.488Z [DEBUG] policy_eval.worker.check_handler: querying source: check=batch_job_queue_size id=0e6feab4-992c-29ad-c242-b47b38e511a3 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg query="sum:nomad.nomad.job_summary.queued{namespace:business-unit-pcdx-test} by {namespace}" source=datadog
2021-03-02T17:45:09.905Z [DEBUG] policy_eval.worker.check_handler: calculating new count: check=batch_job_queue_size id=0e6feab4-992c-29ad-c242-b47b38e511a3 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg count=0
2021-03-02T17:45:09.905Z [DEBUG] policy_eval.worker.check_handler: nothing to do: check=batch_job_queue_size id=0e6feab4-992c-29ad-c242-b47b38e511a3 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:45:09.905Z [DEBUG] policy_eval.worker: no checks need to be executed: id=0e6feab4-992c-29ad-c242-b47b38e511a3 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:45:09.905Z [DEBUG] policy_eval.broker: ack eval: eval_id=8364a207-e784-1c47-476c-515e94905096 token=e2e8ebaa-43f8-4337-c8d0-eba695869676 eval_id=8364a207-e784-1c47-476c-515e94905096 token=e2e8ebaa-43f8-4337-c8d0-eba695869676
2021-03-02T17:45:09.905Z [DEBUG] policy_eval.broker: eval ack'd: policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1
2021-03-02T17:45:09.905Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T17:45:09.905Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T17:46:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:46:08.768Z [DEBUG] policy_eval.broker: enqueue eval: eval_id=7b0b5e5a-d3aa-06ae-24a2-e6c38808b814 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:46:08.768Z [DEBUG] policy_eval.broker: eval enqueued: eval_id=7b0b5e5a-d3aa-06ae-24a2-e6c38808b814 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:46:08.769Z [DEBUG] policy_eval.broker: eval dequeued: queue=cluster eval_id=7b0b5e5a-d3aa-06ae-24a2-e6c38808b814 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 token=07d45aa4-cd9d-cd43-cab4-b7e1ce5af153
2021-03-02T17:46:08.770Z [DEBUG] policy_eval.worker: received policy for evaluation: id=6e59f0d6-3580-d660-b2e9-f0e925e47d7d policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:46:08.770Z [DEBUG] policy_eval.worker: fetching current count: id=6e59f0d6-3580-d660-b2e9-f0e925e47d7d policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:46:09.426Z [DEBUG] policy_eval.worker.check_handler: received policy check for evaluation: check=batch_job_queue_size id=6e59f0d6-3580-d660-b2e9-f0e925e47d7d policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:46:09.426Z [DEBUG] policy_eval.worker.check_handler: querying source: check=batch_job_queue_size id=6e59f0d6-3580-d660-b2e9-f0e925e47d7d policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg query="sum:nomad.nomad.job_summary.queued{namespace:business-unit-pcdx-test} by {namespace}" source=datadog
2021-03-02T17:46:09.611Z [DEBUG] policy_eval.worker.check_handler: calculating new count: check=batch_job_queue_size id=6e59f0d6-3580-d660-b2e9-f0e925e47d7d policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg count=0
2021-03-02T17:46:09.611Z [DEBUG] policy_eval.worker.check_handler: nothing to do: check=batch_job_queue_size id=6e59f0d6-3580-d660-b2e9-f0e925e47d7d policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:46:09.611Z [DEBUG] policy_eval.worker: no checks need to be executed: id=6e59f0d6-3580-d660-b2e9-f0e925e47d7d policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:46:09.611Z [DEBUG] policy_eval.broker: ack eval: eval_id=7b0b5e5a-d3aa-06ae-24a2-e6c38808b814 token=07d45aa4-cd9d-cd43-cab4-b7e1ce5af153 eval_id=7b0b5e5a-d3aa-06ae-24a2-e6c38808b814 token=07d45aa4-cd9d-cd43-cab4-b7e1ce5af153
2021-03-02T17:46:09.611Z [DEBUG] policy_eval.broker: eval ack'd: policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1
2021-03-02T17:46:09.611Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T17:46:09.611Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T17:47:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:47:08.753Z [DEBUG] policy_eval.broker: enqueue eval: eval_id=22c12f32-dfc1-3351-e72a-1a82ca753347 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:47:08.753Z [DEBUG] policy_eval.broker: eval enqueued: eval_id=22c12f32-dfc1-3351-e72a-1a82ca753347 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:47:08.753Z [DEBUG] policy_eval.broker: eval dequeued: queue=cluster eval_id=22c12f32-dfc1-3351-e72a-1a82ca753347 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 token=8a023e51-8645-c2e2-0cad-4acec8db49bb
2021-03-02T17:47:08.753Z [DEBUG] policy_eval.worker: received policy for evaluation: id=98fc2b73-a33e-9e36-1439-f24a5439b549 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:47:08.753Z [DEBUG] policy_eval.worker: fetching current count: id=98fc2b73-a33e-9e36-1439-f24a5439b549 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:47:09.350Z [DEBUG] policy_eval.worker.check_handler: received policy check for evaluation: check=batch_job_queue_size id=98fc2b73-a33e-9e36-1439-f24a5439b549 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:47:09.350Z [DEBUG] policy_eval.worker.check_handler: querying source: check=batch_job_queue_size id=98fc2b73-a33e-9e36-1439-f24a5439b549 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg query="sum:nomad.nomad.job_summary.queued{namespace:business-unit-pcdx-test} by {namespace}" source=datadog
2021-03-02T17:47:09.526Z [DEBUG] policy_eval.worker.check_handler: calculating new count: check=batch_job_queue_size id=98fc2b73-a33e-9e36-1439-f24a5439b549 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg count=0
2021-03-02T17:47:09.526Z [DEBUG] policy_eval.worker.check_handler: nothing to do: check=batch_job_queue_size id=98fc2b73-a33e-9e36-1439-f24a5439b549 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:47:09.526Z [DEBUG] policy_eval.worker: no checks need to be executed: id=98fc2b73-a33e-9e36-1439-f24a5439b549 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:47:09.526Z [DEBUG] policy_eval.broker: ack eval: eval_id=22c12f32-dfc1-3351-e72a-1a82ca753347 token=8a023e51-8645-c2e2-0cad-4acec8db49bb eval_id=22c12f32-dfc1-3351-e72a-1a82ca753347 token=8a023e51-8645-c2e2-0cad-4acec8db49bb
2021-03-02T17:47:09.526Z [DEBUG] policy_eval.broker: eval ack'd: policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1
2021-03-02T17:47:09.526Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T17:47:09.526Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T17:48:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:48:09.094Z [DEBUG] policy_eval.broker: enqueue eval: eval_id=e3d85714-b32c-d041-f011-f79466a6047f policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:48:09.095Z [DEBUG] policy_eval.broker: eval enqueued: eval_id=e3d85714-b32c-d041-f011-f79466a6047f policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:48:09.095Z [DEBUG] policy_eval.broker: eval dequeued: queue=cluster eval_id=e3d85714-b32c-d041-f011-f79466a6047f policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 token=8af8d3a7-8771-9636-df7a-ce83fb4f69b7
2021-03-02T17:48:09.095Z [DEBUG] policy_eval.worker: received policy for evaluation: id=5509cdb3-69bd-5896-d276-f9701e1f7e3c policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:48:09.095Z [DEBUG] policy_eval.worker: fetching current count: id=5509cdb3-69bd-5896-d276-f9701e1f7e3c policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:48:09.619Z [DEBUG] policy_eval.worker.check_handler: received policy check for evaluation: check=batch_job_queue_size id=5509cdb3-69bd-5896-d276-f9701e1f7e3c policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:48:09.621Z [DEBUG] policy_eval.worker.check_handler: querying source: check=batch_job_queue_size id=5509cdb3-69bd-5896-d276-f9701e1f7e3c policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg query="sum:nomad.nomad.job_summary.queued{namespace:business-unit-pcdx-test} by {namespace}" source=datadog
2021-03-02T17:48:10.028Z [DEBUG] policy_eval.worker.check_handler: calculating new count: check=batch_job_queue_size id=5509cdb3-69bd-5896-d276-f9701e1f7e3c policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg count=0
2021-03-02T17:48:10.029Z [DEBUG] policy_eval.worker.check_handler: nothing to do: check=batch_job_queue_size id=5509cdb3-69bd-5896-d276-f9701e1f7e3c policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:48:10.029Z [DEBUG] policy_eval.worker: no checks need to be executed: id=5509cdb3-69bd-5896-d276-f9701e1f7e3c policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:48:10.029Z [DEBUG] policy_eval.broker: ack eval: eval_id=e3d85714-b32c-d041-f011-f79466a6047f token=8af8d3a7-8771-9636-df7a-ce83fb4f69b7 eval_id=e3d85714-b32c-d041-f011-f79466a6047f token=8af8d3a7-8771-9636-df7a-ce83fb4f69b7
2021-03-02T17:48:10.029Z [DEBUG] policy_eval.broker: eval ack'd: policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1
2021-03-02T17:48:10.029Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T17:48:10.029Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T17:49:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:49:08.781Z [DEBUG] policy_eval.broker: enqueue eval: eval_id=6a00a9f1-0e01-f3d4-2d3c-be1fc4bb2ba0 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:49:08.781Z [DEBUG] policy_eval.broker: eval enqueued: eval_id=6a00a9f1-0e01-f3d4-2d3c-be1fc4bb2ba0 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:49:08.781Z [DEBUG] policy_eval.broker: eval dequeued: queue=cluster eval_id=6a00a9f1-0e01-f3d4-2d3c-be1fc4bb2ba0 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 token=bab2b4af-a7ba-5692-dbe2-ffa2795b843b
2021-03-02T17:49:08.781Z [DEBUG] policy_eval.worker: received policy for evaluation: id=615bf174-763b-bacc-c753-53053486af51 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:49:08.781Z [DEBUG] policy_eval.worker: fetching current count: id=615bf174-763b-bacc-c753-53053486af51 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:49:09.356Z [DEBUG] policy_eval.worker.check_handler: received policy check for evaluation: check=batch_job_queue_size id=615bf174-763b-bacc-c753-53053486af51 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:49:09.356Z [DEBUG] policy_eval.worker.check_handler: querying source: check=batch_job_queue_size id=615bf174-763b-bacc-c753-53053486af51 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg query="sum:nomad.nomad.job_summary.queued{namespace:business-unit-pcdx-test} by {namespace}" source=datadog
2021-03-02T17:49:09.538Z [DEBUG] policy_eval.worker.check_handler: calculating new count: check=batch_job_queue_size id=615bf174-763b-bacc-c753-53053486af51 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg count=0
2021-03-02T17:49:09.538Z [DEBUG] policy_eval.worker.check_handler: nothing to do: check=batch_job_queue_size id=615bf174-763b-bacc-c753-53053486af51 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:49:09.538Z [DEBUG] policy_eval.worker: no checks need to be executed: id=615bf174-763b-bacc-c753-53053486af51 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:49:09.538Z [DEBUG] policy_eval.broker: ack eval: eval_id=6a00a9f1-0e01-f3d4-2d3c-be1fc4bb2ba0 token=bab2b4af-a7ba-5692-dbe2-ffa2795b843b eval_id=6a00a9f1-0e01-f3d4-2d3c-be1fc4bb2ba0 token=bab2b4af-a7ba-5692-dbe2-ffa2795b843b
2021-03-02T17:49:09.538Z [DEBUG] policy_eval.broker: eval ack'd: policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1
2021-03-02T17:49:09.538Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T17:49:09.538Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T17:50:08.166Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:50:08.743Z [DEBUG] policy_eval.broker: enqueue eval: eval_id=eb3d5233-215b-29b5-485c-dbefcd124519 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:50:08.743Z [DEBUG] policy_eval.broker: eval enqueued: eval_id=eb3d5233-215b-29b5-485c-dbefcd124519 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:50:08.743Z [DEBUG] policy_eval.broker: eval dequeued: queue=cluster eval_id=eb3d5233-215b-29b5-485c-dbefcd124519 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 token=bc2ccca2-dd98-5909-a93e-46b4d8e43a18
2021-03-02T17:50:08.743Z [DEBUG] policy_eval.worker: received policy for evaluation: id=ad570f97-69dc-17aa-3451-8fa430b5821a policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:50:08.743Z [DEBUG] policy_eval.worker: fetching current count: id=ad570f97-69dc-17aa-3451-8fa430b5821a policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:50:09.421Z [DEBUG] policy_eval.worker.check_handler: received policy check for evaluation: check=batch_job_queue_size id=ad570f97-69dc-17aa-3451-8fa430b5821a policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:50:09.421Z [DEBUG] policy_eval.worker.check_handler: querying source: check=batch_job_queue_size id=ad570f97-69dc-17aa-3451-8fa430b5821a policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg query="sum:nomad.nomad.job_summary.queued{namespace:business-unit-pcdx-test} by {namespace}" source=datadog
2021-03-02T17:50:09.619Z [DEBUG] policy_eval.worker.check_handler: calculating new count: check=batch_job_queue_size id=ad570f97-69dc-17aa-3451-8fa430b5821a policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg count=0
2021-03-02T17:50:09.620Z [DEBUG] policy_eval.worker.check_handler: nothing to do: check=batch_job_queue_size id=ad570f97-69dc-17aa-3451-8fa430b5821a policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:50:09.620Z [DEBUG] policy_eval.worker: no checks need to be executed: id=ad570f97-69dc-17aa-3451-8fa430b5821a policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:50:09.621Z [DEBUG] policy_eval.broker: ack eval: eval_id=eb3d5233-215b-29b5-485c-dbefcd124519 token=bc2ccca2-dd98-5909-a93e-46b4d8e43a18 eval_id=eb3d5233-215b-29b5-485c-dbefcd124519 token=bc2ccca2-dd98-5909-a93e-46b4d8e43a18
2021-03-02T17:50:09.621Z [DEBUG] policy_eval.broker: eval ack'd: policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1
2021-03-02T17:50:09.622Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T17:50:09.622Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T17:51:08.166Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:51:08.751Z [DEBUG] policy_eval.broker: enqueue eval: eval_id=59343a21-2639-5887-589e-5f7c70c277d2 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:51:08.752Z [DEBUG] policy_eval.broker: eval enqueued: eval_id=59343a21-2639-5887-589e-5f7c70c277d2 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:51:08.752Z [DEBUG] policy_eval.broker: eval dequeued: queue=cluster eval_id=59343a21-2639-5887-589e-5f7c70c277d2 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 token=d399e8e5-65de-18f4-0a1c-60935fd709f6
2021-03-02T17:51:08.752Z [DEBUG] policy_eval.worker: received policy for evaluation: id=39b90b2a-01c8-db67-e0a2-a76b33b949f0 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:51:08.752Z [DEBUG] policy_eval.worker: fetching current count: id=39b90b2a-01c8-db67-e0a2-a76b33b949f0 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:51:09.372Z [DEBUG] policy_eval.worker.check_handler: received policy check for evaluation: check=batch_job_queue_size id=39b90b2a-01c8-db67-e0a2-a76b33b949f0 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:51:09.372Z [DEBUG] policy_eval.worker.check_handler: querying source: check=batch_job_queue_size id=39b90b2a-01c8-db67-e0a2-a76b33b949f0 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg query="sum:nomad.nomad.job_summary.queued{namespace:business-unit-pcdx-test} by {namespace}" source=datadog
2021-03-02T17:51:09.699Z [DEBUG] policy_eval.worker.check_handler: calculating new count: check=batch_job_queue_size id=39b90b2a-01c8-db67-e0a2-a76b33b949f0 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg count=0
2021-03-02T17:51:09.699Z [DEBUG] policy_eval.worker.check_handler: nothing to do: check=batch_job_queue_size id=39b90b2a-01c8-db67-e0a2-a76b33b949f0 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:51:09.699Z [DEBUG] policy_eval.worker: no checks need to be executed: id=39b90b2a-01c8-db67-e0a2-a76b33b949f0 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:51:09.700Z [DEBUG] policy_eval.broker: ack eval: eval_id=59343a21-2639-5887-589e-5f7c70c277d2 token=d399e8e5-65de-18f4-0a1c-60935fd709f6 eval_id=59343a21-2639-5887-589e-5f7c70c277d2 token=d399e8e5-65de-18f4-0a1c-60935fd709f6
2021-03-02T17:51:09.700Z [DEBUG] policy_eval.broker: eval ack'd: policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1
2021-03-02T17:51:09.700Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T17:51:09.700Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T17:52:08.168Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:52:08.708Z [DEBUG] policy_eval.broker: enqueue eval: eval_id=7da2f613-305f-3056-ae79-5837bd144e9e policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:52:08.708Z [DEBUG] policy_eval.broker: eval enqueued: eval_id=7da2f613-305f-3056-ae79-5837bd144e9e policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:52:08.708Z [DEBUG] policy_eval.broker: eval dequeued: queue=cluster eval_id=7da2f613-305f-3056-ae79-5837bd144e9e policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 token=e2d606e8-0623-6a58-b019-8165455c4fab
2021-03-02T17:52:08.708Z [DEBUG] policy_eval.worker: received policy for evaluation: id=bfa47795-94d6-8dcc-fb22-fe975f45c520 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:52:08.709Z [DEBUG] policy_eval.worker: fetching current count: id=bfa47795-94d6-8dcc-fb22-fe975f45c520 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:52:09.210Z [DEBUG] policy_eval.worker.check_handler: received policy check for evaluation: check=batch_job_queue_size id=bfa47795-94d6-8dcc-fb22-fe975f45c520 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:52:09.210Z [DEBUG] policy_eval.worker.check_handler: querying source: check=batch_job_queue_size id=bfa47795-94d6-8dcc-fb22-fe975f45c520 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg query="sum:nomad.nomad.job_summary.queued{namespace:business-unit-pcdx-test} by {namespace}" source=datadog
2021-03-02T17:52:09.386Z [DEBUG] policy_eval.worker.check_handler: calculating new count: check=batch_job_queue_size id=bfa47795-94d6-8dcc-fb22-fe975f45c520 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg count=0
2021-03-02T17:52:09.386Z [DEBUG] policy_eval.worker.check_handler: nothing to do: check=batch_job_queue_size id=bfa47795-94d6-8dcc-fb22-fe975f45c520 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:52:09.386Z [DEBUG] policy_eval.worker: no checks need to be executed: id=bfa47795-94d6-8dcc-fb22-fe975f45c520 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:52:09.386Z [DEBUG] policy_eval.broker: ack eval: eval_id=7da2f613-305f-3056-ae79-5837bd144e9e token=e2d606e8-0623-6a58-b019-8165455c4fab eval_id=7da2f613-305f-3056-ae79-5837bd144e9e token=e2d606e8-0623-6a58-b019-8165455c4fab
2021-03-02T17:52:09.386Z [DEBUG] policy_eval.broker: eval ack'd: policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1
2021-03-02T17:52:09.386Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T17:52:09.386Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T17:53:08.166Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:53:08.773Z [DEBUG] policy_eval.broker: enqueue eval: eval_id=8a55b733-41a8-4f87-357f-7c2ff86e8727 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:53:08.773Z [DEBUG] policy_eval.broker: eval enqueued: eval_id=8a55b733-41a8-4f87-357f-7c2ff86e8727 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:53:08.773Z [DEBUG] policy_eval.broker: eval dequeued: queue=cluster eval_id=8a55b733-41a8-4f87-357f-7c2ff86e8727 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 token=f65ef278-55ab-0994-ee4b-3b1e74222158
2021-03-02T17:53:08.774Z [DEBUG] policy_eval.worker: received policy for evaluation: id=ae285192-b129-4bca-2342-3d459ffab003 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:53:08.775Z [DEBUG] policy_eval.worker: fetching current count: id=ae285192-b129-4bca-2342-3d459ffab003 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:53:09.368Z [DEBUG] policy_eval.worker.check_handler: received policy check for evaluation: check=batch_job_queue_size id=ae285192-b129-4bca-2342-3d459ffab003 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:53:09.370Z [DEBUG] policy_eval.worker.check_handler: querying source: check=batch_job_queue_size id=ae285192-b129-4bca-2342-3d459ffab003 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg query="sum:nomad.nomad.job_summary.queued{namespace:business-unit-pcdx-test} by {namespace}" source=datadog
2021-03-02T17:53:09.778Z [DEBUG] policy_eval.worker.check_handler: calculating new count: check=batch_job_queue_size id=ae285192-b129-4bca-2342-3d459ffab003 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg count=0
2021-03-02T17:53:09.778Z [DEBUG] policy_eval.worker.check_handler: nothing to do: check=batch_job_queue_size id=ae285192-b129-4bca-2342-3d459ffab003 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:53:09.778Z [DEBUG] policy_eval.worker: no checks need to be executed: id=ae285192-b129-4bca-2342-3d459ffab003 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:53:09.778Z [DEBUG] policy_eval.broker: ack eval: eval_id=8a55b733-41a8-4f87-357f-7c2ff86e8727 token=f65ef278-55ab-0994-ee4b-3b1e74222158 eval_id=8a55b733-41a8-4f87-357f-7c2ff86e8727 token=f65ef278-55ab-0994-ee4b-3b1e74222158
2021-03-02T17:53:09.778Z [DEBUG] policy_eval.broker: eval ack'd: policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1
2021-03-02T17:53:09.778Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T17:53:09.778Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T17:54:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:54:08.692Z [DEBUG] policy_eval.broker: enqueue eval: eval_id=2b3014ff-54a3-c687-06ee-280e24371890 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:54:08.692Z [DEBUG] policy_eval.broker: eval enqueued: eval_id=2b3014ff-54a3-c687-06ee-280e24371890 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:54:08.693Z [DEBUG] policy_eval.broker: eval dequeued: queue=cluster eval_id=2b3014ff-54a3-c687-06ee-280e24371890 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 token=4eb8bdc2-2938-cd21-a2a4-acf44c73e37b
2021-03-02T17:54:08.693Z [DEBUG] policy_eval.worker: received policy for evaluation: id=cffa85b6-c129-d97e-d228-9098b4f11bd7 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:54:08.694Z [DEBUG] policy_eval.worker: fetching current count: id=cffa85b6-c129-d97e-d228-9098b4f11bd7 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:54:09.188Z [DEBUG] policy_eval.worker.check_handler: received policy check for evaluation: check=batch_job_queue_size id=cffa85b6-c129-d97e-d228-9098b4f11bd7 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:54:09.190Z [DEBUG] policy_eval.worker.check_handler: querying source: check=batch_job_queue_size id=cffa85b6-c129-d97e-d228-9098b4f11bd7 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg query="sum:nomad.nomad.job_summary.queued{namespace:business-unit-pcdx-test} by {namespace}" source=datadog
2021-03-02T17:54:09.349Z [DEBUG] policy_eval.worker.check_handler: calculating new count: check=batch_job_queue_size id=cffa85b6-c129-d97e-d228-9098b4f11bd7 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg count=0
2021-03-02T17:54:09.349Z [DEBUG] policy_eval.worker.check_handler: nothing to do: check=batch_job_queue_size id=cffa85b6-c129-d97e-d228-9098b4f11bd7 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:54:09.349Z [DEBUG] policy_eval.worker: no checks need to be executed: id=cffa85b6-c129-d97e-d228-9098b4f11bd7 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:54:09.349Z [DEBUG] policy_eval.broker: ack eval: eval_id=2b3014ff-54a3-c687-06ee-280e24371890 token=4eb8bdc2-2938-cd21-a2a4-acf44c73e37b eval_id=2b3014ff-54a3-c687-06ee-280e24371890 token=4eb8bdc2-2938-cd21-a2a4-acf44c73e37b
2021-03-02T17:54:09.349Z [DEBUG] policy_eval.broker: eval ack'd: policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1
2021-03-02T17:54:09.349Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T17:54:09.349Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T17:55:08.168Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:55:08.734Z [DEBUG] policy_eval.broker: enqueue eval: eval_id=383294dc-1c7b-fdde-9231-02aab61e6863 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:55:08.734Z [DEBUG] policy_eval.broker: eval enqueued: eval_id=383294dc-1c7b-fdde-9231-02aab61e6863 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T17:55:08.735Z [DEBUG] policy_eval.broker: eval dequeued: queue=cluster eval_id=383294dc-1c7b-fdde-9231-02aab61e6863 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 token=e4f65964-6e1c-d673-703b-87ac0ec38a18
2021-03-02T17:55:08.735Z [DEBUG] policy_eval.worker: received policy for evaluation: id=0e6feab4-992c-29ad-c242-b47b38e511a3 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:55:08.735Z [DEBUG] policy_eval.worker: fetching current count: id=0e6feab4-992c-29ad-c242-b47b38e511a3 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:55:09.271Z [DEBUG] policy_eval.worker.check_handler: received policy check for evaluation: check=batch_job_queue_size id=0e6feab4-992c-29ad-c242-b47b38e511a3 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T17:55:09.280Z [DEBUG] policy_eval.worker.check_handler: querying source: check=batch_job_queue_size id=0e6feab4-992c-29ad-c242-b47b38e511a3 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg query="sum:nomad.nomad.job_summary.queued{namespace:business-unit-pcdx-test} by {namespace}" source=datadog
2021-03-02T17:55:09.750Z [DEBUG] policy_eval.worker.check_handler: calculating new count: check=batch_job_queue_size id=0e6feab4-992c-29ad-c242-b47b38e511a3 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg count=0
2021-03-02T17:55:09.750Z [INFO]  policy_eval.worker: scaling target: id=0e6feab4-992c-29ad-c242-b47b38e511a3 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg from=0 to=1 reason="scaling up because factor is 1.000000" meta=map[nomad_policy_id:cabe198b-80a1-4fbc-0432-5c39f44c18b1]
2021-03-02T17:55:20.162Z [INFO]  internal_plugin.aws-asg: successfully performed and verified scaling out: action=scale_out asg_name=nomadprod-batch-processing-asg desired_count=1
2021-03-02T17:55:20.162Z [INFO]  policy_eval.worker: successfully submitted scaling action to target: id=0e6feab4-992c-29ad-c242-b47b38e511a3 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg desired_count=1
2021-03-02T17:55:20.162Z [INFO]  policy_eval.worker: policy evaluation complete: id=0e6feab4-992c-29ad-c242-b47b38e511a3 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T17:55:20.162Z [DEBUG] policy_eval.broker: ack eval: eval_id=383294dc-1c7b-fdde-9231-02aab61e6863 token=e4f65964-6e1c-d673-703b-87ac0ec38a18 eval_id=383294dc-1c7b-fdde-9231-02aab61e6863 token=e4f65964-6e1c-d673-703b-87ac0ec38a18
2021-03-02T17:55:20.162Z [DEBUG] policy_eval.broker: eval ack'd: policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1
2021-03-02T17:55:20.162Z [DEBUG] policy_eval.broker: dequeue eval: queue=cluster
2021-03-02T17:55:20.162Z [DEBUG] policy_eval.broker: waiting for eval: queue=cluster
2021-03-02T17:55:20.162Z [DEBUG] policy_manager.policy_handler: scaling policy has been placed into cooldown: policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 cooldown=4m0s
2021-03-02T17:56:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:57:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:58:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:59:08.166Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T17:59:20.724Z [DEBUG] policy_manager.policy_handler: scaling policy has been placed into cooldown: policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 cooldown=28.831116008s
2021-03-02T18:00:08.166Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:00:08.730Z [DEBUG] policy_eval.broker: enqueue eval: eval_id=8c262104-49b2-3deb-0aaf-4646412c3212 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T18:00:08.730Z [DEBUG] policy_eval.broker: eval enqueued: eval_id=8c262104-49b2-3deb-0aaf-4646412c3212 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster token=
2021-03-02T18:00:08.730Z [DEBUG] policy_eval.broker: eval dequeued: queue=cluster eval_id=8c262104-49b2-3deb-0aaf-4646412c3212 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 token=fd3f40ee-8e6c-8369-5c7e-c2f16d8b6320
2021-03-02T18:00:08.730Z [DEBUG] policy_eval.worker: received policy for evaluation: id=6e59f0d6-3580-d660-b2e9-f0e925e47d7d policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T18:00:08.730Z [DEBUG] policy_eval.worker: fetching current count: id=6e59f0d6-3580-d660-b2e9-f0e925e47d7d policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg
2021-03-02T18:00:09.306Z [DEBUG] policy_eval.worker.check_handler: received policy check for evaluation: check=batch_job_queue_size id=6e59f0d6-3580-d660-b2e9-f0e925e47d7d policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg
2021-03-02T18:00:09.306Z [DEBUG] policy_eval.worker.check_handler: querying source: check=batch_job_queue_size id=6e59f0d6-3580-d660-b2e9-f0e925e47d7d policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg query="sum:nomad.nomad.job_summary.queued{namespace:business-unit-pcdx-test} by {namespace}" source=datadog
2021-03-02T18:00:09.672Z [DEBUG] policy_eval.worker.check_handler: calculating new count: check=batch_job_queue_size id=6e59f0d6-3580-d660-b2e9-f0e925e47d7d policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster source=datadog strategy=target-value target=aws-asg count=1
2021-03-02T18:00:09.672Z [INFO]  policy_eval.worker: scaling target: id=6e59f0d6-3580-d660-b2e9-f0e925e47d7d policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 queue=cluster target=aws-asg from=1 to=0 reason="scaling down because factor is 0.000000" meta=map[nomad_policy_id:cabe198b-80a1-4fbc-0432-5c39f44c18b1]
2021-03-02T18:00:10.085Z [DEBUG] internal_plugin.aws-asg: identified Nomad node for removal: node_id=e02cca8a-6b21-3b1b-3e40-445f0edb6dd3
2021-03-02T18:00:10.363Z [DEBUG] internal_plugin.aws-asg: identified remote provider ID for node: node_id=e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 remote_id=i-05f31e5baef8cddc5
2021-03-02T18:00:10.363Z [INFO]  internal_plugin.aws-asg: triggering drain on node: node_id=e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 deadline=1h0m0s
2021-03-02T18:01:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:01:08.465Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is draining"
2021-03-02T18:02:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:02:08.475Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is draining"
2021-03-02T18:03:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:03:08.490Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is draining"
2021-03-02T18:04:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:04:08.490Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is draining"
2021-03-02T18:05:08.166Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:05:08.495Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is draining"
2021-03-02T18:05:08.730Z [DEBUG] policy_eval.broker: nack eval: eval_id=8c262104-49b2-3deb-0aaf-4646412c3212 token=fd3f40ee-8e6c-8369-5c7e-c2f16d8b6320
2021-03-02T18:05:08.731Z [WARN]  policy_eval.broker: eval delivery limit reached: eval_id=8c262104-49b2-3deb-0aaf-4646412c3212 policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1 token=fd3f40ee-8e6c-8369-5c7e-c2f16d8b6320 count=1 limit=1
2021-03-02T18:06:08.166Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:06:08.514Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is draining"
2021-03-02T18:07:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:07:08.518Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is draining"
2021-03-02T18:08:08.166Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:08:08.483Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is draining"
2021-03-02T18:09:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:09:08.516Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is draining"
2021-03-02T18:10:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:10:08.482Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is draining"
2021-03-02T18:11:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:11:08.507Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is draining"
2021-03-02T18:12:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:12:08.490Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is draining"
2021-03-02T18:12:57.118Z [INFO]  internal_plugin.aws-asg: received node drain message: node_id=e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 msg="Drain complete for node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3"
2021-03-02T18:12:57.382Z [DEBUG] internal_plugin.aws-asg: received node drain message: node_id=e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 msg="Alloc "eb9d0ce7-60e8-5d7d-93d3-c2c241addf77" marked for migration"
2021-03-02T18:12:57.382Z [DEBUG] internal_plugin.aws-asg: received node drain message: node_id=e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 msg="Alloc "f533eaa1-7a49-6537-7705-aaf455de883a" marked for migration"
2021-03-02T18:12:58.516Z [DEBUG] internal_plugin.aws-asg: received node drain message: node_id=e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 msg="Alloc "eb9d0ce7-60e8-5d7d-93d3-c2c241addf77" status running -> complete"
2021-03-02T18:12:59.020Z [DEBUG] internal_plugin.aws-asg: received node drain message: node_id=e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 msg="Alloc "f533eaa1-7a49-6537-7705-aaf455de883a" status running -> complete"
2021-03-02T18:13:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:13:08.494Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is ineligible"
2021-03-02T18:14:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:14:08.513Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is ineligible"
2021-03-02T18:15:08.168Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:15:08.525Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is ineligible"
2021-03-02T18:16:08.165Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:16:08.555Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is ineligible"
2021-03-02T18:17:08.166Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:17:08.500Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is ineligible"
2021-03-02T18:18:08.166Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:18:08.484Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is ineligible"
2021-03-02T18:19:08.166Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:19:08.490Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is ineligible"
2021-03-02T18:20:08.166Z [DEBUG] internal_plugin.nomad-target: triggering run of handler garbage collection
2021-03-02T18:20:08.503Z [WARN]  internal_plugin.aws-asg: node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is ineligible"
2021-03-02T18:20:54.864Z [INFO]  agent: caught signal: signal=terminated
2021-03-02T18:20:54.864Z [INFO]  agent.plugin_manager: shutting down plugin: plugin_name=aws-asg
2021-03-02T18:20:54.864Z [INFO]  agent.plugin_manager: shutting down plugin: plugin_name=nomad-apm
2021-03-02T18:20:54.864Z [INFO]  agent.plugin_manager: shutting down plugin: plugin_name=nomad
2021-03-02T18:20:54.864Z [INFO]  agent.plugin_manager: shutting down plugin: plugin_name=datadog
2021-03-02T18:20:54.864Z [INFO]  agent.plugin_manager: shutting down plugin: plugin_name=target-value
2021-03-02T18:20:54.864Z [INFO]  agent.plugin_manager: shutting down plugin: plugin_name=nomad-target
2021-03-02T18:20:54.864Z [DEBUG] file_policy_source: stopping file source ID monitor due to context done: file=local/policies/batch.hcl name=cluster_policy policy_id=cabe198b-80a1-4fbc-0432-5c39f44c18b1
2021-03-02T18:20:54.864Z [INFO]  agent: context closed, shutting down eval handler
2021-03-02T18:20:54.864Z [INFO]  policy_eval.worker: stopping worker: id=0e6feab4-992c-29ad-c242-b47b38e511a3 queue=cluster
2021-03-02T18:20:54.864Z [INFO]  policy_eval.worker: stopping worker: id=cffa85b6-c129-d97e-d228-9098b4f11bd7 queue=cluster
2021-03-02T18:20:54.864Z [INFO]  policy_eval.worker: stopping worker: id=ae285192-b129-4bca-2342-3d459ffab003 queue=cluster
2021-03-02T18:20:54.864Z [INFO]  policy_eval.worker: stopping worker: id=bfa47795-94d6-8dcc-fb22-fe975f45c520 queue=cluster
2021-03-02T18:20:54.864Z [INFO]  policy_eval.worker: stopping worker: id=39b90b2a-01c8-db67-e0a2-a76b33b949f0 queue=cluster
2021-03-02T18:20:54.864Z [INFO]  policy_eval.worker: stopping worker: id=ad570f97-69dc-17aa-3451-8fa430b5821a queue=cluster
2021-03-02T18:20:54.864Z [INFO]  policy_eval.worker: stopping worker: id=615bf174-763b-bacc-c753-53053486af51 queue=cluster
2021-03-02T18:20:54.864Z [INFO]  policy_eval.worker: stopping worker: id=5509cdb3-69bd-5896-d276-f9701e1f7e3c queue=cluster
2021-03-02T18:20:54.864Z [INFO]  policy_eval.worker: stopping worker: id=98fc2b73-a33e-9e36-1439-f24a5439b549 queue=cluster
2021-03-02T18:20:54.864Z [INFO]  policy_eval.worker: stopping worker: id=39693838-c21a-5e65-dccb-95e172943887 queue=horizontal
2021-03-02T18:20:54.864Z [INFO]  policy_eval.worker: stopping worker: id=71bca267-2b76-6fdf-d002-fae36294fb7d queue=horizontal
2021-03-02T18:20:54.864Z [INFO]  policy_eval.worker: stopping worker: id=79f03f60-2210-e325-873b-20a747b079b6 queue=horizontal
2021-03-02T18:20:54.864Z [INFO]  policy_eval.worker: stopping worker: id=d1c3a094-0c55-3a08-8be1-028422cabc39 queue=horizontal
2021-03-02T18:20:54.864Z [INFO]  policy_eval.worker: stopping worker: id=e3055e1f-0a0a-895f-20c1-ee46c7ec41a6 queue=horizontal
2021-03-02T18:20:54.864Z [INFO]  policy_eval.worker: stopping worker: id=a9d42538-3f46-0b12-f47d-920641f1074a queue=horizontal
2021-03-02T18:20:54.864Z [INFO]  policy_eval.worker: stopping worker: id=d9532c53-6cd8-4d8f-c746-d8a25d2bf587 queue=horizontal
2021-03-02T18:20:54.864Z [INFO]  policy_eval.worker: stopping worker: id=0713b1d2-4c51-68d7-f2da-a683a2d3c5e8 queue=horizontal
2021-03-02T18:20:54.864Z [INFO]  policy_eval.worker: stopping worker: id=5fbbb0f1-3e25-7824-2c25-8faa39febbf5 queue=horizontal
2021-03-02T18:20:54.864Z [INFO]  policy_eval.worker: stopping worker: id=8869ecd1-ec83-3a36-a86b-e49bc6ae86d9 queue=horizontal

Thank you

cgbaker commented 3 years ago

thanks, @jokramer123 , we're looking at this and we'll get back to you.

lgfa29 commented 3 years ago

Hi @jokramer123,

Thanks for the detailed report, and I sorry to see the Autoscaler get blocked. It seems like it restarted during a cluster scale in event and left some nodes marked as ineligible without removing them from the ASG.

If there are nodes being modified (e.g. being drained or ineligible for scheduling), the Autoscaler will not modify the cluster to avoid conflicting with other changes that could be undergoing (like some other external automation, or manual maintenance etc). This can be observed in the log message that you highlighted:

node pool status readiness check failed: error="node e02cca8a-6b21-3b1b-3e40-445f0edb6dd3 is ineligible" 

https://github.com/hashicorp/nomad-autoscaler/issues/191 is the item we're using to track work for reconciling scaling actions after the Autoscaler is restarted.

To get you unblocked I think you can do these steps:

  1. Check your client list in make sure they are all marked as eligible and not draining (stop any drain and mark any ineligible node as eligible).
  2. Try to avoid restarting the Autoscaler. From your jobspec I see that you have some template and vault blocks. I would recommend setting their change_mode [0][1] to signal and their change_signal [0][1] to SIGHUP, this way the Autoscaler will reload its configuration when their value change.

See if these changes help you.

jokramer123 commented 3 years ago

Hello @lgfa29 and @cgbaker,

Thank you for this information. #191 is open for sometime now, definitely this is a must have.

Sometime after the autoscaler respawned it performed a scale-in action against the target autoscaling group which had an active nomad allocation. This results in the allocation becoming 'lost'. I just want to point this out in relation to #191 since this can be pretty disruptive.

Unfortunately the logs I had are gone but I managed to grab a few screenshots:

Screen Shot 2021-03-02 at 12 57 00 PM

Screen Shot 2021-03-02 at 12 51 30 PM

Screen Shot 2021-03-02 at 5 37 44 PM
lgfa29 commented 3 years ago

Thanks for the extra info!

Thank you for this information. #191 is open for sometime now, definitely this is a must have.

Yes, #191 is a bit old and stale because it's something we flagged early as a TODO. It also requires work in Nomad itself that is currently underway.

Sometime after the autoscaler respawned it performed a scale-in action against the target autoscaling group which had an active nomad allocation. This results in the allocation becoming 'lost'. I just want to point this out in relation to #191 since this can be pretty disruptive.

If I understand what you described correctly, the allocation being lost is expected. The Autoscaler will pick a random client to be removed and will then drain it and deregister it from the ASG. We are thinking of ways to allow for better control on which clients get removed on scale in, but in general it's expected that some allocations might have to be reschduled if cluster scaling is being used.

We should make this more clear in our documentation.

jokramer123 commented 3 years ago

To clarify, those allocations are 'lost' because the node was terminated before the drainstop (1 hr) could be completed. To me this is unexpected.

lgfa29 commented 3 years ago

Thanks for the clarification, sorry for missing this point in your last message.

The 1h force drain is a Nomad default, but for the Autoscaler it's set to 15min. You can increase and customize this value per policy.

We will clarify this distinction in our docs as it's can be surprising and unexpected.