Large number of warning logs with nomad service provider

chenjpu commented 1 year ago

Nomad version

1.4.4

Issue

logs( journalctl -fu nomad )


Feb 22 11:19:06 ai-149 nomad[10066]: 2023-02-22T11:19:06.652+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:07 ai-149 nomad[10066]: 2023-02-22T11:19:07.653+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:08 ai-149 nomad[10066]: 2023-02-22T11:19:08.653+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:09 ai-149 nomad[10066]: 2023-02-22T11:19:09.654+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:10 ai-149 nomad[10066]: 2023-02-22T11:19:10.654+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:11 ai-149 nomad[10066]: 2023-02-22T11:19:11.654+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:12 ai-149 nomad[10066]: 2023-02-22T11:19:12.655+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:13 ai-149 nomad[10066]: 2023-02-22T11:19:13.655+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:14 ai-149 nomad[10066]: 2023-02-22T11:19:14.656+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:15 ai-149 nomad[10066]: 2023-02-22T11:19:15.656+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:16 ai-149 nomad[10066]: 2023-02-22T11:19:16.657+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:17 ai-149 nomad[10066]: 2023-02-22T11:19:17.657+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:18 ai-149 nomad[10066]: 2023-02-22T11:19:18.657+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:19 ai-149 nomad[10066]: 2023-02-22T11:19:19.658+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:20 ai-149 nomad[10066]: 2023-02-22T11:19:20.658+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:21 ai-149 nomad[10066]: 2023-02-22T11:19:21.659+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:22 ai-149 nomad[10066]: 2023-02-22T11:19:22.659+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:23 ai-149 nomad[10066]: 2023-02-22T11:19:23.660+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:24 ai-149 nomad[10066]: 2023-02-22T11:19:24.660+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:25 ai-149 nomad[10066]: 2023-02-22T11:19:25.661+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:26 ai-149 nomad[10066]: 2023-02-22T11:19:26.661+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:27 ai-149 nomad[10066]: 2023-02-22T11:19:27.662+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:28 ai-149 nomad[10066]: 2023-02-22T11:19:28.662+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:29 ai-149 nomad[10066]: 2023-02-22T11:19:29.663+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:30 ai-149 nomad[10066]: 2023-02-22T11:19:30.663+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:31 ai-149 nomad[10066]: 2023-02-22T11:19:31.664+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:32 ai-149 nomad[10066]: 2023-02-22T11:19:32.664+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:33 ai-149 nomad[10066]: 2023-02-22T11:19:33.665+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:34 ai-149 nomad[10066]: 2023-02-22T11:19:34.665+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:35 ai-149 nomad[10066]: 2023-02-22T11:19:35.665+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:36 ai-149 nomad[10066]: 2023-02-22T11:19:36.666+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:37 ai-149 nomad[10066]: 2023-02-22T11:19:37.667+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:38 ai-149 nomad[10066]: 2023-02-22T11:19:38.667+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:39 ai-149 nomad[10066]: 2023-02-22T11:19:39.668+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:40 ai-149 nomad[10066]: 2023-02-22T11:19:40.668+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:41 ai-149 nomad[10066]: 2023-02-22T11:19:41.669+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:42 ai-149 nomad[10066]: 2023-02-22T11:19:42.669+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:43 ai-149 nomad[10066]: 2023-02-22T11:19:43.670+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:44 ai-149 nomad[10066]: 2023-02-22T11:19:44.670+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:45 ai-149 nomad[10066]: 2023-02-22T11:19:45.671+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:46 ai-149 nomad[10066]: 2023-02-22T11:19:46.671+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:47 ai-149 nomad[10066]: 2023-02-22T11:19:47.671+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:48 ai-149 nomad[10066]: 2023-02-22T11:19:48.672+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:49 ai-149 nomad[10066]: 2023-02-22T11:19:49.672+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:50 ai-149 nomad[10066]: 2023-02-22T11:19:50.673+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:51 ai-149 nomad[10066]: 2023-02-22T11:19:51.673+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:52 ai-149 nomad[10066]: 2023-02-22T11:19:52.673+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:53 ai-149 nomad[10066]: 2023-02-22T11:19:53.673+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:54 ai-149 nomad[10066]: 2023-02-22T11:19:54.673+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:55 ai-149 nomad[10066]: 2023-02-22T11:19:55.674+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:56 ai-149 nomad[10066]: 2023-02-22T11:19:56.674+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:57 ai-149 nomad[10066]: 2023-02-22T11:19:57.675+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:58 ai-149 nomad[10066]: 2023-02-22T11:19:58.675+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:19:59 ai-149 nomad[10066]: 2023-02-22T11:19:59.675+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:20:00 ai-149 nomad[10066]: 2023-02-22T11:20:00.676+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:20:01 ai-149 nomad[10066]: 2023-02-22T11:20:01.676+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:20:02 ai-149 nomad[10066]: 2023-02-22T11:20:02.677+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:20:03 ai-149 nomad[10066]: 2023-02-22T11:20:03.677+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:20:04 ai-149 nomad[10066]: 2023-02-22T11:20:04.678+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:20:05 ai-149 nomad[10066]: 2023-02-22T11:20:05.678+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:20:06 ai-149 nomad[10066]: 2023-02-22T11:20:06.679+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:20:07 ai-149 nomad[10066]: 2023-02-22T11:20:07.680+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:20:08 ai-149 nomad[10066]: 2023-02-22T11:20:08.680+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:20:09 ai-149 nomad[10066]: 2023-02-22T11:20:09.681+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:20:10 ai-149 nomad[10066]: 2023-02-22T11:20:10.681+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:20:11 ai-149 nomad[10066]: 2023-02-22T11:20:11.682+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1
Feb 22 11:20:12 ai-149 nomad[10066]: 2023-02-22T11:20:12.682+0800 [WARN]  watch.checks: watched check not found: check_id=90d04bc04f044aba3fe01081bff55de1

job.hcl

job "demo" {
  datacenters = ["dc1"]
  type        = "service"

  update {
    max_parallel     = 1
    min_healthy_time = "10s"
    healthy_deadline = "3m"
    auto_revert      = true
    auto_promote     = true
    canary           = 1
  }

  reschedule {
    attempts       = 15
    interval       = "1h"
    delay          = "15s"
    delay_function = "exponential"
    max_delay      = "120s"
    unlimited      = false
  }

  group "service" {

    scaling {
      enabled = true
      min     = 1
      max     = 3
    }

    restart {
      interval = "3m"
      attempts = 3
      delay    = "15s"
      mode     = "delay"
    }

    network {
      port "http" { to = 8080 }
    }

    service {
      name         = "test"
      port         = "http"
      address_mode = "host"
      tags = [ ]
      provider     = "nomad"
      check {
        type           = "http"
        port           = "http"
        path           = "/"
        interval       = "12s"
        timeout        = "6s"

        check_restart {
          limit           = 3
          grace           = "10s"
        }
      }
    }

    task "app" {
      driver = "docker"
      config {
        image   = "...."
        command = "python"
        ports   = ["http"]
        args = [

        ]
      }
    }
  }
}

shoenig commented 1 year ago

Thanks for reporting @chenjpu! Indeed I was able to reproduce this, but with the service belonging to a task rather than the group - any chance your job was actually configured that way too?

Fix in https://github.com/hashicorp/nomad/pull/16240

chenjpu commented 1 year ago

Thanks for reporting @chenjpu! Indeed I was able to reproduce this, but with the service belonging to a task rather than the group - any chance your job was actually configured that way too?

Fix in #16240 Yes, my service is configured on the goup

chenjpu commented 1 year ago

@shoenig
When I upgraded to 1.5.0-rc1, the above problems still remained

shoenig commented 1 year ago

Whelp I guess there was more than one bug then :grimacing:

I'll try and reproduce again with a job structured exactly like yours @chenjpu

seanamos commented 1 year ago

I am experiencing a tsunami of these logs with Nomad 1.6.1. It's also generating a large amount of request error logs in Consul.

Example consul log:

/v1/agent/check/register ServiceID "_nomad-task-9ec1786d-e9b1-db33-a381-c748bf033163-group-xxx-xxx-xxx-xxx-xxx-xxx-80" does not exist

This seems to have started with a task that failed to start due to a missing vault secret. It's now generating hundreds of thousands of logs between nomad and consul.

The job itself has two groups, each with different vault policies. Each group contains 1 service, 1 task. Only the second group has a service check, it is this check that is causing all the errors.

The issue causing the task to fail has been resolved, however, the log tsunami persists.

In the short term, is there a workaround to get the logs to stop, short of restoring to an older snapshot?

seanamos commented 1 year ago

For others running into this and need a quick way to stop the influx of logs, restart the nomad client.

For us, we ran an instance refresh across the affected autoscaling group.

lattwood commented 4 months ago

We're also running into this.

hashicorp / nomad