hashicorp / nomad

system job leaks service registration #9360

Closed fredwangwang closed 3 years ago

fredwangwang commented 4 years ago

Nomad version

0.12.5

Operating system and Environment details

linux + Consul 1.8.3

Issue

Tested: when an additional service stanza is removed from a system job and the job is redeployed, its service registration leaks (it is not removed from Consul).

Reproduction steps

  1. Deploy the job file below
  2. Remove the following
    -    service {
    -      name = "example-service-2"
    -      port = "api"
    -    }
  3. Redeploy the job file
  4. The service registration is still in Consul, and it remains there even after the job is stopped and garbage-collected.
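
For anyone who wants to confirm step 4 without clicking through the Consul UI, here is a minimal sketch of a check script. It assumes a Consul agent reachable at the default local address `http://127.0.0.1:8500` with no ACL token required; the helper names `leaked_services`, `fetch_registered`, and `check` are hypothetical and only for illustration. It reads service names from Consul's catalog endpoint (`GET /v1/catalog/services`) and reports any name the job used to declare that is still registered but is no longer in the job file.

```python
import json
import urllib.request

# Assumption: a local Consul agent on the default HTTP port, no ACL token needed.
CONSUL_ADDR = "http://127.0.0.1:8500"


def leaked_services(registered, expected, candidates):
    """Return the candidate names that are still registered but no longer expected.

    registered: service names currently present in Consul
    expected:   service names the current job file still declares
    candidates: every service name the job has ever declared
    """
    return sorted(
        name for name in candidates
        if name in registered and name not in expected
    )


def fetch_registered(addr=CONSUL_ADDR):
    """Fetch the set of service names from Consul's catalog."""
    with urllib.request.urlopen(f"{addr}/v1/catalog/services") as resp:
        # The response is a JSON object keyed by service name.
        return set(json.load(resp))


def check(addr=CONSUL_ADDR):
    """Compare Consul's catalog against the job file after step 2."""
    expected = {"example-service"}
    candidates = {"example-service", "example-service-2"}
    return leaked_services(fetch_registered(addr), expected, candidates)
```

Calling `check()` after step 3 should return an empty list on a healthy cluster; if the bug described here is present, it would be expected to list `example-service-2`.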

Job file (if appropriate)

job "example-system-job" {
  region      = "global"
  datacenters = ["dc1"]
  type        = "system"

  group "example-service-api-group" {
    count = 1

    network {
      mode = "bridge"
      port "api" {}
    }

    service {
      name = "example-service-2"
      port = "api"
    }

    service {
      name = "example-service"
      port = "api"

      # This check seems to be required; without it the bug is not triggered.
      check {
        type     = "http"
        port     = "api"
        path     = "/ready"
        interval = "5s"
        timeout  = "1s"
      }
    }

    task "example-fake" {
      driver = "docker"

      config {
        image = "nicholasjackson/fake-service:v0.12.0"
      }

      env {
        LISTEN_ADDR = "0.0.0.0:${NOMAD_PORT_api}"
      }
    }
  }
}

drewbailey commented 3 years ago

@fredwangwang Thank you for reporting, and sorry you are running into this. I've tracked the issue down to the scheduler: when an in-place update is made (such as removing or updating a service block), the allocation's shared (group) ports are dropped. I've created https://github.com/hashicorp/nomad/issues/9735 to track this and will have a fix soon.

fredwangwang commented 3 years ago

@drewbailey thanks for investigating!
