hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Automatically assign local_service_port when using a dynamic port #20010

Closed Mongey closed 2 months ago

Mongey commented 4 months ago

Proposal

When a group defines a single dynamic port and a service sets connect->sidecar_service, Nomad should automatically set local_service_port to that port. Additionally, if the group defines multiple ports and sidecar_service is set, Nomad should warn that local_service_port is not set.

The default behaviour leaves Nomad and Consul Connect not working and makes it seem like something is broken. I was getting the following error when attempting to connect from one service to another:

upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: delayed connect error: 111

Additionally, a lot of documentation says to use a static port only in special use cases, but most Consul Connect examples use static ports:

https://github.com/hashicorp/nomad/blob/e8db58836843244b579b0ecdf8cc3bbcb3ce90eb/website/content/plugins/drivers/community/containerd.mdx#L330-L332

job "service-b" {
  group "service-b" {
    network {
      mode = "bridge"

      port "http" {
        to = 1027
      }
    }

    service {
      port = "http"
      name = "service-b"

      connect {
        sidecar_service {}
      }

      check {
        type     = "http"
        path     = "/"
      }
    }

    task "service-b" {
      driver = "docker"

      config {
        image = "traefik/whoami:latest"
        ports = ["http"]
        args  = ["--port", "${NOMAD_PORT_http}"]
      }
    }
  }
}

Use-cases

Using Connect with a job that uses a dynamic port

https://github.com/hashicorp/nomad/issues/7229

Attempted Solutions

The current solution is to define connect->sidecar_service->proxy->local_service_port yourself.

job "service-b" {
  group "service-b" {
    network {
      mode = "bridge"

      port "http" {
        to = 1027
      }
    }

    service {
      port = "http"
      name = "service-b"

      connect {
        sidecar_service {
          proxy {
            local_service_port = 1027
          }
        }
      }

      check {
        type     = "http"
        path     = "/"
      }
    }

    task "service-b" {
      driver = "docker"

      config {
        image      = "traefik/whoami:latest"
        force_pull = true
        ports      = ["http"]
        args       = ["--port", "${NOMAD_PORT_http}"]
      }
    }
  }
}

tgross commented 4 months ago

Hi @Mongey! This is definitely a rough spot in the documentation and implementation.

If you look at the example Connect job you get from nomad job init -connect, you'll find that the service.port field is set to the port number rather than to a port label. This results in a working proxy without the need to set connect.sidecar_service.proxy.local_service_port. The exception is health checks: if you have them, you'll need to set check.expose on each one so that the Envoy proxy has the correct exceptions, which lets Consul reach the health check endpoint without the certs it would otherwise need to pass through Envoy.
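
For illustration, here is a rough sketch of how the job from the original post might look with that approach. It is not taken from the nomad job init -connect output; it assumes the application listens on port 1027 inside the group's network namespace, and the check name, interval, and timeout values are placeholders chosen for the example.

```hcl
job "service-b" {
  group "service-b" {
    network {
      mode = "bridge"
    }

    service {
      name = "service-b"
      # Literal port number the application listens on inside the
      # network namespace, rather than a port label (assumed: 1027).
      port = "1027"

      connect {
        sidecar_service {}
      }

      check {
        # Placeholder name, interval, and timeout for this sketch.
        name     = "service-b-health"
        type     = "http"
        path     = "/"
        interval = "10s"
        timeout  = "2s"
        # Ask Nomad to generate an Envoy expose path for this check.
        expose   = true
      }
    }

    task "service-b" {
      driver = "docker"

      config {
        image = "traefik/whoami:latest"
        args  = ["--port", "1027"]
      }
    }
  }
}
```

The trade-off is that the port number is hard-coded in both the service and the task arguments, which is what the proposal above is trying to avoid.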

This definitely isn't ideal yet. One of my current projects is implementing Connect transparent proxy support (ref https://github.com/hashicorp/nomad/issues/10628), which should reduce some of the remaining manual knobs you need to adjust to make Connect work.

tgross commented 2 months ago

The work for transparent proxy has been merged and will ship in Nomad 1.8.0. This will allow using a Consul virtual service (by name), and make it so that you don't need to define upstream blocks at all. I'm going to close this issue out, as that's going to be our recommended approach.
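
As a hedged sketch of what that could look like for a job that calls service-b, assuming Nomad 1.8.0 or later with the transparent proxy prerequisites in place on the client (such as the Consul CNI plugin and Consul DNS). The caller name "service-a", its port, its image, and the environment variable are hypothetical and only there to show the shape of the configuration.

```hcl
job "service-a" {
  group "service-a" {
    network {
      mode = "bridge"
    }

    service {
      name = "service-a"
      # Hypothetical port the caller itself listens on.
      port = "8080"

      connect {
        sidecar_service {
          proxy {
            # Enables transparent proxy; no upstreams blocks are listed.
            transparent_proxy {}
          }
        }
      }
    }

    task "service-a" {
      driver = "docker"

      config {
        # Hypothetical image name for the calling application.
        image = "example/service-a:latest"
      }

      env {
        # The callee is addressed by its Consul virtual service name
        # instead of a local upstream bind port.
        SERVICE_B_URL = "http://service-b.virtual.consul"
      }
    }
  }
}
```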