hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

nomad-sd: support cross-namespace service lookups within templates #14177

Open jrasell opened 2 years ago

jrasell commented 2 years ago

Proposal

Allow Nomad template blocks to perform cross-namespace service lookups when using the nomadService and nomadServices template functions. Currently these lookups are restricted to the namespace of the job that renders the template.
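
For context, here is a minimal sketch of what a same-namespace lookup with nomadService looks like today. The job, group, and task names, the `api` service name, the image, and the file path are all hypothetical and only serve as illustration; the nginx image is not wired up to actually load the rendered file.

```hcl
# Hypothetical job: every name here (web, nginx, api) is illustrative.
job "web" {
  namespace   = "prod"
  datacenters = ["dc1"]

  group "nginx" {
    task "nginx" {
      driver = "docker"

      config {
        image = "nginx:1.25"
      }

      template {
        # nomadService only returns services registered in this job's
        # namespace ("prod"). An "api" service registered in another
        # namespace is not visible to this lookup.
        data = <<EOF
upstream api {
{{- range nomadService "api" }}
  server {{ .Address }}:{{ .Port }};
{{- end }}
}
EOF
        destination   = "local/upstream.conf"
        change_mode   = "signal"
        change_signal = "SIGHUP"
      }
    }
  }
}
```

This proposal asks for a way to make the same kind of lookup resolve a service that lives in a different namespace; the syntax for that is not decided here.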

Use-cases

Jobs may be separated into namespaces for management purposes, but they are often still expected to discover and communicate with each other.

Please see https://github.com/hashicorp/nomad/issues/12589 for additional comments.

Attempted Solutions

There is currently no workaround for this except to run jobs that need to discover each other within the same namespace. The initial implementation did not have the scope to solve the problem of how to authorise individual tasks to perform cross-namespace lookups.

mhehle commented 1 year ago

We are also waiting for this feature and support this proposal.

Our use case: we develop a workflow engine based on Nomad. Each workflow is executed in its own namespace, but individual tasks in the workflow should be able to consume services from other namespaces.

SamMousa commented 1 year ago

I've found a workaround using Traefik and its support for Nomad service discovery. First of all, note that the workload identity can already enumerate services in other namespaces by default. So what I did was this (nothing is redacted, this all runs on private networks):

job "cluster-services-proxy" {
  namespace   = "prod"
  region      = "global"
  datacenters = ["dc1"]
  type        = "service"

  update {
    max_parallel      = 1
    health_check      = "checks"
    min_healthy_time  = "10s"
    healthy_deadline  = "1m"
    progress_deadline = "3m"
    auto_revert       = true
    auto_promote      = false
    canary            = 0
    stagger           = "10s"
  }

  group "proxy" {
    count = 3
    network {
      port "gelf" {
        to = 5555
        host_network = "default"
      }
      port "http" {
        to = 80
        host_network = "default"
      }
    }

    task "traefik" {
      service {
        name = "cluster-services-graylog"
        provider = "nomad"
        port = "gelf"
      }

      service {
        name = "cluster-services-proxy"
        provider = "nomad"
        port = "http"

        check {
          type     = "http"
          port     = "http"
          path     = "/ping"
          interval = "5s"
          timeout  = "1s"
          method   = "GET"
        }

      }

      identity {
        env = true
      }
      driver = "docker"

      config {
        image = "traefik:v2.10"
        extra_hosts = [
          "host.docker.internal:host-gateway"
        ]
        hostname = "${NOMAD_JOB_NAME}-${NOMAD_ALLOC_ID}"

        ports = ["http", "gelf"]
        command = "--configfile=/${NOMAD_SECRETS_DIR}/traefik.yaml"
      }
      template {
        data = <<EOF
entryPoints:
  web:
    address: ':80'
  tcp:
    address: ':5555'
api:
  dashboard: true
  insecure: false
ping:
  entryPoint: web
providers:
  nomad:
    exposedByDefault: false
    endpoint:
      address: http://host.docker.internal:4646
      token: {{ env "NOMAD_TOKEN" }}
    namespaces:
      - cluster
    constraints: "Tag(`cluster-service`)"
  file:
    directory: /local/dynamic
EOF
        destination = "${NOMAD_SECRETS_DIR}/traefik.yaml"
        change_mode = "restart"
      }

      template {
        data = <<EOF
http:
  routers:
    dashboard:
      rule: PathPrefix(`/`)
      service: dashboard@internal
      entrypoints:
        - web
    api:
      rule: PathPrefix(`/api`)
      service: api@internal
      entrypoints:
        - web
EOF
        destination = "local/dynamic/dynamic.yaml"
        change_mode = "noop"
      }
      resources {
        cpu    = 100
        memory = 256
      }
    }
  }
}

This exposes HTTP services from the cluster namespace that carry the right tag (cluster-service). In this specific case I needed a TCP router as well; since TCP routers require dedicated ports, I could not expose them automatically and had to create an explicit entrypoint. The idea is that you run this Traefik job in any namespace that needs access to cluster services.

In the cluster namespace my service looks like this: it has the normal Traefik tags plus one additional tag, so that it is picked up by the Traefik instance in the other namespace. Note that this tagging was needed in my case because I have another Traefik job in the cluster namespace that also uses autodiscovery; by using the tag I can exclude it from that Traefik instance.

service {
      name = "graylog-gelftcp"
      port = "gelftcp"
      provider = "nomad"
      tags = [
        "cluster-service",
        "traefik.enable=true",
        "traefik.tcp.routers.gelftcp.entryPoints=tcp",
        "traefik.tcp.routers.gelftcp.rule=HostSNI(`*`)",
      ]
      check {
        type     = "tcp"
        interval = "5s"
        timeout  = "1s"
      }
    }
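
To round off the workaround, here is a rough sketch of how a job in the prod namespace might then find the proxy through an ordinary same-namespace lookup. It assumes the proxy job above is running in prod and has registered cluster-services-graylog there; the job name `app`, the image, and the destination file are hypothetical.

```hcl
# Hypothetical consumer job; only the service name
# "cluster-services-graylog" comes from the proxy job above.
job "app" {
  namespace   = "prod"
  datacenters = ["dc1"]

  group "app" {
    task "app" {
      driver = "docker"

      config {
        image = "example/app:latest" # placeholder image
      }

      template {
        # Same-namespace lookup: this resolves the Traefik proxy
        # allocations in "prod", which forward to the cluster namespace.
        # One line is rendered per proxy allocation.
        data = <<EOF
{{- range nomadService "cluster-services-graylog" }}
{{ .Address }}:{{ .Port }}
{{- end }}
EOF
        destination = "local/graylog-endpoints.txt"
        change_mode = "noop"
      }
    }
  }
}
```

The consumer never talks to the cluster namespace directly. It only sees the proxy's address and port, so the cross-namespace hop stays inside Traefik.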

the-maldridge commented 10 months ago

I just ran head-first into this while trying to migrate from Consul service discovery to Nomad service discovery for my web routing layer. I'm looking to move away from Consul for most service discovery tasks, but because a group cannot have both a Consul and a Nomad service registered, I can't even begin this migration until template blocks are able to cross-resolve services. It's a frustratingly arbitrary limitation, since I could otherwise begin moving things like Prometheus to Nomad service discovery via the API while waiting on template support for my nginx workers.