hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

[beta] Transparent proxy jobs can be scheduled on nodes without transparent proxy (i.e. older versions) #20614

Closed awanaut closed 5 months ago

awanaut commented 5 months ago

Nomad version

A mix of 1.8-beta+ent and 1.7.4+ent

Operating system and Environment details

Debian 12 client nodes with Nomad 1.8-beta+ent and Consul 1.17.2
Debian 12 client nodes with Nomad 1.7.4+ent and Consul 1.17.2

Issue

When scheduling a job that contains the new transparent_proxy{} block, Nomad appears to ignore that block during placement and may schedule the job on nodes that do not support transparent proxy. In my case, those are client nodes running version 1.7.4. As a workaround, I created a constraint on nomad.version to ensure the job is scheduled on the correct nodes.
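
The workaround constraint could look roughly like the sketch below. It assumes the node attribute `${attr.nomad.version}` and the `semver` operator. The lower bound `1.8.0-beta.1` is a guess at how the beta build reports its version, so adjust it to match the attribute shown by `nomad node status -verbose` on a 1.8 client.

```hcl
# Sketch of the workaround: only place this job or group on clients
# that report a Nomad version of 1.8 (beta or later).
constraint {
  attribute = "${attr.nomad.version}"
  operator  = "semver"
  # Assumed lower bound; the exact prerelease string may differ.
  value     = ">= 1.8.0-beta.1"
}
```

The block can sit at the job or group level. The `semver` operator is used here rather than `version` because it compares prerelease versions such as betas in a predictable way.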

Reproduction steps

  1. Add the following to a service block in a group:

     connect {
       sidecar_service {
         proxy {
           transparent_proxy {}
         }
       }
     }

  2. Either use a constraint to force the job onto a host older than 1.8, or mark the 1.8 node as ineligible.
  3. Submit the job. Nomad schedules it and reports the allocation as healthy.

Expected Result

I'm sure I could add a health check, but I would expect Nomad to read the transparent_proxy{} block and check the attributes of the nodes before making the scheduling decision, just as it does for other attributes.

Actual Result

Nomad schedules transparent_proxy jobs on nodes that do not support transparent proxy.

Job file (if appropriate)

job "downstream" {
  datacenters = ["lab"]

  group "downstream" {
    count = 1

    network {
      port "expose" {}
    }        

    service {
      name = "downstream"
      port = "9090"

      check {
        expose   = true
        type     = "http"
        path     = "/health"
        interval = "30s"
        timeout  = "5s"

      }

      connect {
        sidecar_service {
          proxy {
            transparent_proxy {}                                 
          }
        }  
      }
    }          

    task "downstream" {
      driver = "docker"

      config {
        image = "nicholasjackson/fake-service:v0.26.2"  
      }

      env {
        NAME = "downstream"
        UPSTREAM_URIS = "http://upstream.virtual.consul"
      }                   
    }
  }
}

tgross commented 5 months ago

Hi @awanaut! Thanks for this report! I added constraints for the CNI plugin in https://github.com/hashicorp/nomad/pull/20244 but you're right that doesn't restrict the client version appropriately. I'll get this fixed for the final release.
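
For context, here is a sketch of what the implicit CNI plugin constraint would look like if written out by hand. The attribute name and minimum version are assumptions based on how the `consul-cni` plugin is fingerprinted on clients. They are not copied from the PR.

```hcl
# Hypothetical hand-written equivalent of the implicit constraint.
# Attribute name and version bound are assumptions, not taken from the PR.
constraint {
  attribute = "${attr.plugins.cni.version.consul-cni}"
  operator  = "semver"
  value     = ">= 1.4.2"
}
```

A constraint of this shape only checks the plugin, so a 1.7.4 client with the plugin installed would presumably still satisfy it. That is consistent with the placement reported above.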

tgross commented 5 months ago

Fix is up in #20623