hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Nomad can not use consul ingress-gateways because tasks use protocol tcp #8647

Open spuder opened 4 years ago

spuder commented 4 years ago

Nomad 0.11.1
Consul 1.8.2

Consul ingress gateways support tcp and http listeners. HTTP listeners are preferred because they allow multiple services to share a single port and be routed by Host header.

Problem

Nomad jobs default to service type of tcp. There does not appear to be a documented way to change a nomad job to use http as the service type. As a result the user will get the following error when they attempt to create a listener for it.

https://www.nomadproject.io/docs/job-specification/service

Error writing config entry ingress-gateway/ingress-ngproxy: Unexpected response code: 500 (rpc error making call: service "count-dashboard" has protocol "tcp", which does not match defined listener protocol "http")

Steps to reproduce

  1. Submit the standard count-dash example
count-dash.job

```
job "countdash" {
  datacenters = ["dc1"]

  group "api" {
    network {
      mode = "bridge"
    }

    service {
      name = "count-api"
      port = "9001"

      connect {
        sidecar_service {}
      }
    }

    task "web" {
      driver = "docker"

      config {
        image = "hashicorpnomad/counter-api:v1"
      }
    }
  }

  group "dashboard" {
    network {
      mode = "bridge"

      port "http" {
        static = 9002
        to     = 9002
      }
    }

    service {
      name = "count-dashboard"
      port = "9002"

      # This is slightly modified from the stock count-dash examples
      # By adding an 'http' health check, the hope was to force nomad to use 'http' over 'tcp'
      check {
        name     = "count-dashboard-health"
        type     = "http"
        protocol = "http"
        path     = "/health"
        port     = 9002
        interval = "10s"
        timeout  = "5s"
      }

      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "count-api"
              local_bind_port  = 8080
            }
          }
        }
      }
    }

    task "dashboard" {
      driver = "docker"

      env {
        COUNTING_SERVICE_URL = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
      }

      config {
        image = "hashicorpnomad/counter-dashboard:v1"
      }
    }
  }
}
```

  2. Create an ingress controller and register it with consul config
consul config write ingress-service.hcl

Kind = "ingress-gateway"
Name = "ingress-service"

Listeners = [
  {
    Port     = 8080
    Protocol = "http"
    Services = [
      {
        Name  = "count-dashboard"
        Hosts = ["count.example.com"]
      }
    ]
  }
]

Expected result

The service should be added to the ingress controller

Actual result

Consul returns this error

Error writing config entry ingress-gateway/ingress-service: Unexpected response code: 500 (rpc error making call: service "count-dashboard" has protocol "tcp", which does not match defined listener protocol "http")
liemle3893 commented 4 years ago

How about adding the following to count-dashboard:


sidecar_service {
  proxy {
    config {
      protocol = "http"
    }
  }
}
spuder commented 4 years ago

Great idea. I tried setting protocol with no change in behavior

     service {
       name = "count-dashboard"
       port = "9002"
       check  {
         name = "count-dashboard-health"
         type = "http"
         protocol = "http"
         path = "/health"
         port = 8080
         interval = "10s"
         timeout = "5s"

       }
       connect {
         sidecar_service {
           proxy {
             config {
               protocol = "http"
             }
             upstreams {
               destination_name = "count-api"
               local_bind_port = 8080
             }
           }
         }
       }
     }

I believe this is the documentation page that lists the available config options https://www.consul.io/docs/connect/registration/sidecar-service

blake commented 4 years ago

By default services deployed within Consul service mesh are configured as tcp services. You can override this on a per-service basis by creating a service-defaults configuration entry, or at the global level by creating a proxy-defaults entry.

Any service you wish to associate with an ingress gateway listener must already be configured to use the same protocol as that listener before you associate it; otherwise a configuration error will be returned.
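
A minimal sketch of the per-service override for the count-dashboard service from this issue. The file name count-dashboard-defaults.hcl is made up for illustration, and the guard around the consul CLI is only there so the snippet still does something sensible on a machine without Consul; actually writing the entry requires a reachable Consul agent.

```shell
# Hypothetical file name; the service name comes from the job in this issue.
cat > count-dashboard-defaults.hcl <<'EOF'
Kind     = "service-defaults"
Name     = "count-dashboard"
Protocol = "http"
EOF

# Apply and read back the entry only if the consul CLI is installed.
if command -v consul >/dev/null 2>&1; then
  consul config write count-dashboard-defaults.hcl
  consul config read -kind service-defaults -name count-dashboard
else
  echo "consul CLI not found; wrote count-dashboard-defaults.hcl only"
fi
```

The entry has to exist before the service registers, so it should be written before the Nomad job is started (or the job restarted afterwards).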

apollo13 commented 4 years ago

Hi @spuder there might be some overlap with https://github.com/hashicorp/nomad/issues/8294#issuecomment-659873302 -- apparently Michael got it working there

spuder commented 4 years ago

Good suggestions. I've modified the job to use connect.sidecar_service.proxy.config.protocol=http and connect.sidecar_service.proxy.local_service_port=9002, however I am still unable to register this service in the load balancer as HTTP

connect {
  sidecar_service {
    proxy {
      config {
        protocol = "http"
      }
      local_service_port = 9002
      upstreams {
        destination_name = "count-api"
        local_bind_port  = 8080
      }
    }
  }
}
countdash.job

```
job "countdash" {
  datacenters = ["dc1"]

  group "api" {
    network {
      mode = "bridge"
    }

    service {
      name = "count-api"
      port = "9001"

      connect {
        sidecar_service {}
      }
    }

    task "web" {
      driver = "docker"

      config {
        image = "hashicorpnomad/counter-api:v1"
      }
    }
  }

  group "dashboard" {
    network {
      mode = "bridge"

      port "http" {
        static = 9002
        to     = 9002
      }
    }

    service {
      name = "count-dashboard"
      port = "9002"

      check {
        name     = "count-dashboard-health"
        type     = "http"
        protocol = "http"
        path     = "/health"
        port     = 9002
        interval = "10s"
        timeout  = "5s"
      }

      connect {
        sidecar_service {
          proxy {
            config {
              protocol = "http"
            }

            local_service_port = 9002

            upstreams {
              destination_name = "count-api"
              local_bind_port  = 8080
            }
          }
        }
      }
    }

    task "dashboard" {
      driver = "docker"

      env {
        COUNTING_SERVICE_URL = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
      }

      config {
        image = "hashicorpnomad/counter-dashboard:v1"
      }
    }
  }
}
```

I still am seeing this error when I attempt to register the service

Error writing config entry ingress-gateway/ingress-ngproxy: Unexpected response code: 500 (rpc error making call: service "count-dashboard" has protocol "tcp", which does not match defined listener protocol "http")

I've ensured that the job is completely stopped and the service is removed from Consul before submitting the job again.

Lucretius commented 4 years ago

@spuder

I was running into this exact same issue and finally was able to get this working by creating a "service-defaults" config entry in Consul (not through Nomad), with the same name as the service - and specifying the protocol there

For example, for your above service (assume it is named "web")

resource "consul_config_entry" "web" {
  kind = "service-defaults"
  name = "web"

  config_json = jsonencode({
    Protocol : "http"
  })
}

Stop and start your job in Nomad, then try registering the gateway.

It is unfortunate that when registering the service, Consul seems to ignore the protocol specified inside the Nomad Connect proxy config stanza, making it impossible to accomplish this in a single Nomad configuration. I dug through the source code a little but was unable to find anything that stood out as to why that is. Seems like a bug, but this should provide a workaround in the meantime.

apollo13 commented 4 years ago

Mhm, so while the ingress gateway in 0.12.4 seems to work nicely for tcp services (just tried :D) it seems to fail rather horribly for http. Do we miss something @shoenig or is there simply no support for it yet?

shoenig commented 4 years ago

What's the problem you're seeing @apollo13? Using the http protocol should work, though you do still have to configure a service-defaults entry setting the protocol to http before Consul will accept the config entry.

apollo13 commented 4 years ago

@shoenig, exactly. I was just wondering if there is already something builtin in Nomad to set the type (not that we miss something :))

shoenig commented 4 years ago

I think we'll publish a learn guide in the near future detailing the ins-and-outs of running Gateways in Nomad. For now though, here's a little example I've been using:

set service defaults

$ cat ig-service-defaults.json
{
    "Kind": "service-defaults",
    "Name": "uuid-api",
    "Protocol": "http"
}
$ consul config write ig-service-defaults.json

example job file

# $ cat ig-http.nomad

job "ig-http" {

  datacenters = ["dc1"]

  group "ingress-group" {

    network {
      mode = "bridge"
      port "inbound" {
        static = 8080
        to     = 8080
      }
    }

    service {
      name = "my-ingress-service"
      port = "8080"

      connect {
        gateway {
          proxy {
            connect_timeout = "500ms"
          }

          ingress {
            listener {
              port     = 8080
              protocol = "http"

              service {
                name  = "uuid-api"
                hosts = ["example.com", "example.com:8080"]
              }
            }
          }
        }
      }
    }
  }

  group "generator" {
    network {
      mode = "host"
      port "api" {}
    }

    service {
      name = "uuid-api"
      port = "${NOMAD_PORT_api}"

      connect {
        native = true
      }
    }

    task "generate" {
      driver = "docker"

      config {
        image        = "hashicorpnomad/uuid-api:v3"
        network_mode = "host"
      }

      env {
        BIND = "0.0.0.0"
        PORT = "${NOMAD_PORT_api}"
      }
    }
  }
}
$ nomad job run ig-http.nomad

inspect

$ consul config read -kind ingress-gateway -name my-ingress-service
<our config entry>
$ curl -H "Host: example.com" $(dig +short @127.0.0.1 -p 8600 uuid-api.ingress.dc1.consul. ANY):8080
3a9faa28-36bf-46c3-8274-be1c6f0a1978
apollo13 commented 4 years ago

Thanks, do you think setting the service type/proto directly in nomad would be in scope in the future, or is that something out of scope for nomad totally?

shoenig commented 4 years ago

It might be possible in the future. We shied away from managing anything but the ingress-gateway config entry type for now because there are issues around the multi-writer problem implied in how Consul makes config entries global in scope. Individual [OSS] Nomad clusters don't communicate with one another, so it's kinda sketchy to be writing config entries from Nomad. We rationalized it's fine for ingress-gateway entries, since it's probably a bug to be trying to define different IGCE's for the same service name regardless of which Nomad cluster it's coming from. But we didn't want to push that rationalization any further than necessary, and so at least for now service defaults still need to be set in Consul out of band from Nomad.

We've discussed internally some possible mechanics Consul can provide to improve the multi-writer story - if that stuff gets implemented then I don't see why Nomad couldn't make use of it. If you don't mind opening a ticket describing your use case, that would definitely help us gauge the interest for that feature.

apollo13 commented 4 years ago

Oh, thank you for the extensive explanation. I didn't realize that service defaults is the only way to set the protocol. I thought it would be possible to do that during service registration (I never looked that closely at Consul aside from its Nomad integration).

Not sure if a new ticket makes sense; the use-case is simply providing an ingress gateway to the outside world (well mostly internal infra) so everything is encrypted. As it stands currently most people use traefik or so but then (usually) the traffic between traefik and the services is not encrypted. That said traefik just laid the groundwork to support connect services https://github.com/containous/traefik/commit/76f42a301382db52968c6ff7d1f4b3942dfcf50b

EDIT: To expand on the use case a bit: when I said "simply" I meant something along the lines of "people can just submit a job to Nomad and the rest will be taken care of". I.e., they shouldn't have to know about specific Consul quirks for configuration.

mister2d commented 4 years ago

@shoenig Is it possible to gauge interest by a simple thumbs up? I just want to submit a Nomad job that configures an ingress controller to route to internal Connect services via HTTP header.

The use case is very simple and not new conceptually. I've had this methodology working in Docker Swarm for about 3 years and would like to finalize the transition over to Nomad + Consul Connect.

tunhvn commented 4 years ago

I have the same issue with the proxy config stanza in Nomad. I set protocol = "http" but it seems Consul ignores this config. How can I set HTTP as the default for all services?

3nprob commented 3 years ago

since it's probably a bug to be trying to define different IGCE's for the same service name regardless of which Nomad cluster it's coming from

Counter-example, if I understand it right: there are many P2P applications (notably Ethereum) that expect publicly reachable TCP and UDP on the same port. Today Nomad doesn't make that distinction: specifying a port for a service means both TCP and UDP.

In that scenario one would need two ingresses to the same service, unless I'm missing something.

spuder commented 3 years ago

For future reference, here is our current workaround

  1. Create a service-defaults with protocol http
  2. Create an ingress proxy with protocol http

Here is an example of how you may configure consul using terraform

resource "consul_config_entry" "ingress-example" {
  name       = "ingress-example"
  kind       = "ingress-gateway"
  depends_on = [consul_config_entry.foo, consul_config_entry.bar ] # <- Note this sets the resource in the proper order
  config_json = jsonencode({
    Listeners = [{
      Port     = 8080
      Protocol = "http"
      Services = [
        {
          Name  = "foo"
          Hosts = ["foo.example.com"]
        },
        {
          Name  = "bar"
          Hosts = ["bar.example.com"]
        }]
    }]
  })

}

resource "consul_config_entry" "foo" {
  name = "foo"
  kind = "service-defaults"

  config_json = jsonencode({
    Protocol = "http"
  })
}

resource "consul_config_entry" "bar" {
  name = "bar"
  kind = "service-defaults"

  config_json = jsonencode({
    Protocol = "http"
  })
}

Note that if you try to change this on a running service, you will get an error because the service is already registered with the default protocol of tcp. The workaround is to create this Consul config before deploying the job with Nomad, or at least to stop the Nomad job, create these configs, and then start the job again.
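
For the countdash job from earlier in this thread, that ordering could be scripted roughly as follows. This is only a sketch: count-dashboard-defaults.hcl is a hypothetical file assumed to hold a service-defaults entry with Protocol = "http", countdash.job and ingress-service.hcl are the file names used earlier in the issue, and run is an illustrative wrapper that defaults to printing the commands instead of executing them.

```shell
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

apply_workaround() {
  # 1. Stop the job so the service registered as tcp is removed.
  run nomad job stop countdash || return 1
  # 2. Write the service-defaults entry while the service is gone.
  run consul config write count-dashboard-defaults.hcl || return 1
  # 3. Start the job again so the service registers under the new default.
  run nomad job run countdash.job || return 1
  # 4. Only now write the ingress-gateway entry with the http listener.
  run consul config write ingress-service.hcl || return 1
}

plan="$(apply_workaround)"
echo "$plan"
```

Stopping at the first failing step matters here: writing the ingress-gateway entry before the service-defaults entry has taken effect reproduces the protocol mismatch error.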

paladin-devops commented 3 years ago

Any update on this issue? Like others in this thread I would also like to avoid making updates in Consul directly, outside of the Nomad job.

shoenig commented 2 years ago

Now that Consul versions pre-dating ConfigEntry Meta fields have been phased out, it might be reasonable to have Nomad automatically manage the prerequisite service-defaults ConfigEntry, with Protocol set to the associated ingress.listener.protocol value for each enumerated service. The idea is that Nomad upserts the service-defaults ConfigEntry only if the existing ConfigEntry for the service contains a nomad_managed: true meta field (or no entry exists yet), which avoids overwriting a ConfigEntry created outside of Nomad. These service-defaults ConfigEntries would be created on submission of the job containing the ingress gateway definition, alongside the ingress-gateway ConfigEntry Nomad already creates.

The global nature of ConfigEntry still implies each discrete Nomad cluster would be re-upserting the same service-defaults ConfigEntry for each service, as is already the case for ingress-gateway ConfigEntry.
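
As a rough illustration of that rule, the decision could look like the shell sketch below. This is not Nomad code: should_upsert is a hypothetical helper, the Meta value is assumed to be stored as the string "true", and the grep match is a shortcut that a real implementation would replace with proper JSON parsing (it does not check that the key sits inside Meta).

```shell
# should_upsert EXISTING_JSON
# Exits 0 when the entry may be written: either no entry exists yet (empty
# argument), or the existing entry carries the nomad_managed marker.
should_upsert() {
  # No existing entry: safe to create.
  [ -z "$1" ] && return 0
  # Existing entry: overwrite only if it is marked as Nomad-managed.
  printf '%s' "$1" | tr -d '\n' |
    grep -Eq '"nomad_managed"[[:space:]]*:[[:space:]]*"true"'
}

# Sample entries, made up for this example.
managed='{"Kind":"service-defaults","Name":"uuid-api","Protocol":"tcp","Meta":{"nomad_managed":"true"}}'
manual='{"Kind":"service-defaults","Name":"uuid-api","Protocol":"tcp"}'

should_upsert ""         && echo "missing entry: upsert"
should_upsert "$managed" && echo "nomad-managed entry: upsert"
should_upsert "$manual"  || echo "externally created entry: leave alone"
```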

Certainly open to feedback!

josegonzalez commented 2 years ago

@shoenig that certainly seems more than fine for us at SeatGeek.

At the moment, we have to configure this and a ServiceResolver separately from registering a Nomad job during a deploy, meaning much more coordination for something that is more or less a unit of work (registering a service against Consul for service discovery). Our initial use case is exactly setting the service protocol correctly so that Consul Connect does the right thing at the proxy level, though there are other things we want to configure as we expand our Consul Connect adoption past the initial phase.

It would even be fine if the logic here was gated behind some sort of beta/technical preview advisory, as has been done with CSI or Remote Task Drivers.

tgross commented 2 years ago

See also https://github.com/hashicorp/nomad/issues/14802 for an example of the challenges around updating the configuration.

suikast42 commented 2 years ago

Any progress on this issue? It's really a mess to maintain services this way if you want to introduce distributed tracing over Envoy

https://github.com/hashicorp/consul/issues/15515

thnee commented 1 year ago

For posterity, this is how to set the proxy defaults as a Terraform resource, which is what was needed to make services work with Consul API Gateway.

resource "consul_config_entry" "proxy_defaults_global" {
  kind = "proxy-defaults"
  name = "global"

  config_json = jsonencode({
    Config = {
      protocol = "http"
    }
  })
}
ehsannm commented 1 month ago

Any update?