hashicorp / nomad


Ingress gateways return 503 when forwarding traffic to service where host and container ports don't match #20540

Open brian-athinkingape opened 2 months ago

brian-athinkingape commented 2 months ago

Nomad version

Nomad v1.5.10 BuildDate 2023-10-30T13:26:22Z Revision 3d7f65f481c5b263d6c82f03862c27447cf1794b

Consul version

Consul v1.14.11 Revision c0c5688c Build Date 2023-10-31T13:58:53Z Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)

Docker version

Docker version 26.1.1, build 4cf5afa

Operating system and Environment details

Ubuntu 22.04.4 LTS (fresh install using AWS image) AWS c6a.xlarge

Issue

I'm running a setup with Nomad + Consul Connect (this is a simplified test case of the problems we're encountering in our actual systems). I'm trying to set up a service running in Docker that listens on a fixed port we can't customize (in this example, Flask listening on port 5000). I also want to set up an ingress gateway that listens on port 5555 and forwards requests to that service.

If I set up the Flask container's port block with only to = 5000 (so Nomad assigns a dynamic host port), requests through my ingress gateway fail even though the container is running. If I also set static = 5000 on the Flask port, then everything works fine. However, I can't use a static port in production, since there will be multiple copies of the container running on each server.
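To make the difference concrete, here is a minimal sketch of the two variants of the Flask group's port block (the full job files are further down; only the static line differs):

```hcl
# Fails behind the ingress gateway: Nomad assigns a dynamic host port,
# mapped to container port 5000 inside the bridge network.
port "fivethousand" {
  to           = 5000
  host_network = "default"
}

# Works, but not usable in production with multiple copies per host:
# the host port is pinned so host and container ports both equal 5000.
port "fivethousand" {
  static       = 5000
  to           = 5000
  host_network = "default"
}
```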

Reproduction steps

Run the two job files specified below. My server is running at 10.16.0.151. When I run curl http://10.16.0.151:<dynamic port allocated by Nomad to the Flask container> I get a 200 response with a body of Hello, World! as expected. However, running curl against the static ingress port (5555) does not give me the correct behaviour.
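Concretely, the reproduction looks roughly like this (the job file names and the alloc ID are placeholders; the dynamic port is whatever Nomad assigned in your run):

```bash
# Submit the two jobs shown below
nomad job run flask.nomad.hcl
nomad job run flask-ingress.nomad.hcl

# Find the dynamic host port Nomad assigned to the "fivethousand" label
nomad job allocs flask
nomad alloc status <flask-alloc-id>

# Direct request to the Flask container: returns 200 "Hello, World!"
curl http://10.16.0.151:<dynamic-port>

# Request through the ingress gateway's static port: fails
curl http://10.16.0.151:5555
```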

Expected Result

When I run curl http://10.16.0.151:5555 I should also get a 200 response with a body of Hello, World!.

Actual Result

When I run curl http://10.16.0.151:5555 I get a 503 response with a body of upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: delayed connect error: 111.

However, if I uncomment the # static = 5000 line in the Flask file, then the host port and container port match (both = 5000), and curling the ingress container returns the expected 200 response.

Job file (if appropriate)

Nomad config

# nomad 151
name = "testing151"

datacenter = "testing"
region = "testing"

log_file = "/var/log/nomad.log"

data_dir = "/opt/nomad/data"

bind_addr = "0.0.0.0"

advertise {
    http = "10.16.0.151"
    rpc = "10.16.0.151"
    serf = "10.16.0.151"
}

server {
    enabled = true

    raft_protocol = 3

    server_join {
        retry_max = 3
        retry_interval = "15s"
        retry_join = [ "10.16.0.151" ]
    }

    default_scheduler_config {
        scheduler_algorithm = "spread"
        memory_oversubscription_enabled = "true"
    }

    bootstrap_expect = 1
}

client {
    enabled = true

    reserved {
        memory = 2048
    }

    host_network "default" {
        cidr = "10.16.0.0/16"
    }

    host_network "loopback" {
        cidr = "127.0.0.1/32"
    }

    host_network "docker" {
        cidr = "172.17.0.1/32"
    }

    max_kill_timeout = "60s"

    meta {
        connect.sidecar_image = "envoyproxy/envoy:v1.24.12"
        connect.gateway_image = "envoyproxy/envoy:v1.24.12"
    }

    servers = [ "10.16.0.151" ]
}

consul {
    address = "127.0.0.1:8500"
}

log_rotate_duration = "24h"
log_rotate_max_files = 14
log_rotate_bytes = 10485760
log_level = "WARN"

Consul config

# consul 151
node_name = "testing151"

datacenter = "testing"

log_file = "/var/log/consul.log"
log_rotate_duration = "24h"
log_rotate_max_files = 14
log_rotate_bytes = 10485760
log_level = "WARN"

data_dir = "/opt/consul/data"

bind_addr = "0.0.0.0"
advertise_addr = "10.16.0.151"

client_addr = "10.16.0.151 127.0.0.1"

bootstrap_expect = 1
retry_join = [ "10.16.0.151" ]

server = true

enable_local_script_checks = true

connect {
    enabled = true
}

ports {
    grpc = 8502
    grpc_tls = -1
}

config_entries {
    bootstrap {
        kind = "proxy-defaults"
        name = "global"

        config {
            protocol = "http"
            local_request_timeout_ms = 0
        }
    }
}

ui_config {
    enabled = true
}

Flask job

job "flask" {
    region = "testing"
    datacenters = ["testing"]
    group "flask" {
        count = 1
        network {
            mode = "bridge"
            port "fivethousand" {
                # static = 5000 # If I set this port then everything works as expected
                to = 5000
                host_network = "default"
            }
        }
        service {
            name = "flask"
            port = "fivethousand"
            connect {
                sidecar_service {}
            }
            check {
                name = "Healthcheck"
                type = "http"
                path = "/"
                interval = "10s"
                timeout = "2s"
            }
        }
        task "service" {
            driver = "docker"
            config {
                # Python process listens on port 5000
                # Using this demo container:
                # https://github.com/do-community/k8s-intro-meetup-kit/tree/master/app
                image = "digitalocean/flask-helloworld"
            }
            resources {
                cpu = 500
                memory = 256
            }
            leader = true
        }
    }
}

Ingress job

job "flask-ingress" {
    region = "testing"
    datacenters = ["testing"]
    type = "system"
    group "flask-ingress" {
        network {
            mode = "bridge"
            port "default" {
                static = 5555
                host_network = "default"
            }
        }
        service {
            name = "flask-ingress"
            port = "5555"
            connect {
                gateway {
                    ingress {
                        listener {
                            port = 5555
                            protocol = "http"
                            service {
                                name = "flask"
                                hosts = ["*"]
                            }
                        }
                    }
                }
            }
        }
    }
}
tgross commented 1 week ago

Hi @brian-athinkingape! I've run the scenario you've shown above on the most recent versions of Nomad and Consul, and I'm getting the same error when making a request to the ingress allocation's address. My slightly modified jobspecs are below:

flask jobspec:

```hcl
job "flask" {
  group "flask" {
    network {
      mode = "bridge"
      port "fivethousand" {
        to = 5000
        host_network = "default"
      }
    }
    service {
      name = "flask"
      port = "fivethousand"
      connect {
        sidecar_service {}
      }
      check {
        name = "Healthcheck"
        type = "http"
        path = "/"
        interval = "10s"
        timeout = "2s"
      }
    }
    task "service" {
      driver = "docker"
      config {
        # Python process listens on port 5000
        # Using this demo container:
        # https://github.com/do-community/k8s-intro-meetup-kit/tree/master/app
        image = "digitalocean/flask-helloworld"
      }
      resources {
        cpu = 500
        memory = 256
      }
    }
  }
}
```
flask-ingress jobspec:

```hcl
job "flask-ingress" {
  type = "system"
  group "flask-ingress" {
    network {
      mode = "bridge"
      port "default" {
        static = 5555
        host_network = "default"
      }
    }
    service {
      name = "flask-ingress"
      port = "5555"
      connect {
        gateway {
          ingress {
            listener {
              port = 5555
              protocol = "http"
              service {
                name = "flask"
                hosts = ["*"]
              }
            }
          }
        }
      }
    }
  }
}
```
proxy defaults:

```hcl
kind = "proxy-defaults"
name = "global"

config {
  protocol = "http"
  local_request_timeout_ms = 0
}
```

Looking up the addresses for the two allocations:

$ nomad alloc status fc6eb399
ID                  = fc6eb399-b703-c241-2d71-cf860f3da995
Eval ID             = d161006e
Name                = flask.flask[0]
...
Allocation Addresses (mode = "bridge"):
Label                 Dynamic  Address
*fivethousand         yes      10.37.105.17:24757 -> 5000
*connect-proxy-flask  yes      10.37.105.17:26380 -> 26380

$ nomad alloc status aaa87a6f
ID                  = aaa87a6f-1456-6dbd-fdc5-1f9d51b4daf4
Eval ID             = eb714850
Name                = flask-ingress.flask-ingress[0]
...
Allocation Addresses (mode = "bridge"):
Label     Dynamic  Address
*default  yes      10.37.105.17:5555
...

Making the requests, we see that Flask is up and responding to requests, but the ingress proxy isn't wired up.

$ curl 10.37.105.17:24757
Hello, World!%

$ curl 10.37.105.17:5555
upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: delayed connect error: 111%

The error we're both seeing is coming from the Envoy proxy, and it's because Envoy is getting an ECONNREFUSED from the upstream service. Generally, you'll want to look at the Envoy Proxy Troubleshooting Guide for more details. It might also help to look at the Envoy bootstrap configuration files or task log files as described in service mesh troubleshooting.
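As an additional check (a sketch I didn't run above; it assumes a curl binary exists in the Envoy tasks, otherwise the same URLs can be hit from inside the allocation's network namespace with the nsenter helper shown later in this comment), Envoy's admin API will show the upstream clusters and their connection-failure counters. The admin bind addresses come from the bootstrap commands below:

```bash
# Ingress gateway's Envoy admin API (bound to 127.0.0.2:19000 per its bootstrap command)
nomad alloc exec -task connect-ingress-flask-ingress aaa8 \
  curl -s http://127.0.0.2:19000/clusters

# Flask sidecar's Envoy admin API (bound to 127.0.0.2:19001)
nomad alloc exec -task connect-proxy-flask fc6eb399 \
  curl -s http://127.0.0.2:19001/clusters
```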

In this case the "resolving common errors" troubleshooting guide doesn't have much for us, as the Envoy proxy sidecars are healthy. So I turned to the Nomad troubleshooting guide, which says to check the logs first. I looked at the ingress proxy's logs with nomad alloc logs -task connect-ingress-flask-ingress -stderr aaa8 and saw the listener getting configured as I'd expect:

[2024-06-25 17:56:25.390][1][info][upstream] [source/extensions/listener_managers/listener_manager/lds_api.cc:99] lds: add/update listener 'http:0.0.0.0:5555'

The Envoy bootstrap command looks ok for the gateway as well, and the bootstrap logs are empty:

$ nomad alloc exec -task connect-ingress-flask-ingress aaa8 cat secrets/.envoy_bootstrap.cmd
connect envoy -grpc-addr unix://alloc/tmp/consul_grpc.sock -http-addr 127.0.0.1:8500 -admin-bind 127.0.0.2:19000 -address 127.0.0.1:19100 -proxy-id _nomad-task-aaa81c27-b0a4-8f0e-a12d-756a13ecef95-group-flask-ingress-flask-ingress-5555 -bootstrap -gateway ingress -token 1b9f4c91-6770-cb45-a445-601a0d2181c6

$ nomad alloc fs aaa8 alloc/logs/envoy_bootstrap.stderr.0

So then I moved over to the Flask app's sidecar proxy logs with nomad alloc logs -task connect-proxy-flask -stderr fc6eb399:

[2024-06-25 17:34:48.831][1][info][upstream] [source/extensions/listener_managers/listener_manager/lds_api.cc:99] lds: add/update listener 'public_listener:0.0.0.0:26380'

$ nomad alloc exec -task connect-proxy-flask fc6eb399 cat secrets/.envoy_bootstrap.cmd
connect envoy -grpc-addr unix://alloc/tmp/consul_grpc.sock -http-addr 127.0.0.1:8500 -admin-bind 127.0.0.2:19001 -address 127.0.0.1:19101 -proxy-id _nomad-task-fc6eb399-b703-c241-2d71-cf860f3da995-group-flask-flask-fivethousand-sidecar-proxy -bootstrap -token 69ad5818-841d-2ebc-f060-16c4ad930e6a

$ nomad alloc fs fc6eb399 alloc/logs/envoy_bootstrap.stderr.0

All the logs look fine. Next, let's check the ingress gateway config that's been written to Consul:

$ consul config read -kind ingress-gateway -name flask-ingress
{
    "Kind": "ingress-gateway",
    "Name": "flask-ingress",
    "Partition": "default",
    "Namespace": "default",
    "TLS": {
        "Enabled": false
    },
    "Listeners": [
        {
            "Port": 5555,
            "Protocol": "http",
            "Services": [
                {
                    "Name": "flask",
                    "Hosts": [
                        "*"
                    ],
                    "Namespace": "default",
                    "Partition": "default",
                    "TLS": {},
                    "RequestHeaders": {},
                    "ResponseHeaders": {}
                }
            ]
        }
    ],
    "CreateIndex": 114,
    "ModifyIndex": 114
}

Everything there looks ok. I then tried starting a local Connect proxy (with a Consul management token in my environment) and got the same error:

$ curl localhost:5555
upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: delayed connect error: 111%

But I can see in the proxy logs that I am connecting:

$ consul connect proxy -log-level trace -service flask-ingress -upstream flask:5555
==> Consul Connect proxy starting...
    Configuration mode: Flags
               Service: flask-ingress
              Upstream: flask => :5555
       Public listener: Disabled

==> Log data will now stream in as it occurs:

    2024-06-25T15:01:02.780-0400 [DEBUG] proxy: got new config
    2024-06-25T15:01:02.780-0400 [INFO]  proxy: Starting listener: listener=127.0.0.1:5555->service:default/default/flask bind_addr=127.0.0.1:5555
    2024-06-25T15:01:02.781-0400 [INFO]  proxy: Proxy loaded config and ready to serve
    2024-06-25T15:01:02.781-0400 [INFO]  proxy: Parsed TLS identity: uri=spiffe://58fa1480-b3d1-4ac8-6324-91e30ccf4099.consul/ns/default/dc/dc1/svc/flask-ingress
    2024-06-25T15:01:04.937-0400 [DEBUG] proxy.connect: resolved service instance: service=flask-ingress address=10.37.105.17:26380 identity=spiffe:///ns/default/dc/dc1/svc/flask
    2024-06-25T15:01:04.940-0400 [DEBUG] proxy.connect: successfully connected to service instance: service=flask-ingress address=10.37.105.17:26380 identity=spiffe:///ns/default/dc/dc1/svc/flask

That's puzzling, so I went over to the client and checked whether we could see the Envoy process listening by entering the allocation's network namespace.

$ docker-net-nsenter c625946ec76a /bin/bash
root@nomad0:/home/ubuntu# netstat -antp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.2:19001         0.0.0.0:*               LISTEN      4191/envoy
tcp        0      0 0.0.0.0:5000            0.0.0.0:*               LISTEN      4312/python
tcp        0      0 0.0.0.0:26380           0.0.0.0:*               LISTEN      4191/envoy
tcp        0      0 172.26.64.206:5000      10.37.105.17:50088      TIME_WAIT   -
tcp        0      0 172.26.64.206:5000      10.37.105.17:37200      TIME_WAIT   -
tcp        0      0 172.26.64.206:5000      10.37.105.17:50558      TIME_WAIT   -
tcp        0      0 172.26.64.206:5000      10.37.105.17:49502      TIME_WAIT   -
tcp        0      0 172.26.64.206:26380     172.26.64.1:48280       ESTABLISHED 4191/envoy
tcp        0      0 172.26.64.206:5000      10.37.105.17:41804      TIME_WAIT   -
tcp        0      0 172.26.64.206:5000      10.37.105.17:36892      TIME_WAIT   -

The little helper I'm using to attach to the pause container:

docker-net-nsenter:

```bash
#!/usr/bin/env bash
set -eu

ID=${1:-}
shift

NET_NS_PATH=$(docker inspect $ID --format '{{.NetworkSettings.SandboxKey}}')
PID=$(docker inspect $ID --format '{{.State.Pid}}')

sudo nsenter --net="$NET_NS_PATH" -t "$PID" $@
```

If I then run tcpdump -A inside the Flask application's network namespace and make a request to the ingress, I see not only the inbound request but also the response from the Flask application:

14:35:21.210642 IP 10.37.105.1.41222 > 172.26.64.209.5555: Flags [S], seq 190714019, win 64240, options [mss 1460,sackOK,TS val 2466159760 ecr 0,nop,wscale 7], length 0

...

14:35:21.958615 IP 172.26.64.206.5000 > 10.37.105.17.57436: Flags [P.], seq 18:155, ack 153, win 508, options [nop,nop,TS val 3802047657 ecr 2982470331], length 137
E...."@.@.C...@.
%i....\..h._.>.....`......
........Content-Type: text/html; charset=utf-8
Content-Length: 13
Server: Werkzeug/0.16.1 Python/3.8.1
Date: Tue, 25 Jun 2024 18:35:21 GMT

So the request is coming through and the application is sending a response, but somehow it's not making it all the way back out!


@brian-athinkingape, at this point I'm quite stumped... I'm going to try to wrangle some help from the Consul folks to see if they have thoughts on where we might be going wrong.

As an aside, I want to strongly recommend that you move off of the deprecated ingress gateway to the API gateway. From the Consul docs:

Ingress gateway is deprecated and will not be enhanced beyond its current capabilities. Ingress gateway is fully supported in this version but will be removed in a future release of Consul.

We've got a tutorial on deploying API Gateway on Nomad with the new Workload Identity workflow here: https://developer.hashicorp.com/nomad/tutorials/integrate-consul/deploy-api-gateway-on-nomad