
Envoy proxy as main task doesn't work correctly in Consul transparent proxy mode #23508

Open ruslan-y opened 2 months ago

ruslan-y commented 2 months ago

Hi there!

I'm going to describe my problem in detail, so expect a lot of text, logs, and configs.

Nomad version

Nomad v1.8.0
BuildDate 2024-05-28T17:38:17Z
Revision 28b82e4b2259fae5a62e2ed47395334bea5a24c4

Consul version

Consul v1.19.0
Revision bf0166d8
Build Date 2024-06-12T13:59:10Z

Operating system and Environment details

5.10.0-23-amd64 #1 SMP Debian 5.10.179-2 (2023-07-14) x86_64 GNU/Linux
Nomad client config ``` name = "host" region = "global" datacenter = "dc1" enable_debug = false disable_update_check = false bind_addr = "0.0.0.0" advertise { http = ":4646" rpc = ":4647" serf = ":4648" } ports { http = 4646 rpc = 4647 serf = 4648 } consul { address = "localhost:8500" ssl = false ca_file = "" grpc_ca_file = "" cert_file = "" key_file = "" token = "" server_service_name = "nomad-servers" client_service_name = "nomad-clients" tags = [] auto_advertise = true server_auto_join = true client_auto_join = true } data_dir = "/var/nomad" log_level = "INFO" enable_syslog = true leave_on_terminate = true leave_on_interrupt = false tls { http = true rpc = true ca_file = "/etc/nomad/ssl/nomad-ca.pem" cert_file = "/etc/nomad/ssl/client.pem" key_file = "/etc/nomad/ssl/client-key.pem" rpc_upgrade_mode = false verify_server_hostname = "true" verify_https_client = "true" } acl { enabled = true token_ttl = "30s" policy_ttl = "30s" replication_token = "" } vault { enabled = true address = "https://" allow_unauthenticated = true create_from_role = "nomad-cluster" task_token_ttl = "" ca_file = "" ca_path = "" cert_file = "" key_file = "" tls_server_name = "" tls_skip_verify = false namespace = "" } telemetry { disable_hostname = "true" collection_interval = "15s" use_node_name = "false" publish_allocation_metrics = "true" publish_node_metrics = "true" filter_default = "true" prefix_filter = [] disable_dispatched_job_summary_metrics = "false" statsite_address = "" statsd_address = "" datadog_address = "" datadog_tags = [] prometheus_metrics = "true" circonus_api_token = "" circonus_api_app = "nomad" circonus_api_url = "https://api.circonus.com/v2" circonus_submission_interval = "10s" circonus_submission_url = "" circonus_check_id = "" circonus_check_force_metric_activation = "false" circonus_check_instance_id = "" circonus_check_search_tag = "" circonus_check_display_name = "" circonus_check_tags = "" circonus_broker_id = "" circonus_broker_select_tag = "" } autopilot { cleanup_dead_servers = true last_contact_threshold = "1s" max_trailing_logs = 250 server_stabilization_time = "10s" } ```
Consul client config ``` { "acl": { "default_policy": "deny", "down_policy": "extend-cache", "enable_token_persistence": true, "enabled": true, "token_ttl": "30s", "tokens": { "agent": "", "agent_recovery": "" } }, "addresses": { "dns": "172.17.0.1", "grpc": "127.0.0.1", "grpc_tls": "127.0.0.1", "http": "127.0.0.1", "https": "127.0.0.1" }, "advertise_addr": "", "advertise_addr_wan": "", "auto_encrypt": { "tls": true }, "bind_addr": "", "client_addr": "127.0.0.1", "connect": { "enabled": true }, "data_dir": "/opt/consul", "datacenter": "dc1", "disable_update_check": false, "domain": "consul", "enable_local_script_checks": false, "enable_script_checks": false, "enable_syslog": true, "encrypt": "", "encrypt_verify_incoming": true, "encrypt_verify_outgoing": true, "limits": { "http_max_conns_per_client": 400, "rpc_max_conns_per_client": 200 }, "log_level": "INFO", "node_name": "host", "performance": { "leave_drain_time": "10s", "raft_multiplier": 1, "rpc_hold_timeout": "30s" }, "ports": { "dns": 8600, "grpc": 8502, "grpc_tls": 8503, "http": 8500, "https": -1, "serf_lan": 8301, "serf_wan": 8302, "server": 8300 }, "primary_datacenter": "dc1", "raft_protocol": 3, "recursors": [ "1.1.1.1", "8.8.8.8" ], "retry_interval": "30s", "retry_join": [ "", "", "" ], "retry_max": 0, "server": false, "syslog_facility": "local0", "tls": { "defaults": { "ca_file": "/etc/consul/ssl/consul-agent-ca.pem", "tls_min_version": "TLSv1_2", "verify_incoming": false, "verify_outgoing": true }, "https": { "verify_incoming": false }, "internal_rpc": { "verify_incoming": true, "verify_server_hostname": false } }, "translate_wan_addrs": false, "ui_config": { "enabled": false } } ```

Issue

There is a Nomad job (Envoy proxy) running in the cluster that proxies requests to my-service. I'm using Consul Connect upstreams in my Nomad jobs and it works perfectly.

Example of my nomad job (upstreams): ``` job "test-proxy-job" { datacenters = ["dc1"] namespace = "test" type = "service" group "test-proxy-group" { count = 1 vault { policies = ["nomad-services"] } update { max_parallel = 1 canary = 1 auto_revert = true auto_promote = true min_healthy_time = "10s" healthy_deadline = "5m" progress_deadline = "15m" } network { mode = "bridge" port "http" { static = 28123 to = 28162 } } service { name = "test-proxy" port = "28162" tags = ["proxy", "test"] connect { sidecar_task { resources { cpu = 500 memory = 300 } config { args = [ "-c", "${NOMAD_SECRETS_DIR}/envoy_bootstrap.json", "-l", "debug", "--concurrency", "${meta.connect.proxy_concurrency}", "--disable-hot-restart" ] } } sidecar_service { proxy { upstreams { destination_name = "my-service" local_bind_port = 10007 } } } } } task "test-proxy-task" { driver = "docker" template { data = <

If I make a request, I get the expected response from my-service:

curl -v -H 'x-test-envoy: true' http://<external_ip>:28123
*   Trying <external_ip>:28123...
* Connected to <external_ip> (<external_ip>) port 28123
> GET / HTTP/1.1
> Host: <external_ip>:28123
> User-Agent: curl/8.6.0
> Accept: */*
> x-test-envoy: true
>
< HTTP/1.1 200 OK
< server: envoy
< date: Thu, 04 Jul 2024 19:01:21 GMT
< content-type: application/json; charset=utf8
< content-length: 57
< x-envoy-upstream-service-time: 6
<
* Connection #0 to host <external_ip> left intact
{"jsonrpc": "2.0", "id": "test", "result": "ok"}%

When I try to enable transparent proxy in the config, it doesn't work.

Example of my nomad job (transparent proxy): ``` job "test-proxy-job" { datacenters = ["dc1"] namespace = "test" type = "service" group "test-proxy-group" { count = 1 vault { policies = ["nomad-services"] } update { max_parallel = 1 canary = 1 auto_revert = true auto_promote = true min_healthy_time = "10s" healthy_deadline = "5m" progress_deadline = "15m" } network { mode = "bridge" port "http" { static = 28123 to = 28162 } } service { name = "test-proxy" port = "28162" tags = ["proxy", "test"] connect { sidecar_task { resources { cpu = 500 memory = 300 } config { args = [ "-c", "${NOMAD_SECRETS_DIR}/envoy_bootstrap.json", "-l", "debug", "--concurrency", "${meta.connect.proxy_concurrency}", "--disable-hot-restart" ] } } sidecar_service { proxy { transparent_proxy {} } } } } task "test-proxy-task" { driver = "docker" template { data = <

Requests don't go through; I'm getting a 503 error and no response from my-service:

curl -v -H 'x-test-envoy: true' http://<external_ip>:28123
*   Trying <external_ip>:28123...
* Connected to <external_ip> (<external_ip>) port 28123
> GET / HTTP/1.1
> Host: <external_ip>:28123
> User-Agent: curl/8.6.0
> Accept: */*
> x-test-envoy: true
>
< HTTP/1.1 503 Service Unavailable
< content-length: 91
< content-type: text/plain
< date: Thu, 04 Jul 2024 20:55:09 GMT
< server: envoy
<
* Connection #0 to host <external_ip> left intact
upstream connect error or disconnect/reset before headers. reset reason: connection failure%
Some debug logs from Envoy: ``` [2024-07-04 20:55:04.149][21][debug][conn_handler] [source/extensions/listener_managers/listener_manager/active_tcp_listener.cc:155] [C205] new connection from :44066 [2024-07-04 20:55:04.149][21][debug][http] [source/common/http/conn_manager_impl.cc:375] [C205] new stream [2024-07-04 20:55:04.149][21][debug][http] [source/common/http/conn_manager_impl.cc:1118] [C205][S13244765374556234856] request headers complete (end_stream=true): ':authority', ':28123' ':path', '/' ':method', 'GET' 'user-agent', 'curl/8.6.0' 'accept', '*/*' 'x-test-envoy', 'true' [2024-07-04 20:55:04.149][21][debug][http] [source/common/http/conn_manager_impl.cc:1101] [C205][S13244765374556234856] request end stream [2024-07-04 20:55:04.149][21][debug][connection] [./source/common/network/connection_impl.h:98] [C205] current connecting state: false [2024-07-04 20:55:04.149][21][debug][router] [source/common/router/router.cc:478] [C205][S13244765374556234856] cluster 'my-service' match for URL '/' [2024-07-04 20:55:04.149][21][debug][router] [source/common/router/router.cc:686] [C205][S13244765374556234856] router decoding headers: ':authority', ':28123' ':path', '/' ':method', 'GET' ':scheme', 'http' 'user-agent', 'curl/8.6.0' 'accept', '*/*' 'x-test-envoy', 'true' 'x-forwarded-proto', 'http' 'x-request-id', '' [2024-07-04 20:55:04.149][21][debug][pool] [source/common/http/conn_pool_base.cc:78] queueing stream due to no available connections (ready=0 busy=0 connecting=0) [2024-07-04 20:55:04.149][21][debug][pool] [source/common/conn_pool/conn_pool_base.cc:291] trying to create new connection [2024-07-04 20:55:04.149][21][debug][pool] [source/common/conn_pool/conn_pool_base.cc:145] creating a new connection (connecting=0) [2024-07-04 20:55:04.150][21][debug][connection] [./source/common/network/connection_impl.h:98] [C206] current connecting state: true [2024-07-04 20:55:04.150][21][debug][client] [source/common/http/codec_client.cc:57] [C206] connecting [2024-07-04 20:55:04.150][21][debug][connection] [source/common/network/connection_impl.cc:941] [C206] connecting to 240.0.41.1:80 [2024-07-04 20:55:04.150][21][debug][connection] [source/common/network/connection_impl.cc:960] [C206] connection in progress [2024-07-04 20:55:06.029][1][debug][main] [source/server/server.cc:265] flushing stats [2024-07-04 20:55:07.079][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:354] dns resolution for my-service.virtual.consul started [2024-07-04 20:55:07.081][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:275] dns resolution for my-service.virtual.consul completed with status 0 [2024-07-04 20:55:07.081][1][debug][upstream] [source/common/upstream/upstream_impl.cc:457] transport socket match, socket default selected for host with address 240.0.41.1:80 [2024-07-04 20:55:07.081][1][debug][upstream] [source/extensions/clusters/strict_dns/strict_dns_cluster.cc:177] DNS refresh rate reset for my-service.virtual.consul, refresh rate 5000 ms [2024-07-04 20:55:08.321][15][debug][conn_handler] [source/extensions/listener_managers/listener_manager/active_tcp_listener.cc:155] [C207] new connection from 127.0.0.1:54136 [2024-07-04 20:55:08.322][15][debug][connection] [source/common/network/connection_impl.cc:656] [C207] remote close [2024-07-04 20:55:08.322][15][debug][connection] [source/common/network/connection_impl.cc:250] [C207] closing socket: 0 [2024-07-04 20:55:08.322][15][debug][conn_handler] 
[source/extensions/listener_managers/listener_manager/active_stream_listener_base.cc:121] [C207] adding to cleanup list [2024-07-04 20:55:09.148][21][debug][pool] [source/common/conn_pool/conn_pool_base.cc:793] [C206] connect timeout [2024-07-04 20:55:09.148][21][debug][connection] [source/common/network/connection_impl.cc:139] [C206] closing data_to_write=0 type=1 [2024-07-04 20:55:09.148][21][debug][connection] [source/common/network/connection_impl.cc:250] [C206] closing socket: 1 [2024-07-04 20:55:09.148][21][debug][client] [source/common/http/codec_client.cc:107] [C206] disconnect. resetting 0 pending requests [2024-07-04 20:55:09.148][21][debug][pool] [source/common/conn_pool/conn_pool_base.cc:484] [C206] client disconnected, failure reason: [2024-07-04 20:55:09.148][21][debug][router] [source/common/router/router.cc:1279] [C205][S13244765374556234856] upstream reset: reset reason: connection failure, transport failure reason: [2024-07-04 20:55:09.148][21][debug][http] [source/common/http/filter_manager.cc:996] [C205][S13244765374556234856] Sending local reply with details upstream_reset_before_response_started{connection_failure} [2024-07-04 20:55:09.148][21][debug][http] [source/common/http/conn_manager_impl.cc:1773] [C205][S13244765374556234856] encoding headers via codec (end_stream=false): ':status', '503' 'content-length', '91' 'content-type', 'text/plain' 'date', 'Thu, 04 Jul 2024 20:55:09 GMT' 'server', 'envoy' [2024-07-04 20:55:09.148][21][debug][http] [source/common/http/conn_manager_impl.cc:1865] [C205][S13244765374556234856] Codec completed encoding stream. [2024-07-04 20:55:09.148][21][debug][pool] [source/common/conn_pool/conn_pool_base.cc:454] invoking idle callbacks - is_draining_for_deletion_=false [2024-07-04 20:55:10.201][21][debug][connection] [source/common/network/connection_impl.cc:656] [C205] remote close [2024-07-04 20:55:10.201][21][debug][connection] [source/common/network/connection_impl.cc:250] [C205] closing socket: 0 [2024-07-04 20:55:10.201][21][debug][conn_handler] [source/extensions/listener_managers/listener_manager/active_stream_listener_base.cc:121] [C205] adding to cleanup list [2024-07-04 20:55:11.031][1][debug][main] [source/server/server.cc:265] flushing stats [2024-07-04 20:55:12.081][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:354] dns resolution for my-service.virtual.consul started [2024-07-04 20:55:12.083][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:275] dns resolution for my-service.virtual.consul completed with status 0 [2024-07-04 20:55:12.083][1][debug][upstream] [source/common/upstream/upstream_impl.cc:457] transport socket match, socket default selected for host with address 240.0.41.1:80 [2024-07-04 20:55:12.083][1][debug][upstream] [source/extensions/clusters/strict_dns/strict_dns_cluster.cc:177] DNS refresh rate reset for my-service.virtual.consul, refresh rate 5000 ms ```

If you go inside the Envoy proxy container and try to send a request locally (to the Envoy listener), you see the same issue:

root@d87055ce8426:/# curl -v localhost:28162
*   Trying 127.0.0.1:28162...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 28162 (#0)
> GET / HTTP/1.1
> Host: localhost:28162
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 503 Service Unavailable
< content-length: 91
< content-type: text/plain
< date: Thu, 04 Jul 2024 22:23:30 GMT
< server: envoy
< 
* Connection #0 to host localhost left intact
upstream connect error or disconnect/reset before headers. reset reason: connection failure

At the same time, a request to my-service by Consul name (and virtual IP) passes:

root@d87055ce8426:/# curl -v my-service.virtual.consul
*   Trying 240.0.41.1:80...
* TCP_NODELAY set
* Connected to my-service.virtual.consul (240.0.41.1) port 80 (#0)
> GET / HTTP/1.1
> Host: my-service.virtual.consul
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: fasthttp
< Date: Thu, 04 Jul 2024 22:24:07 GMT
< Content-Type: application/json; charset=utf8
< Content-Length: 57
< 
* Connection #0 to host my-service.virtual.consul left intact
{"jsonrpc": "2.0", "id": "test", "result": "ok"}

I also have three other services in the cluster running with transparent proxy and there is connectivity between them. So I guess the problem is with the Envoy proxy (or my configuration of it). I tried different versions of Envoy, including the latest.

Any help would be appreciated.

ruslan-y commented 1 month ago

Updated Consul to version 1.19.1 and Nomad to version 1.8.2. The problem still exists.

tgross commented 1 month ago

Hi @ruslan-y! I'm having trouble understanding what you're trying to do here. You're running Envoy as the main task but you're also running with a Connect sidecar proxy? Ordinarily it shouldn't matter what the main task is, but in this case it's extra confusing because you're saying things like "these are the logs from Envoy" without clarifying if they're the logs from the main Envoy task you're starting or the sidecar.

It looks like from the request to the virtual IP that you've got the transparent proxy set up correctly, so that's a start. But I suspect the iptables rules that transparent proxy creates inside the container are going to interfere with this, especially the UID exception that's being carved out for Envoy. Both Envoy processes will have the same UID because they're the same image, so the sidecar Envoy will have an exception in the iptables rules but the main Envoy will not.

ruslan-y commented 1 month ago

Yes, these are the logs from the main Envoy task I'm starting.

tgross commented 1 month ago

Ok, I'll leave this open until you've had a chance to debug the other items I've suggested looking at.

ruslan-y commented 1 month ago

Hi, @tgross !

I tried changing the uid in transparent_proxy to uid = 0 and also to uid = 102; the same issue:

        sidecar_service {
          proxy {
            transparent_proxy {
              uid = 0
            }            
          }
        }

I also tried using the same image for the main envoy and for the sidecar.

image = "envoyproxy/envoy:v1.26.7"

It didn't produce any results. Any suggestions would be appreciated, thanks!

tgross commented 1 month ago

I tried changing the uid in transparent_proxy to uid = 0 and also to uid = 102; the same issue

You would also have to change the UID of the container that's running for that to be effective. That field says "don't intercept this UID's traffic", but both Envoys are running with the same UID.

ruslan-y commented 1 month ago

I don't really understand: should I use different uids or the same ones?

I've changed the user of the main task's container to the same user:

    task "test-proxy-task" {
      driver = "docker"
      user = "root"

and for the sidecar

        sidecar_service {
          proxy {
            transparent_proxy {
              uid = 0
            }            
          }
        }

I also tried using a different uid and user, e.g. user = "root" with uid = 1001; it doesn't work for me.

tgross commented 1 month ago

Transparent proxy mode creates iptables rules inside the container with a UID exception. So the processes both need to be running as separate UIDs, and the UID for the transparent_proxy.uid field must be the UID of the sidecar Envoy process. I don't think uid = 0 will work here either, as that's treated as a "zero value" by the jobspec parser and will likely just be ignored.
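
To make the two-UID requirement concrete, here is a minimal sketch of the two places the UIDs come from (not a tested jobspec; the user = "1001" on the main task is a placeholder, and 101 is the default UID of the official Envoy image):

```hcl
sidecar_service {
  proxy {
    # With no uid override, the iptables exception that transparent proxy
    # creates applies to UID 101, the default UID of the sidecar Envoy.
    transparent_proxy {}
  }
}

task "test-proxy-task" {
  driver = "docker"

  # Placeholder: run the main proxy process as some UID other than 101 so
  # its outbound traffic is not exempted and gets redirected through the
  # sidecar proxy.
  user = "1001"

  # ... image, args, templates as in the job above ...
}
```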

But I'm also still wondering why you're trying to do this in the first place?

betterthanbreakfast commented 1 month ago

@tgross GM

You're running Envoy as the main task but you're also running with a Connect sidecar proxy?

To make it clear: yes. We run EnvoyProxy as a main service in the task definition (together with EnvoyProxy as a sidecar proxy for Consul Connect) in multiple scenarios, such as a proxy server, load balancer, and API gateway, where our main service is configured through our own xDS service with specific business logic.

We haven't had any issues with this configuration when using the upstreams block. We encountered a problem when we tried to switch to transparent proxy, which we were looking forward to because it allows us to manage upstreams dynamically when the configuration changes in our xDS service, without the need to modify the job definition.

ruslan-y commented 1 month ago

As an experiment I started nginx as a reverse proxy, and I found the same problems as with Envoy. First I want to provide example jobs for testing:

Test proxy nginx with upstreams ``` job "test-proxy-job" { datacenters = ["dc1"] type = "service" group "test-proxy-group" { count = 1 vault { policies = ["nomad-services"] } update { max_parallel = 1 canary = 1 auto_revert = true auto_promote = true min_healthy_time = "10s" healthy_deadline = "5m" progress_deadline = "15m" } network { mode = "bridge" port "http" { static = 28123 to = 80 } } service { name = "test-proxy" port = "28162" tags = ["proxy", "test"] connect { sidecar_task { resources { cpu = 5000 memory = 3000 } config { image = "envoyproxy/envoy:v1.29.7" args = [ "-c", "${NOMAD_SECRETS_DIR}/envoy_bootstrap.json", "-l", "debug", "--concurrency", "${meta.connect.proxy_concurrency}", "--disable-hot-restart" ] } } sidecar_service { proxy { upstreams { destination_name = "my-service" local_bind_port = 10007 } } } } } task "test-proxy-task" { driver = "docker" resources { cpu = 5000 memory = 5000 } config { image = "nginx:alpine" volumes = [ "local:/etc/nginx/conf.d", ] } template { data = <
Test proxy nginx with transparent proxy ``` job "test-proxy-job" { datacenters = ["dc1"] type = "service" group "test-proxy-group" { count = 1 vault { policies = ["nomad-services"] } update { max_parallel = 1 canary = 1 auto_revert = true auto_promote = true min_healthy_time = "10s" healthy_deadline = "5m" progress_deadline = "15m" } network { mode = "bridge" port "http" { static = 28123 to = 80 } } service { name = "test-proxy" port = "28162" tags = ["proxy", "test"] connect { sidecar_task { resources { cpu = 5000 memory = 3000 } config { image = "envoyproxy/envoy:v1.29.7" args = [ "-c", "${NOMAD_SECRETS_DIR}/envoy_bootstrap.json", "-l", "debug", "--concurrency", "${meta.connect.proxy_concurrency}", "--disable-hot-restart" ] } } sidecar_service { proxy { transparent_proxy {} } } } } task "test-proxy-task" { driver = "docker" resources { cpu = 5000 memory = 5000 } config { image = "nginx:alpine" volumes = [ "local:/etc/nginx/conf.d", ] } template { data = <

The configuration with upstreams works perfectly, but with transparent proxy it doesn't work. I tried using the non-root image nginxinc/nginx-unprivileged with uid=101, different from the Envoy image (uid=0):

# id
uid=101(nginx) gid=101(nginx) groups=101(nginx)

If I make a request from outside, I see the following logs in nginx:

[error] 21#21: *3 upstream timed out (110: Operation timed out) while connecting to upstream, client: <client_ip>, server: , request: "GET / HTTP/1.1", upstream: "http://240.0.41.1:80/", host: "<server_ip>:28123"

If I make requests from the nginx container or from the sidecar proxy container, there are no problems, as I already wrote above.

It doesn't matter which reverse proxy I use or which uid is used in the container; there is a transparent proxy issue. At least in our cluster it is like that. You can try to run this job on your cluster.

ruslan-y commented 4 weeks ago

Updated Nomad to version 1.8.3. I've also updated the CNI plugins to version 1.5.1. The problem still exists.

tgross commented 4 weeks ago

Ok, so given the Nginx test you did, it looks like the UID rule may not be the problem. Have you verified that you've configured your Consul as described in https://developer.hashicorp.com/nomad/docs/integrations/consul/service-mesh#prerequisites (which has special requirements for transparent proxy mode)? Can you provide the Consul attributes fingerprinted by the node? (ex. run nomad node status -verbose $nodeid | grep consul)

You can try to run this job on your cluster.

If you want me to run a job in my development cluster, it needs to be a minimal reproduction. Ex. to run this job I have to configure a Vault workload identity, auth method, etc. And you're overriding the sidecar task in ways that don't appear to be related to the problem.

Updated Nomad to version 1.8.3. I've also updated the CNI plugins to version 1.5.1. The problem still exists.

Not sure why you'd think it would be different.

ruslan-y commented 4 weeks ago

Thanks for helping us Tim, we really appreciate it.

Can you provide the Consul attributes fingerprinted by the node?

consul.connect                  = true
consul.datacenter               = dc1
consul.dns.addr                 = 172.17.0.1
consul.dns.port                 = 8600
consul.ft.namespaces            = false
consul.grpc                     = 8502
consul.revision                 = 9f62fb41
consul.server                   = false
consul.sku                      = oss
consul.version                  = 1.19.1
plugins.cni.version.consul-cni  = v1.5.1
unique.consul.name              = <hostname>

If you want me to run a job in my development cluster, it needs to be a minimal reproduction.

This is a minimal reproduction job to run in your development cluster ``` job "test-proxy-job" { datacenters = ["dc1"] type = "service" group "test-proxy-group" { network { mode = "bridge" port "http" { static = "8080" to = "80" } } service { name = "test-proxy" port = "http" connect { sidecar_service { proxy { transparent_proxy {} } } } } task "test-proxy-task" { driver = "docker" config { image = "nginx:alpine" volumes = [ "local:/etc/nginx/conf.d", ] } template { data = <.virtual.consul:80; } server { listen 80; location / { proxy_pass http://backend; proxy_set_header Upgrade $http_upgrade; proxy_set_header Connection "upgrade"; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Real-PORT $remote_port; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_http_version 1.1; proxy_set_header Host testproxy; } } EOF destination = "local/config.conf" change_mode = "signal" change_signal = "SIGHUP" } } } } ```

Have you verified that you've configured your Consul as described

Now I'm going to double-check it.

ruslan-y commented 4 weeks ago

Hi @tgross !

I double-checked; it is configured as described in the docs. Anyway, you can use the Nomad job below, based on the docs, for testing this issue:

counter-api and nginx ``` job "test-proxy-job" { datacenters = ["dc1"] type = "service" group "test-proxy-group" { network { mode = "bridge" port "http" { static = 28123 to = 80 } } service { name = "test-proxy" port = "http" connect { sidecar_service { proxy { transparent_proxy {} } } } } task "test-proxy-task" { driver = "docker" config { image = "nginx:alpine" volumes = [ "local:/etc/nginx/conf.d", ] } template { data = <
tgross commented 3 weeks ago

Hi @ruslan-y. I've spent most of the day trying to debug this. I've got a reproduction but only with this jobspec and not the example tproxy jobspec from the tutorial, even though I'm using the same cluster for both. But I don't know if what I'm seeing is what you're seeing exactly, because you've also got some DNS configuration issues with that nginx job.

tl;dr: there are at least two problems here, which I'll walk through below.

So what I'd like you to do is to follow the same steps I'm going to show below, as that'll give us more of a clue as to what's going on with your setup. I do want to let you know that I'm going to be on vacation for the next couple weeks, but in the meantime I'm going to try to pull some of the Consul folks in on this one because I'm frankly stumped.


I've got three machines here, a Nomad server at 192.168.1.160, and a Nomad client (with Consul agent) at 10.37.105.17. I deploy your example job and also add the service intention:

Kind = "service-intentions"
Name = "count-api"
Sources = [
  {
    Name   = "test-proxy"
    Action = "allow"
  }
]

If I curl 10.37.105.17:28123, it hangs until eventually nginx gives up. (I'll get into the error below.) First, I check DNS on the Consul server. You should get a virtual IP address for the service:

dig count-api.virtual.consul @192.168.1.160 -p 8600
...
;; ANSWER SECTION:
count-api.virtual.consul. 0     IN      A       240.0.0.1

If you enter the network namespace of the test-proxy-group's "pause" container, you can confirm whether DNS is working. The way I do that is as follows. First I look for the pause container with the same allocation ID as the nginx container:

$ docker ps
CONTAINER ID   IMAGE                                      COMMAND                  CREATED         STATUS         PORTS     NAMES
5fed1f783c79   nginx:alpine                               "/docker-entrypoint.…"   8 minutes ago   Up 8 minutes             test-proxy-task-710e79de-be0b-3b78-8915-75659477d65a
dff9ce25c8a9   envoyproxy/envoy:v1.28.3                   "/docker-entrypoint.…"   8 minutes ago   Up 8 minutes             connect-proxy-test-proxy-710e79de-be0b-3b78-8915-75659477d65a
6217f4bbd4bb   gcr.io/google_containers/pause-amd64:3.1   "/pause"                 8 minutes ago   Up 8 minutes             nomad_init_710e79de-be0b-3b78-8915-75659477d65a

Then I use nsenter:

ID=6217f4bbd4bb
NET_NS_PATH=$(docker inspect $ID --format '{{.NetworkSettings.SandboxKey}}')
PID=$(docker inspect $ID --format '{{.State.Pid}}')
sudo nsenter --net="$NET_NS_PATH" -t "$PID" $@

Once I'm in the container, I can check whether the issue is DNS or not. First I'll query the Consul server DNS, which works as expected:

# dig count-api.virtual.consul @192.168.1.160 -p 8600
...
;; ANSWER SECTION:
count-api.virtual.consul. 0     IN      A       240.0.0.1

And then I'll query the Consul client agent DNS, which also works as expected:

# dig count-api.virtual.consul @10.37.105.17 -p 8600
...
;; ANSWER SECTION:
count-api.virtual.consul. 0     IN      A       240.0.0.1

Let's check that the iptables rules that route DNS traffic are what we'd expect:

# iptables -L -t nat
...
Chain CONSUL_DNS_REDIRECT (2 references)
target     prot opt source               destination
DNAT       udp  --  anywhere             10.37.105.17         udp dpt:domain to:10.37.105.17:8600
DNAT       tcp  --  anywhere             10.37.105.17         tcp dpt:domain to:10.37.105.17:8600

That means I should be able to query that IP for DNS without setting a port on dig, which also works:

# dig count-api.virtual.consul @10.37.105.17
...
;; ANSWER SECTION:
count-api.virtual.consul. 0     IN      A       240.0.0.1

If I go back out to my host machine and nomad alloc exec into the task, I can see that the /etc/resolv.conf is configured as expected in the task, which means I can curl from inside the task to the virtual IP just as expected (although there's a missing element I didn't notice here, which is that we're running as root... keep going):

$ nomad alloc exec -task test-proxy-task 710e79de /bin/sh

/ # cat /etc/resolv.conf
search multipass
nameserver 10.37.105.17

/ # curl count-api.virtual.consul
{"count":3,"hostname":"86556b23ad63"}

If I look at the allocation logs, I can see we are in fact hitting the proxy, but that the client is cutting the connection from its side once we ctrl-C:

$ nomad alloc logs -task test-proxy-task -f -stdout 710e79de
...
10.37.105.1 - - [15/Aug/2024:14:25:27 +0000] "GET / HTTP/1.1" 499 0 "-" "curl/7.81.0" "-"
10.37.105.1 - - [15/Aug/2024:14:54:34 +0000] "GET / HTTP/1.1" 499 0 "-" "curl/7.81.0" "-"
10.37.105.17 - - [15/Aug/2024:14:54:44 +0000] "GET / HTTP/1.1" 499 0 "-" "curl/7.81.0" "-"
10.37.105.17 - - [15/Aug/2024:14:55:11 +0000] "GET / HTTP/1.1" 499 0 "-" "curl/7.81.0" "-"
10.37.105.17 - - [15/Aug/2024:14:55:29 +0000] "GET / HTTP/1.1" 499 0 "-" "curl/7.81.0" "-"

Then I remembered that Nginx resolves DNS entries on startup by default, so I changed the config to the following:

server {
   listen 80;

   location / {
      resolver {{ env "attr.consul.dns.addr" }} ipv6=off;
      set $domain count-api.virtual.consul;
      proxy_pass http://$domain;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection "upgrade";
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Real-PORT $remote_port;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_http_version 1.1;
      proxy_set_header Host testproxy;
   }
}

That uses node attribute interpolation to get the node's IP instead of hard-coding it. The ipv6=off prevents Nginx from getting errors like the following when it tries both IPv4 and IPv6:

2024/08/15 15:14:20 [error] 21#21: unexpected A record in DNS response

and now I get:

2024/08/15 15:20:40 [error] 20#20: *1 upstream timed out (110: Operation timed out) while connecting to upstream, client: 10.37.105.17, server: , request: "GET / HTTP/1.1", upstream: "http://240.0.0.1:80/", host: "10.37.105.17:28123"

But we saw we could reach that IP when we shelled into the container before, so what gives? What if we try to switch to the nginx user and curl:

# curl 240.0.0.1
{"count":6,"hostname":"7c8ee505d498"}

# tac /etc/passwd | sed '1 s~/sbin/nologin~/bin/sh~' | tac > /tmp/passwd
# mv /tmp/passwd /etc/passwd
# su nginx

/ $ nslookup count-api.virtual.consul
Server:         10.37.105.17
Address:        10.37.105.17:53

Name:   count-api.virtual.consul
Address: 240.0.0.1

Name:   count-api.virtual.consul
Address: 240.0.0.1

$ curl 240.0.0.1
^C

Nginx is resolving the DNS correctly, but now that that's been fixed, we can only reach the virtual IP address when running as root!

Let's check the relevant iptables again:

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
CONSUL_DNS_REDIRECT  udp  --  anywhere             10.37.105.17         udp dpt:domain
CONSUL_DNS_REDIRECT  tcp  --  anywhere             10.37.105.17         tcp dpt:domain
CONSUL_PROXY_OUTPUT  tcp  --  anywhere             anywhere

Chain CONSUL_DNS_REDIRECT (2 references)
target     prot opt source               destination
DNAT       udp  --  anywhere             10.37.105.17         udp dpt:domain to:10.37.105.17:8600
DNAT       tcp  --  anywhere             10.37.105.17         tcp dpt:domain to:10.37.105.17:8600

Chain CONSUL_PROXY_OUTPUT (1 references)
target     prot opt source               destination
RETURN     all  --  anywhere             anywhere             owner UID match systemd-resolve
RETURN     all  --  anywhere             localhost
CONSUL_PROXY_REDIRECT  all  --  anywhere             anywhere

Chain CONSUL_PROXY_REDIRECT (1 references)
target     prot opt source               destination
REDIRECT   tcp  --  anywhere             anywhere             redir ports 15001

Traffic for 10.37.105.17 on the DNS port gets redirected to Consul DNS; everything else falls through to the proxy output chain. If the UID doesn't match systemd-resolve (i.e. UID 101 of our Envoy proxy) and the destination isn't localhost, the traffic is redirected to port 15001, which is our Envoy proxy. We can confirm Envoy is listening on 15001 on localhost from inside the network namespace:

# netstat -antp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.2:19001         0.0.0.0:*               LISTEN      19535/envoy
tcp        0      0 0.0.0.0:27436           0.0.0.0:*               LISTEN      19535/envoy
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      19689/nginx: master
tcp        0      0 127.0.0.1:15001         0.0.0.0:*               LISTEN      19535/envoy

The next thing I checked was whether this was broken with the example tproxy job, and it works fine there!

jobspec:

```hcl
job "countdash" {
  group "dashboard" {
    network {
      mode = "bridge"
      port "http" {
        static = 9010
        to     = 9002
      }
    }

    service {
      name = "count-dashboard"
      port = "9002"

      connect {
        sidecar_service {
          proxy {
            transparent_proxy {}
          }
        }
      }
    }

    task "dashboard" {
      driver = "docker"

      env {
        COUNTING_SERVICE_URL = "http://count-api.virtual.consul"
      }

      config {
        image          = "hashicorpdev/counter-dashboard:v3"
        auth_soft_fail = true
      }
    }
  }

  group "api" {
    network {
      mode = "bridge"
    }

    service {
      name = "count-api"
      port = "9001"

      connect {
        sidecar_service {
          proxy {
            transparent_proxy {}
          }
        }
      }
    }

    task "web" {
      driver = "docker"

      config {
        image          = "hashicorpdev/counter-api:v3"
        auth_soft_fail = true
      }
    }
  }
}
```

Then I realized the counter-dashboard application is running as root. So I did nomad alloc exec into that container, modified the /etc/passwd so I could switch to an unprivileged user, and it works fine there:

$ whoami
nobody
$ busybox wget -O- -S count-api.virtual.consul
Connecting to count-api.virtual.consul (240.0.0.1:80)
  HTTP/1.1 200 OK
  Date: Thu, 15 Aug 2024 17:47:15 GMT
  Content-Length: 37
  Content-Type: text/plain; charset=utf-8
  Connection: close

If I compare the iptables rules between the two, they're identical except for the specific ports. On a hunch I tried changing the nginx port to an "unprivileged" port and that doesn't make a difference either.

So again, as I said at the top, I'm stumped. But there are a lot of moving parts here, so go through the debugging I've done here and we'll see if you've got the same issue or whether you're running into something else.

ruslan-y commented 3 weeks ago

@tgross Thank you for being so deeply involved in our issue.

I've got a reproduction but only with this jobspec and not the example tproxy jobspec from the tutorial

Yes, that's exactly right. The jobspec from the tutorial works fine. It gets stuck with any reverse proxy jobspec I've tried, be it nginx or Envoy. And now I've finally figured out why and how to fix it. It was solved by adding an environment variable block to the main Envoy proxy task:

      env {
        ENVOY_UID=0
      }

As stated in Envoy's docs:

The default uid and gid for the envoy user are 101

Thus it changes the uid that the Envoy process runs as (we're right back where we started 😁). A sketch of the task in context is below.
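
For reference, a sketch of the fixed main task in context (assembled from the snippets in this thread; the image tag is the one used earlier and may differ in your setup):

```hcl
task "test-proxy-task" {
  driver = "docker"

  # Force the main Envoy to run as root (UID 0) instead of the image default
  # (UID 101), so it no longer matches the UID exception created for the
  # sidecar Envoy and its traffic is redirected through the transparent proxy.
  env {
    ENVOY_UID = 0
  }

  config {
    image = "envoyproxy/envoy:v1.26.7"
  }
}
```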

For nginx I think there will be a similar solution: you must have different uids in the sidecar proxy and in the main task. The only thing that remains unclear is why the parameter uid = 0 (or uid = 1001, for example) didn't work for transparent_proxy.

So I think you should add information about this to your Transparent Proxy tutorial to make it clearer. Tim, thank you so much for your help. 🙏