The getting started guide has the proxies and the consul agent run within the same network namespace via the docker flag `--network host`.
In your example your sidecar proxies and consul agent do not share a network namespace, so the default gRPC server address will need to be explicitly configured like you did with the HTTP api address. The gRPC server serves the xDS protocol for envoy.
In your example you would need to specify `-grpc-addr consul:8502` in addition to `-http-addr http://consul:8500`, like:

```
-sidecar-for core-demo-api2 -admin-bind 0.0.0.0:19000 -http-addr http://consul:8500 -grpc-addr consul:8502
```
I picked up on the misconfig because I saw that the `.configs[0].bootstrap.static_resources.clusters[0].hosts[0].socket_address` value in your envoy admin config dump was set to `{ "address": "127.0.0.1", "port_value": 8502 }` instead of the ip address of the consul agent.
Hi @rboyer , thanks for the hint, that seems to have helped.
Can you elaborate on that behavior? Why is the gRPC setting required when not specifying a docker network? I seem to have missed that bit in the docs, or it's not well described.
In a non-containerized setup a consul agent (running in lightweight client mode) and applications using that consul agent communicate over a localhost connection using the HTTP api. This avoids the problem of discovering your service discovery system by assuming you can simply communicate over a prearranged port (`:8500`).
As of v1.3, consul also listens on a gRPC port (defaults to `:8502`) when Connect is enabled. This speaks the xDS protocol specifically for use by the envoy instances. Envoy itself knows nothing of consul, so it cannot use the consul HTTP api.
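For reference, a minimal sketch of turning that listener on in an agent config file (in `-dev` mode it's enabled automatically; key names per Consul 1.3):

```json
{
  "connect": {
    "enabled": true
  },
  "ports": {
    "grpc": 8502
  }
}
```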
The `consul connect envoy` subcommand briefly uses the consul HTTP api on startup before exec-ing the envoy binary, which then strictly communicates with the running consul agent using gRPC from there on out, speaking only xDS. So direct access to both ports is necessary for the sidecar to be set up and run.
The getting started guide uses host networking for simplicity so that the consul agent and all of the envoy instances are free to communicate with each other directly over localhost.
If these are not co-located then the defaults cannot be used and will have to be explicitly configured.
Ok, this makes a ton of sense now.
I've obtained another config dump, the proxy for core-demo-api1 seems to be aware of core-demo-api2 now:
I'm still getting 404s when calling core-demo-api2 from core-demo-api1 through the core-demo-api1-sidecar-proxy though.
Should I use:
`http://core-demo-api1-sidecar-proxy:19000/core-demo-api2/api/values`
or `http://core-demo-api1-sidecar-proxy:19000/api/values` while setting the `host` header to `core-demo-api2`?
Or something completely different?

@mmisztal1980 your client apps should talk to whichever local port the upstream is listening on - unless you set up something I missed, hostnames like `core-demo-api1-sidecar-proxy` aren't available.
For example your service definition for `core-demo-api1` has the following upstream definition:
```json
{
  "destination_name": "code-demo-api2",
  "local_bind_port": 80
}
```
This is saying "Please configure the proxy to listen on `localhost:80` and proxy those connections to the `code-demo-api2` service".
So the proxy will try to do just that but will (probably) collide since your actual service is already listening on port 80.
The idea here is that you pick some port for each upstream that you want to expose the service on over loopback. The port number is arbitrary and the only thing that cares about it is your application.
For example if you changed that definition to:

```json
{
  "destination_name": "code-demo-api2",
  "local_bind_port": 8080
}
```

Then the proxy would listen on `localhost:8080` (technically `127.0.0.1:8080`) and your app would be able to connect just using `http://127.0.0.1:8080/api/values`.
Note that this is only layer 4 (TCP/TLS) proxying so there are no HTTP paths or routes in the mix. L7 support will come later.
Hope you get that working!
@banks Thanks for clearing that up, now I understand the concept.
I've reconfigured the core-demo-api1 registration:
I've tried to execute a call against my service's proxy; my client keeps throwing exceptions:
I've signed onto the proxy container to examine whether or not the port is being listened on:
```
# netstat -ano
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       Timer
tcp        0      0 127.0.0.11:32977        0.0.0.0:*               LISTEN      off (0.00/0/0)
tcp        0      0 0.0.0.0:19000           0.0.0.0:*               LISTEN      off (0.00/0/0)
tcp        0      0 172.19.0.5:49484        91.189.88.161:80        TIME_WAIT   timewait (0.00/0/0)
tcp        0      0 172.19.0.5:44220        172.19.0.3:8502         ESTABLISHED off (0.00/0/0)
udp        0      0 127.0.0.11:55487        0.0.0.0:*                           off (0.00/0/0)
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node Path
```
I don't see `19100` being listed :/

BTW, you've mentioned that `local_bind_port` will listen on the `localhost` interface. I'd prefer `0.0.0.0` in my scenario (multiple docker containers not running in a k8s pod (yet)) - how can I achieve that?
I've just re-read the paragraph on Sidecar Service Registration. The upstreams use `local_bind_port`, while in the Sidecar Service Defaults there's no mention of that parameter, but there is something else listed: `local_service_port` under `proxy`.
Is one transformed into the other? I think the docs are somewhat inconsistent here and don't really offer guidance on how to solve my above question :/
@mmisztal1980 can you clarify how you are running this again?
Port 19000 happens to be the port we chose for the Envoy admin API (which you can't just disable), so if you are using Envoy here then it's not a great choice for your own stuff, and might explain why you are seeing things listening on 19000 but not getting the response you expect.
So I suspect you are not seeing any proxy running at all based on your netstat output. How are you starting the proxy? Is this in Kube?
Can you include the output of the `consul connect envoy` commands?
> BTW, you've mentioned that local_bind_port will listen on the localhost interface. I'd prefer 0.0.0.0 in my scenario (multiple docker containers not running in a k8s pod (yet)) - how can I achieve that?
There is a `local_bind_address` too, but note that anything non-local is insecure in prod (fine for just testing etc.). Anything that can talk to the "private" listeners of the proxy can assume the identity and all access granted to that service, which is why it's typical to expose this only over loopback.
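A sketch of what that could look like in the upstream definition (the port is illustrative, and again: only do this for testing):

```json
{
  "destination_name": "core-demo-api2",
  "local_bind_address": "0.0.0.0",
  "local_bind_port": 19100
}
```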
If you are trying to get this to work with docker compose I recommend using shared network namespaces at least between the app and sidecar containers rather than just exposing them over the docker bridge. This is easy in docker: start up the app first, then start the proxy container using `--network "container:core-demo-api2-container-name"` or the equivalent in Compose. Then the app and the proxy can talk over localhost just like in a kube pod (that is all Kube is doing under the hood).
If the docker-compose file above is still roughly what you're using, I noticed another small plumbing issue.
Because envoy will be listening on 127.0.0.1 (loopback) for exclusive outbound traffic access, the sidecars need to share the same network namespace with your app so it can connect. The way you have them configured above, each of your 5 containers (consul, app1, sidecar1, app2, sidecar2) gets its own networking stack, complete with a local ip address and a personal isolated 127.0.0.1 address.
For example, instead of:
```yaml
services:
  core-demo-api1:
    image: core-demo-api1
    ...
  core-demo-api1-sidecar-proxy:
    image: consul-envoy
    command: "-sidecar-for core-demo-api1 -admin-bind 0.0.0.0:19000 -http-addr http://consul:8500"
    ...
```
You should have something like:
```yaml
services:
  core-demo-api1:
    image: core-demo-api1
    ...
  core-demo-api1-sidecar-proxy:
    image: consul-envoy
    command: "-sidecar-for core-demo-api1 -admin-bind 0.0.0.0:19000 -http-addr http://consul:8500"
    network_mode: "service:core-demo-api1"
    ...
```
And similar plumbing for `core-demo-api2-sidecar-proxy`.
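Mirroring the api1 pair above, that would be roughly:

```yaml
  core-demo-api2:
    image: core-demo-api2
    ...
  core-demo-api2-sidecar-proxy:
    image: consul-envoy
    command: "-sidecar-for core-demo-api2 -admin-bind 0.0.0.0:19000 -http-addr http://consul:8500"
    network_mode: "service:core-demo-api2"
    ...
```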
I'm also interested in knowing more about the lifespan of this POC. Is it mostly for kicking the tires of Connect in conjunction with your development stack or are you also planning on using something like docker compose to deploy a variation of the POC directly to a docker swarm cluster?
My intention is to do a PoC on k8s, however .NET solutions developed under Visual Studio 2017 have a docker-compose integration, so I wanted to get that up and running first. That way I get to debug/test the solution locally before I deploy it to an actual k8s cluster.
In the long run, at my current company, we are working on decomposing a monolith and migrating it to a new hosting platform while retaining 100% uptime. Consul Connect is v. interesting because it can provide a service mesh, and a consistent experience when services communicate with each other, regardless of the hosting platform. That's pretty much the genesis for the PoC.
@rboyer I hope this answers your question?
On a side note, this is also a brilliant opportunity to learn.
Adding `network_mode` has caused the healthcheck between consul and the proxies to fail :/
@rboyer @banks I've added the `network_mode: service:{service-name}` to both proxies. As far as I understand that ought to make them reuse their counterparts' network stacks - so if I understand correctly, both a service container and its counterpart proxy should be able to communicate both ways via localhost.
I've modified my REST call inside `core-demo-api1` to call `http://127.0.0.1:19100` - the port I've declared above for `core-demo-api1-sidecar-proxy` to route to `core-demo-api2`. I'm getting connection refused, even though, when attaching to the proxy container, I can see that `127.0.0.1:19100` is being listened on.
Hey @mmisztal1980, as far as I understand `network_mode` should behave as you expected, so I can't see an obvious reason here for this not working. My guess would be that Envoy is failing to get configured correctly so isn't setting up the upstream listener.
Can you try adding `-- -l DEBUG` to the end of the proxy command string? This is passed through to envoy and puts it into verbose logging mode. Then check the output of the Envoy process - it's possible it's failing to listen or not getting config delivered for some reason.
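In your compose file that would be roughly (a sketch - keep whichever `-http-addr`/`-grpc-addr` flags you're already passing; envoy's log-level flag itself is `-l`):

```yaml
command: "-sidecar-for core-demo-api1 -admin-bind 0.0.0.0:19000 -http-addr http://consul:8500 -grpc-addr consul:8502 -- -l debug"
```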
Hi @banks ,
I've added the debug option (only on core-demo-api1-sidecar-proxy) as you've suggested, here's the output.
What caught my eye: `core-demo-api1-sidecar-proxy_1 | [2018-11-05 17:17:07.038][8][info][config] source/server/configuration_impl.cc:61] loading 0 listener(s)`. I guess this implies a config issue?
Yep, does look like a config issue. Sorry, I forgot the detail, but can you try one more time with `-l trace` - that will show you the full discovery request/response from Envoy, which will help figure out where the misconfiguration is.
Hi @banks , I've applied the setting you've suggested, here's the output.
https://gist.github.com/mmisztal1980/a0e36c0f1d1e277470cf318ceea64d04
@mmisztal1980 thanks.
I think I see the issue and it's potentially a bug combined with a config issue.
I think from that log what is happening is this:

- `core-demo-api1-sidecar-proxy` (envoy1 from now on) is requesting Clusters and being delivered the upstream and the local_app as expected (so all gRPC and agent config is good - it's talking to the agent fine)
- envoy1 then requests Endpoints for `service:core-demo-api2` and the agent apparently blocks on that request and never returns.

So the question is why Consul is not delivering any results for `service:core-demo-api2`. I suspect it's because that service isn't yet registered and passing healthchecks, but here is the (possible) bug: if it doesn't find healthy instances, it should return an empty list so the rest of the proxy can continue to be set up. That wouldn't make this work for you since the proxy still wouldn't know where to connect, but it would at least prevent one unhealthy upstream from delaying proxy bootstrap where all the other upstreams might be fine.

This should solve itself as soon as your upstream service comes up and is healthy though - can you confirm that it does (i.e. look in the Consul UI or query via DNS/HTTP to see that the secondary instance is available)? Since the service has no health checks it should be healthy immediately, and the proxy for it should be too (via its alias check).
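e.g. something like (the `consul` hostname as in your compose setup):

```
curl consul:8500/v1/health/service/core-demo-api2
dig @consul -p 8600 core-demo-api2.service.consul
```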
So the mystery here is: why is that upstream service discovery not working correctly?
I'll try to reproduce with your docker compose file and config later to debug this more.
Thanks for your help digging on this!
@banks The 2nd core-demo-api2 service is up and running and healthy.
Out of interest, what do you get if you curl consul like (replace with actual host/ip etc.):

```
curl -v consul:8500/v1/health/connect/core-demo-api2
curl -v consul:8500/v1/catalog/connect/core-demo-api2
```
Sure here it is,
Hmm, not sure why it causes the Envoy config failure it does, but one issue I see there is that your agent nodes are registering with `127.0.0.1` as their address.
That means the service discovery results for the service in one container are going to return a local IP, which seems wrong - you need the IP of the other container to connect out to it.
Typically you'd have a local Consul agent on each node that is configured to bind to the public IP of that node and then this would work as expected - services by default are advertised at their agent's bind (or advertise) address.
In this setup where you don't have "local" agents on each "node" (i.e. each api container network namespace) you would need to register the services with an explicit `Address` set to their docker IP that other containers can connect to.
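A sketch of what that might look like in your services.json (the IP is illustrative - use whatever the api2 container actually gets on the bridge):

```json
{
  "service": {
    "name": "core-demo-api2",
    "address": "172.19.0.4",
    "port": 80,
    "connect": { "sidecar_service": {} }
  }
}
```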
Alternatively you can get closer to a production simulation by starting an actual consul agent container for each "node" that also shares the api container's namespace. If you do that, the arguments for http-addr etc. shouldn't be needed, as the proxy can connect to its "local" agent just like a non-container setup on multiple hosts would do.
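An untested sketch of that layout in compose (image tag and agent flags assumed):

```yaml
services:
  core-demo-api1:
    image: core-demo-api1
  core-demo-api1-agent:
    image: consul:1.3.0
    command: "agent -retry-join consul -grpc-port 8502"
    network_mode: "service:core-demo-api1"
  core-demo-api1-sidecar-proxy:
    image: consul-envoy
    command: "-sidecar-for core-demo-api1 -admin-bind 0.0.0.0:19000"
    network_mode: "service:core-demo-api1"
```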
When I get a chance I still want to reproduce locally so I can figure out why Envoy hangs part way through config in this case. But let me know if that IP config helps.
Hi @banks , it's been a while. I was wondering if you folks have made any progress investigating this, as I'll be giving it another spin soon (tm)
I'm pretty sure I know why the hanging bug happens but it may only be part of the story here.
The bug is due to this line: https://github.com/hashicorp/consul/blob/c2a30c5fdd6229ce98d6271b58f12582d7dc9dfe/agent/xds/server.go#L349-L352
Basically it assumes that if we didn't get a response yet (since this is all async internally) and part of the config is empty, then we shouldn't bother sending it to the proxy. The problem is that in a case where there are legitimately no instances available (not registered yet or failing health checks), we end up not sending the endpoints at all, which Envoy hangs waiting for.
I think that's an easy fix, but based on your Curl output above I'm not really sure if it's the only issue going on with your setup.
Hey there, We wanted to check in on this request since it has been inactive for at least 60 days. If you think this is still an important issue in the latest version of Consul or its documentation please reply with a comment here which will cause it to stay open for investigation. If there is still no activity on this issue for 30 more days, we will go ahead and close it.
Feel free to check out the community forum as well! Thank you!
Hey there, This issue has been automatically closed because there hasn't been any activity for at least 90 days. If you are still experiencing problems, or still have questions, feel free to open a new one :+1:
Hey there,
This issue has been automatically locked because it is closed and there hasn't been any activity for at least 30 days.
If you are still experiencing problems, or still have questions, feel free to open a new one :+1:.
Overview of the Issue
I've created a PoC environment for 2x .NET Core 2.1 Services communicating via Consul-Connect. The entire setup relies on a consul server instance, which uses a services.json file to perform the registrations in a 'static' way. If I understand the process correctly, the sidecar proxies should retrieve their configuration from the consul-server after starting up.
Once the consul server container is healthy, 2x sidecar-proxies start. At this point the entire setup is healthy:
When attempting to have core-demo-api1 call core-demo-api2, I'm getting a 404 response. I've exposed core-demo-api1-sidecar-proxy's port 19000 and obtained a config dump in which I do not see any routes defined to core-demo-api2, which I believe is the root cause for the communication issue between the 2 services. I believe I've followed the available documentation to the letter, so my situation may be a potential bug.
Reproduction Steps
consul-envoy/Dockerfile
services.json
docker-compose.yml
docker-compose.override.yml
Consul info for both Client and Server
Client info
**(!) Executed inside one of the sidecar-proxy containers (!)**

```
# consul info
Error querying agent: Get http://127.0.0.1:8500/v1/agent/self: dial tcp 127.0.0.1:8500: connect: connection refused
```

envoy.yaml
```yaml
admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address:
      protocol: TCP
      address: 127.0.0.1
      port_value: 9901
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        protocol: TCP
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  host_rewrite: www.google.com
                  cluster: service_google
          http_filters:
          - name: envoy.router
  clusters:
  - name: service_google
    connect_timeout: 0.25s
    type: LOGICAL_DNS
    # Comment out the following line to test on v6 networks
    dns_lookup_family: V4_ONLY
    lb_policy: ROUND_ROBIN
    hosts:
    - socket_address:
        address: google.com
        port_value: 443
    tls_context: { sni: www.google.com }
```

Server info
```
# consul info
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 4
	services = 4
build:
	prerelease =
	revision = e8757838
	version = 1.3.0
consul:
	bootstrap = true
	known_datacenters = 1
	leader = true
	leader_addr = 127.0.0.1:8300
	server = true
raft:
	applied_index = 1439
	commit_index = 1439
	fsm_pending = 0
	last_contact = 0
	last_log_index = 1439
	last_log_term = 2
	last_snapshot_index = 0
	last_snapshot_term = 0
	latest_configuration = [{Suffrage:Voter ID:7df0a6a8-3f7a-10f6-d759-e46d1b024aa2 Address:127.0.0.1:8300}]
	latest_configuration_index = 1
	num_peers = 0
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Leader
	term = 2
runtime:
	arch = amd64
	cpu_count = 2
	goroutines = 103
	max_procs = 2
	os = linux
	version = go1.11.1
serf_lan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 1
	event_time = 2
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1
serf_wan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1
```

Operating system and Environment details
Windows 10 Enterprise x64 running Docker for Windows (MobyLinuxVM)
Log Fragments
TBD