hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

Consul connect services do not reconnect after booting up the cluster #21935

Open suikast42 opened 2 weeks ago

suikast42 commented 2 weeks ago

Operating system and Environment details

Nomad 1.6.0, CNI 1.6.0, Consul 1.20.0

Job file

job "countdash_app_mesh" {
  datacenters = ["nomadder1"]
  group "api" {
    count = 1
#    constraint {
#      distinct_hosts = true
#    }
#         constraint {
#           attribute    = "${attr.unique.hostname}"
#           set_contains = "worker-02"
#         }
    network {
      mode = "bridge"
      port "api" {
        to = 9001
#        host_network = "public"
      }
    }

    service {
      name = "count-api"
      port = "api"
      address_mode = "alloc"
      connect {
        sidecar_service {}
      }

      check {
        name     = "api_health"
        type     = "http"
        path     = "/health"
        port     = "api"
        interval = "10s"
        timeout  = "2s"
        address_mode = "alloc"
      }

    }

    task "count-api" {
      driver = "docker"

      config {
        image = "hashicorpnomad/counter-api:v3"
        ports = ["api"]
      }

      resources {
        cpu    = 100
        memory = 128
      }
    }
  }

  group "dashboard" {
    count = 1
        # constraint {
        #   attribute    = "${attr.unique.hostname}"
        #   set_contains = "worker-01"
        # }
    network {
      mode = "bridge"

      port "http" {
        to = 9002
      }
    }

    service {
      name = "count-dashboard"
      port = "9002"
      tags = [
        "traefik.enable=true",
        "traefik.consulcatalog.connect=true",
        "traefik.http.routers.count-dashboard.tls=true",
        "traefik.http.routers.count-dashboard.rule=Host(`count.cloud.private`)"
      ]

      connect {
        sidecar_service {
          proxy {
            #            config {
            #              protocol = "http"
            #            }
            upstreams {
              destination_name = "count-api"
              local_bind_port  = 8080
            }
          }
        }
      }
    }

    task "dashboard" {
      driver = "docker"

      env {
        CONSUL_TLS_SERVER_NAME = "localhost"
        COUNTING_SERVICE_URL   = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
      }

      config {
        image = "hashicorpnomad/counter-dashboard:v3"
      }

      resources {
        cpu    = 100
        memory = 128
      }
    }
  }
}

If I deploy this job, everything is OK until I reboot my VMs.

After a restart of the VMs (1 worker and 1 master), the Connect services do not come up again.

Log connect-dashboard


[2024-10-21 11:06:15.193][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:173] dns resolution without records for tempo-zipkin.service.consul
[2024-10-21 11:06:15.193][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:308] dns resolution for tempo-zipkin.service.consul completed with status 0
[2024-10-21 11:06:15.193][1][debug][upstream] [source/extensions/clusters/strict_dns/strict_dns_cluster.cc:201] DNS refresh rate reset for tempo-zipkin.service.consul, refresh rate 5000 ms
[2024-10-21 11:06:20.188][1][debug][main] [source/server/server.cc:237] flushing stats
[2024-10-21 11:06:20.193][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:391] dns resolution for tempo-zipkin.service.consul started
[2024-10-21 11:06:20.197][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:173] dns resolution without records for tempo-zipkin.service.consul
[2024-10-21 11:06:20.197][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:308] dns resolution for tempo-zipkin.service.consul completed with status 0
[2024-10-21 11:06:20.197][1][debug][upstream] [source/extensions/clusters/strict_dns/strict_dns_cluster.cc:201] DNS refresh rate reset for tempo-zipkin.service.consul, refresh rate 5000 ms
[2024-10-21 11:06:20.511][15][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:264] [Tags: "ConnectionId":"25"] new tcp proxy session
[2024-10-21 11:06:20.511][15][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:459] [Tags: "ConnectionId":"25"] Creating connection to cluster local_app
[2024-10-21 11:06:20.511][15][debug][misc] [source/common/upstream/cluster_manager_impl.cc:2329] Allocating TCP conn pool
[2024-10-21 11:06:20.511][15][debug][pool] [source/common/conn_pool/conn_pool_base.cc:291] trying to create new connection
[2024-10-21 11:06:20.511][15][debug][pool] [source/common/conn_pool/conn_pool_base.cc:145] creating a new connection (connecting=0)
[2024-10-21 11:06:20.511][15][debug][connection] [source/common/network/connection_impl.cc:1017] [Tags: "ConnectionId":"26"] connecting to 127.0.0.1:9002
[2024-10-21 11:06:20.511][15][debug][connection] [source/common/network/connection_impl.cc:1036] [Tags: "ConnectionId":"26"] connection in progress
[2024-10-21 11:06:20.511][15][debug][conn_handler] [source/common/listener_manager/active_tcp_listener.cc:160] [Tags: "ConnectionId":"25"] new connection from 172.21.2.20:34960
[2024-10-21 11:06:20.511][15][debug][connection] [source/common/network/connection_impl.cc:276] [Tags: "ConnectionId":"25"] closing socket: 0
[2024-10-21 11:06:20.511][15][debug][pool] [source/common/conn_pool/conn_pool_base.cc:669] cancelling pending stream
[2024-10-21 11:06:20.511][15][debug][connection] [source/common/network/connection_impl.cc:150] [Tags: "ConnectionId":"26"] closing data_to_write=0 type=1
[2024-10-21 11:06:20.511][15][debug][connection] [source/common/network/connection_impl.cc:276] [Tags: "ConnectionId":"26"] closing socket: 1
[2024-10-21 11:06:20.511][15][debug][pool] [source/common/conn_pool/conn_pool_base.cc:495] [Tags: "ConnectionId":"26"] client disconnected, failure reason: 
[2024-10-21 11:06:20.511][15][debug][pool] [source/common/conn_pool/conn_pool_base.cc:463] invoking 1 idle callback(s) - is_draining_for_deletion_=false
[2024-10-21 11:06:20.511][15][debug][pool] [source/common/conn_pool/conn_pool_base.cc:463] invoking 0 idle callback(s) - is_draining_for_deletion_=false
[2024-10-21 11:06:20.511][15][debug][conn_handler] [source/common/listener_manager/active_stream_listener_base.cc:136] [Tags: "ConnectionId":"25"] adding to cleanup list

Logs of the same instance after restarting Consul

[2024-10-21 11:18:16.223][1][debug][connection] [source/common/tls/ssl_socket.cc:246] [Tags: "ConnectionId":"64"] remote address:alloc/tmp/consul_grpc.sock,TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.224][1][debug][client] [source/common/http/codec_client.cc:107] [Tags: "ConnectionId":"64"] disconnect. resetting 1 pending requests
[2024-10-21 11:18:16.224][1][debug][client] [source/common/http/codec_client.cc:159] [Tags: "ConnectionId":"64"] request reset
[2024-10-21 11:18:16.224][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:215] [Tags: "ConnectionId":"64"] destroying stream: 0 remaining
[2024-10-21 11:18:16.224][1][debug][router] [source/common/router/router.cc:1384] [Tags: "ConnectionId":"0","StreamId":"1431416887679520993"] upstream reset: reset reason: connection termination, transport failure reason: 
[2024-10-21 11:18:16.224][1][warning][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:188] DeltaAggregatedResources gRPC config stream to local_agent closed: 13, 
[2024-10-21 11:18:16.224][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment failed
[2024-10-21 11:18:16.224][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.listener.v3.Listener failed
[2024-10-21 11:18:16.224][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.cluster.v3.Cluster failed
[2024-10-21 11:18:16.224][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:495] [Tags: "ConnectionId":"64"] client disconnected, failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.224][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:463] invoking 1 idle callback(s) - is_draining_for_deletion_=false
[2024-10-21 11:18:16.425][1][debug][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:63] Establishing new gRPC bidi stream to local_agent for rpc DeltaAggregatedResources(stream .envoy.service.discovery.v3.DeltaDiscoveryRequest) returns (stream .envoy.service.discovery.v3.DeltaDiscoveryResponse);

[2024-10-21 11:18:16.425][1][debug][router] [source/common/router/router.cc:527] [Tags: "ConnectionId":"0","StreamId":"1952948337656148254"] cluster 'local_agent' match for URL '/envoy.service.discovery.v3.AggregatedDiscoveryService/DeltaAggregatedResources'
[2024-10-21 11:18:16.425][1][debug][router] [source/common/router/router.cc:756] [Tags: "ConnectionId":"0","StreamId":"1952948337656148254"] router decoding headers:
':method', 'POST'
':path', '/envoy.service.discovery.v3.AggregatedDiscoveryService/DeltaAggregatedResources'
':authority', 'local_agent'
':scheme', 'http'
'te', 'trailers'
'content-type', 'application/grpc'
'x-envoy-internal', 'true'
'x-forwarded-for', '172.26.64.100'

[2024-10-21 11:18:16.425][1][debug][pool] [source/common/http/conn_pool_base.cc:78] queueing stream due to no available connections (ready=0 busy=0 connecting=0)
[2024-10-21 11:18:16.425][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:291] trying to create new connection
[2024-10-21 11:18:16.425][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:145] creating a new connection (connecting=0)
[2024-10-21 11:18:16.425][1][debug][http2] [source/common/http/http2/codec_impl.cc:1695] [Tags: "ConnectionId":"109"] updating connection-level initial window size to 268435456
[2024-10-21 11:18:16.426][1][debug][connection] [./source/common/network/connection_impl.h:98] [Tags: "ConnectionId":"109"] current connecting state: true
[2024-10-21 11:18:16.426][1][debug][client] [source/common/http/codec_client.cc:57] [Tags: "ConnectionId":"109"] connecting
[2024-10-21 11:18:16.426][1][debug][connection] [source/common/network/connection_impl.cc:1017] [Tags: "ConnectionId":"109"] connecting to alloc/tmp/consul_grpc.sock
[2024-10-21 11:18:16.426][1][debug][connection] [source/common/network/connection_impl.cc:746] [Tags: "ConnectionId":"109"] connected
[2024-10-21 11:18:16.426][1][debug][misc] [source/common/network/io_socket_error_impl.cc:64] Unknown error code 32 details Broken pipe
[2024-10-21 11:18:16.426][1][debug][connection] [source/common/tls/ssl_socket.cc:246] [Tags: "ConnectionId":"109"] remote address:alloc/tmp/consul_grpc.sock,TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.426][1][debug][connection] [source/common/network/connection_impl.cc:276] [Tags: "ConnectionId":"109"] closing socket: 0
[2024-10-21 11:18:16.426][1][debug][client] [source/common/http/codec_client.cc:107] [Tags: "ConnectionId":"109"] disconnect. resetting 0 pending requests
[2024-10-21 11:18:16.426][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:495] [Tags: "ConnectionId":"109"] client disconnected, failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.426][1][debug][router] [source/common/router/router.cc:1384] [Tags: "ConnectionId":"0","StreamId":"1952948337656148254"] upstream reset: reset reason: remote connection failure, transport failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.426][1][debug][http] [source/common/http/async_client_impl.cc:182] async http request response headers (end_stream=true):
':status', '200'
'content-type', 'application/grpc'
'grpc-status', '14'
'grpc-message', 'upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end'

[2024-10-21 11:18:16.426][1][debug][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:195] DeltaAggregatedResources gRPC config stream to local_agent closed: 14, upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.426][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment failed
[2024-10-21 11:18:16.426][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.listener.v3.Listener failed
[2024-10-21 11:18:16.426][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.cluster.v3.Cluster failed
[2024-10-21 11:18:16.426][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:463] invoking 1 idle callback(s) - is_draining_for_deletion_=false
[2024-10-21 11:18:16.782][1][debug][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:63] Establishing new gRPC bidi stream to local_agent for rpc DeltaAggregatedResources(stream .envoy.service.discovery.v3.DeltaDiscoveryRequest) returns (stream .envoy.service.discovery.v3.DeltaDiscoveryResponse);

[2024-10-21 11:18:16.782][1][debug][router] [source/common/router/router.cc:527] [Tags: "ConnectionId":"0","StreamId":"13244006445772974244"] cluster 'local_agent' match for URL '/envoy.service.discovery.v3.AggregatedDiscoveryService/DeltaAggregatedResources'
[2024-10-21 11:18:16.782][1][debug][router] [source/common/router/router.cc:756] [Tags: "ConnectionId":"0","StreamId":"13244006445772974244"] router decoding headers:
':method', 'POST'
':path', '/envoy.service.discovery.v3.AggregatedDiscoveryService/DeltaAggregatedResources'
':authority', 'local_agent'
':scheme', 'http'
'te', 'trailers'
'content-type', 'application/grpc'
'x-envoy-internal', 'true'
'x-forwarded-for', '172.26.64.100'

[2024-10-21 11:18:16.782][1][debug][pool] [source/common/http/conn_pool_base.cc:78] queueing stream due to no available connections (ready=0 busy=0 connecting=0)
[2024-10-21 11:18:16.782][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:291] trying to create new connection
[2024-10-21 11:18:16.782][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:145] creating a new connection (connecting=0)
[2024-10-21 11:18:16.782][1][debug][http2] [source/common/http/http2/codec_impl.cc:1695] [Tags: "ConnectionId":"110"] updating connection-level initial window size to 268435456
[2024-10-21 11:18:16.782][1][debug][connection] [./source/common/network/connection_impl.h:98] [Tags: "ConnectionId":"110"] current connecting state: true
[2024-10-21 11:18:16.782][1][debug][client] [source/common/http/codec_client.cc:57] [Tags: "ConnectionId":"110"] connecting
[2024-10-21 11:18:16.782][1][debug][connection] [source/common/network/connection_impl.cc:1017] [Tags: "ConnectionId":"110"] connecting to alloc/tmp/consul_grpc.sock
[2024-10-21 11:18:16.783][1][debug][connection] [source/common/network/connection_impl.cc:746] [Tags: "ConnectionId":"110"] connected
[2024-10-21 11:18:16.783][1][debug][misc] [source/common/network/io_socket_error_impl.cc:64] Unknown error code 32 details Broken pipe
[2024-10-21 11:18:16.783][1][debug][connection] [source/common/tls/ssl_socket.cc:246] [Tags: "ConnectionId":"110"] remote address:alloc/tmp/consul_grpc.sock,TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.783][1][debug][connection] [source/common/network/connection_impl.cc:276] [Tags: "ConnectionId":"110"] closing socket: 0
[2024-10-21 11:18:16.783][1][debug][client] [source/common/http/codec_client.cc:107] [Tags: "ConnectionId":"110"] disconnect. resetting 0 pending requests
[2024-10-21 11:18:16.783][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:495] [Tags: "ConnectionId":"110"] client disconnected, failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.783][1][debug][router] [source/common/router/router.cc:1384] [Tags: "ConnectionId":"0","StreamId":"13244006445772974244"] upstream reset: reset reason: remote connection failure, transport failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.783][1][debug][http] [source/common/http/async_client_impl.cc:182] async http request response headers (end_stream=true):
':status', '200'
'content-type', 'application/grpc'
'grpc-status', '14'
'grpc-message', 'upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end'

[2024-10-21 11:18:16.783][1][debug][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:232] DeltaAggregatedResources gRPC config stream to local_agent closed: 14, upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.783][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment failed
[2024-10-21 11:18:16.783][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.listener.v3.Listener failed
[2024-10-21 11:18:16.783][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.cluster.v3.Cluster failed
[2024-10-21 11:18:16.783][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:463] invoking 1 idle callback(s) - is_draining_for_deletion_=false
[2024-10-21 11:18:18.409][1][debug][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:63] Establishing new gRPC bidi stream to local_agent for rpc DeltaAggregatedResources(stream .envoy.service.discovery.v3.DeltaDiscoveryRequest) returns (stream .envoy.service.discovery.v3.DeltaDiscoveryResponse);

[2024-10-21 11:18:18.409][1][debug][router] [source/common/router/router.cc:527] [Tags: "ConnectionId":"0","StreamId":"11959217123621313443"] cluster 'local_agent' match for URL '/envoy.service.discovery.v3.AggregatedDiscoveryService/DeltaAggregatedResources'
[2024-10-21 11:18:18.409][1][debug][router] [source/common/router/router.cc:756] [Tags: "ConnectionId":"0","StreamId":"11959217123621313443"] router decoding headers:
':method', 'POST'
':path', '/envoy.service.discovery.v3.AggregatedDiscoveryService/DeltaAggregatedResources'
':authority', 'local_agent'
':scheme', 'http'
'te', 'trailers'
'content-type', 'application/grpc'
'x-envoy-internal', 'true'
'x-forwarded-for', '172.26.64.100'

[2024-10-21 11:18:18.409][1][debug][pool] [source/common/http/conn_pool_base.cc:78] queueing stream due to no available connections (ready=0 busy=0 connecting=0)
[2024-10-21 11:18:18.410][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:291] trying to create new connection
[2024-10-21 11:18:18.410][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:145] creating a new connection (connecting=0)
[2024-10-21 11:18:18.410][1][debug][http2] [source/common/http/http2/codec_impl.cc:1695] [Tags: "ConnectionId":"111"] updating connection-level initial window size to 268435456
[2024-10-21 11:18:18.410][1][debug][connection] [./source/common/network/connection_impl.h:98] [Tags: "ConnectionId":"111"] current connecting state: true
[2024-10-21 11:18:18.410][1][debug][client] [source/common/http/codec_client.cc:57] [Tags: "ConnectionId":"111"] connecting
[2024-10-21 11:18:18.410][1][debug][connection] [source/common/network/connection_impl.cc:1017] [Tags: "ConnectionId":"111"] connecting to alloc/tmp/consul_grpc.sock
[2024-10-21 11:18:18.410][1][debug][connection] [source/common/network/connection_impl.cc:746] [Tags: "ConnectionId":"111"] connected
[2024-10-21 11:18:18.415][1][debug][client] [source/common/http/codec_client.cc:88] [Tags: "ConnectionId":"111"] connected
[2024-10-21 11:18:18.415][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:328] [Tags: "ConnectionId":"111"] attaching to next stream
[2024-10-21 11:18:18.415][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:182] [Tags: "ConnectionId":"111"] creating stream
[2024-10-21 11:18:18.415][1][debug][router] [source/common/router/upstream_request.cc:593] [Tags: "ConnectionId":"0","StreamId":"11959217123621313443"] pool ready
tgross commented 1 day ago

From these logs, it looks like the tasks are coming up and the Envoy proxy is getting its bootstrap configuration. I'm going to move this issue to the Consul repo, as we're now firmly in Consul territory.
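
For anyone triaging similar sidecar logs, here is a rough sketch of how the xDS stream closures could be tallied by gRPC status code. It is not part of Nomad, Consul, or Envoy: the names `summarize_stream_closes`, `CLOSED_RE`, and `STATUS_NAMES` are hypothetical, and the regular expression assumes only the "gRPC config stream to ... closed: N" wording seen in the log lines above. The status names (13 = INTERNAL, 14 = UNAVAILABLE) come from the standard gRPC status codes.

```python
import re
from collections import Counter

# Assumption: the pattern is derived from the Envoy log lines quoted in this
# issue, e.g. "... gRPC config stream to local_agent closed: 14, upstream ...".
CLOSED_RE = re.compile(r"gRPC config stream to (\S+) closed: (\d+)")

# Standard gRPC status codes for the two values that appear in the logs.
STATUS_NAMES = {13: "INTERNAL", 14: "UNAVAILABLE"}


def summarize_stream_closes(log_text: str) -> Counter:
    """Count xDS config stream closures per (cluster, gRPC status code)."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = CLOSED_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    # Two lines copied from the sidecar log above.
    sample = (
        "[2024-10-21 11:18:16.224][1][warning][config] "
        "[./source/extensions/config_subscription/grpc/grpc_stream.h:188] "
        "DeltaAggregatedResources gRPC config stream to local_agent closed: 13, \n"
        "[2024-10-21 11:18:16.426][1][debug][config] "
        "[./source/extensions/config_subscription/grpc/grpc_stream.h:195] "
        "DeltaAggregatedResources gRPC config stream to local_agent closed: 14, "
        "upstream connect error or disconnect/reset before headers.\n"
    )
    for (cluster, code), n in sorted(summarize_stream_closes(sample).items()):
        name = STATUS_NAMES.get(code, "unknown")
        print(f"{cluster}: status {code} ({name}) x{n}")
```

Running this over a full sidecar log shows how many times the stream to `local_agent` was dropped before the connection attempt that finally reaches "pool ready", which may help separate a short reconnect window from a sidecar that never recovers.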