hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.

Service Mesh is not balancing the workload evenly across multiple instances in a Nomad cluster. #21778

Open ruslan-y opened 1 week ago

ruslan-y commented 1 week ago


I have 10 gateway instances (Ingress Gateway) and 10 "Proxy A" (Envoy) instances that receive traffic from the gateway. Then I have 2 "Proxy B" instances (also Envoy) that accept traffic from "Proxy A".

Load balancing from "Proxy A" to "Proxy B" is not working correctly.

From the network load on the hosts, I can see that traffic is going to only one "Proxy B" instance.

[Image: 1st instance of Proxy B]

[Image: 2nd instance of Proxy B]

If I redeploy the Nomad job "Proxy A", the workload is balanced correctly.

[Image: 1st instance of Proxy B]

[Image: 2nd instance of Proxy B]

But when I redeploy the Nomad job "Proxy B", the balancing breaks down again.
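The imbalance could also be quantified from Envoy's own per-endpoint counters instead of host network graphs. Below is a minimal sketch that parses the text output of Envoy's admin `/clusters` endpoint and sums `rq_total` per upstream host. The sample text, IP addresses, and ports are hypothetical, and the sketch assumes the output was taken from whichever Envoy actually holds both "Proxy B" endpoints (in a Consul service mesh that may be the sidecar rather than the application proxy).

```python
from collections import defaultdict


def requests_per_host(clusters_text: str, cluster: str) -> dict[str, int]:
    """Sum the rq_total counter per upstream host for one cluster.

    Per-host lines in the /clusters text output have the shape
    <cluster>::<host:port>::<stat>::<value>.
    """
    totals: dict[str, int] = defaultdict(int)
    for line in clusters_text.splitlines():
        line = line.strip()
        if "::" not in line:
            continue
        name, rest = line.split("::", 1)
        # Split from the right so an IPv6 host containing "::" stays intact.
        fields = rest.rsplit("::", 2)
        if name != cluster or len(fields) != 3:
            continue
        host, stat, value = fields
        if stat == "rq_total":
            totals[host] += int(value)
    return dict(totals)


def shares(totals: dict[str, int]) -> dict[str, float]:
    """Convert per-host request counts into fractions of the total."""
    total = sum(totals.values())
    return {host: (n / total if total else 0.0) for host, n in totals.items()}


# Hypothetical sample, not taken from the cluster described in this issue.
SAMPLE = """\
proxy_b::default_priority::max_connections::1000000000
proxy_b::10.0.0.11:8163::cx_active::10
proxy_b::10.0.0.11:8163::rq_total::9800
proxy_b::10.0.0.12:8163::cx_active::0
proxy_b::10.0.0.12:8163::rq_total::200
xds_grpc::10.0.0.5:8502::rq_total::1
"""

if __name__ == "__main__":
    totals = requests_per_host(SAMPLE, "proxy_b")
    for host, share in shares(totals).items():
        print(f"{host}: {totals[host]} requests ({share:.1%})")
```

Comparing the shares before and after redeploying each job would show whether requests, and not only bytes on the wire, are skewed toward one instance.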

I tried writing this configuration entry to Consul for "Proxy A":

Kind         = "service-resolver"
Name         = "Proxy A"
LoadBalancer = {
  Policy = "round_robin"
}

and for "Proxy B":

Kind         = "service-resolver"
Name         = "Proxy B"
LoadBalancer = {
  Policy = "round_robin"
}

Envoy configuration file of "Proxy A"

```
node:
  cluster: test
  id: proxy_a

admin:
  access_log:
    - name: admin_access
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
        path: {{ env "NOMAD_ALLOC_DIR" }}/logs/admin_access.log
  address:
    socket_address:
      address:
      port_value: 19901

dynamic_resources:
  ads_config:
    api_type: DELTA_GRPC
    transport_api_version: V3
    grpc_services:
      - envoy_grpc:
          cluster_name: xds_grpc

static_resources:
  listeners:
    - name: proxy_a
      address:
        socket_address:
          address:
          port_value: 8162
      filter_chains:
        - filters:
            - name: envoy.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: proxy_a
                http2_protocol_options:
                  allow_connect: true
                upgrade_configs:
                  - upgrade_type: websocket
                rds:
                  route_config_name: proxy_a
                  config_source:
                    resource_api_version: V3
                    ads: {}
                http_filters:
                  - name: envoy.filters.http.grpc_web
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.grpc_web.v3.GrpcWeb
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          transport_socket:
            name: envoy.transport_sockets.tls
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
              require_client_certificate: true
              common_tls_context:
                validation_context:
                  trusted_ca:
                    filename: {{ env "NOMAD_SECRETS_DIR" }}/gateway/tls/gateway-ca.pem
                tls_certificates:
                  - certificate_chain:
                      filename: {{ env "NOMAD_SECRETS_DIR" }}/gateway/tls/downstream.pem
                    private_key:
                      filename: {{ env "NOMAD_SECRETS_DIR" }}/gateway/tls/downstream-key.pem
                alpn_protocols: [ "h2,http/1.1" ]

  clusters:
    - name: proxy_b
      connect_timeout: 0.25s
      type: STATIC
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: proxy_b
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: {{ env "NOMAD_UPSTREAM_IP_proxy_b" }}
                      port_value: {{ env "NOMAD_UPSTREAM_PORT_proxy_b" }}
      circuit_breakers:
        thresholds:
          - priority: "DEFAULT"
            max_connections: 1000000000
            max_pending_requests: 1000000000
            max_requests: 1000000000
            max_retries: 1000000000
            retry_budget:
              budget_percent:
                value: 25.0
            track_remaining: true
          - priority: "HIGH"
            max_connections: 1000000000
            max_pending_requests: 1000000000
            max_requests: 1000000000
            max_retries: 1000000000
            retry_budget:
              budget_percent:
                value: 25.0
            track_remaining: true
      typed_extension_protocol_options:
        envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
          "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
          explicit_http_config:
            http2_protocol_options: {}

    - name: xds_grpc
      load_assignment:
        cluster_name: xds_grpc
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: {{ env "NOMAD_UPSTREAM_IP_[[ .my.xds_upstream ]]" }}
                      port_value: {{ env "NOMAD_UPSTREAM_PORT_[[ .my.xds_upstream ]]" }}
      circuit_breakers:
        thresholds:
          - priority: "DEFAULT"
            max_connections: 1000000000
            max_pending_requests: 1000000000
            max_requests: 1000000000
            max_retries: 1000000000
            retry_budget:
              budget_percent:
                value: 25.0
            track_remaining: true
          - priority: "HIGH"
            max_connections: 1000000000
            max_pending_requests: 1000000000
            max_requests: 1000000000
            max_retries: 1000000000
            retry_budget:
              budget_percent:
                value: 25.0
            track_remaining: true
      typed_extension_protocol_options:
        envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
          "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
          upstream_http_protocol_options:
            auto_sni: true
          common_http_protocol_options:
            idle_timeout: 1s
          explicit_http_config:
            http2_protocol_options:
              max_concurrent_streams: 100
```

Server nomad version

Nomad v1.8.3
BuildDate 2024-08-13T07:37:30Z
Revision 63b636e5cbaca312cf6ea63e040f445f05f00478

Server consul version

Consul v1.19.1
Revision 9f62fb41
Build Date 2024-07-11T14:47:27Z

Client nomad version

Nomad v1.5.6
BuildDate 2023-05-19T18:26:13Z
Revision 8af70885c02ab921dedbdf6bc406a1e886866f80

Client consul version

Consul v1.14.7
Revision d97acc0a
Build Date 2023-05-16T01:36:41Z