hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

Service Mesh is not balancing the workload evenly across multiple instances in a Nomad cluster. #21778

Open ruslan-y opened 1 week ago

ruslan-y commented 1 week ago

Hi!

I have 10 gateway instances (Ingress Gateway) and 10 "Proxy A" (Envoy) instances that receive traffic from the gateway. Then I have 2 "Proxy B" instances (also Envoy) that accept traffic from "Proxy A".

Load balancing from "Proxy A" to "Proxy B" is not working correctly.

From the network load on the hosts, I can see that traffic is going to only one "Proxy B" instance.

1st instance of Proxy B image

2nd instance of Proxy B image
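
Host-level network graphs are an indirect signal. A more direct check would be to compare per-endpoint request counters from the Envoy admin interface, whose `/clusters` endpoint prints lines of the form `cluster::host:port::stat::value`. Below is a minimal sketch of that check, not something taken from this setup: the function names and the sample addresses are hypothetical, the parser assumes IPv4 `host:port` endpoints, and which Envoy process actually holds the per-instance "Proxy B" endpoints (the application Envoy or a sidecar) depends on the deployment.

```python
from typing import Dict


def rq_total_by_host(clusters_text: str, cluster_name: str) -> Dict[str, int]:
    """Extract rq_total per endpoint for one cluster from Envoy /clusters text.

    Assumes IPv4 endpoints; an IPv6 address would contain '::' and break the split.
    """
    totals: Dict[str, int] = {}
    for line in clusters_text.splitlines():
        parts = line.strip().split("::")
        # Per-host stat lines have four fields: cluster, host:port, stat name, value.
        if len(parts) == 4 and parts[0] == cluster_name and parts[2] == "rq_total":
            totals[parts[1]] = int(parts[3])
    return totals


def traffic_share(totals: Dict[str, int]) -> Dict[str, float]:
    """Convert absolute request counts into each endpoint's fraction of the total."""
    total = sum(totals.values())
    if total == 0:
        return {host: 0.0 for host in totals}
    return {host: count / total for host, count in totals.items()}


# Hypothetical sample shaped like /clusters output; addresses and counts are made up.
SAMPLE = """\
proxy_b::default_priority::max_connections::1000000000
proxy_b::10.0.0.11:8080::cx_active::1
proxy_b::10.0.0.11:8080::rq_total::4200
proxy_b::10.0.0.12:8080::cx_active::0
proxy_b::10.0.0.12:8080::rq_total::0
"""

if __name__ == "__main__":
    for host, share in traffic_share(rq_total_by_host(SAMPLE, "proxy_b")).items():
        print(f"{host}: {share:.0%} of requests")
```

In practice the text would come from something like `curl http://<envoy-host>:19901/clusters` (19901 is the admin port in the configuration below). A share close to 100% on one endpoint would match what the network graphs show.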

If I redeploy the Nomad job "Proxy A", the workload is balanced correctly.

1st instance of Proxy B image

2nd instance of Proxy B image

But when I redeploy the Nomad job "Proxy B", the balancing breaks down again.

I tried writing the following configuration to Consul for "Proxy A":

Kind           = "service-resolver"
Name           = "Proxy A"
LoadBalancer = {
  Policy = "round_robin"
}

and for "Proxy B":

Kind           = "service-resolver"
Name           = "Proxy B"
LoadBalancer = {
  Policy = "round_robin"
}

Envoy configuration file of "Proxy A":

```
node:
  cluster: test
  id: proxy_a

admin:
  access_log:
  - name: admin_access
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
      path: {{ env "NOMAD_ALLOC_DIR" }}/logs/admin_access.log
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 19901

dynamic_resources:
  ads_config:
    api_type: DELTA_GRPC
    transport_api_version: V3
    grpc_services:
    - envoy_grpc:
        cluster_name: xds_grpc

static_resources:
  listeners:
  - name: proxy_a
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 8162
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: proxy_a
          http2_protocol_options:
            allow_connect: true
          upgrade_configs:
          - upgrade_type: websocket
          rds:
            route_config_name: proxy_a
            config_source:
              resource_api_version: V3
              ads: {}
          http_filters:
          - name: envoy.filters.http.grpc_web
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.grpc_web.v3.GrpcWeb
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
          require_client_certificate: true
          common_tls_context:
            validation_context:
              trusted_ca:
                filename: {{ env "NOMAD_SECRETS_DIR" }}/gateway/tls/gateway-ca.pem
            tls_certificates:
            - certificate_chain:
                filename: {{ env "NOMAD_SECRETS_DIR" }}/gateway/tls/downstream.pem
              private_key:
                filename: {{ env "NOMAD_SECRETS_DIR" }}/gateway/tls/downstream-key.pem
            alpn_protocols: [ "h2,http/1.1" ]

  clusters:
  - name: proxy_b
    connect_timeout: 0.25s
    type: STATIC
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: proxy_b
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: {{ env "NOMAD_UPSTREAM_IP_proxy_b" }}
                port_value: {{ env "NOMAD_UPSTREAM_PORT_proxy_b" }}
    circuit_breakers:
      thresholds:
      - priority: "DEFAULT"
        max_connections: 1000000000
        max_pending_requests: 1000000000
        max_requests: 1000000000
        max_retries: 1000000000
        retry_budget:
          budget_percent:
            value: 25.0
        track_remaining: true
      - priority: "HIGH"
        max_connections: 1000000000
        max_pending_requests: 1000000000
        max_requests: 1000000000
        max_retries: 1000000000
        retry_budget:
          budget_percent:
            value: 25.0
        track_remaining: true
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http2_protocol_options: {}

  - name: xds_grpc
    load_assignment:
      cluster_name: xds_grpc
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: {{ env "NOMAD_UPSTREAM_IP_[[ .my.xds_upstream ]]" }}
                port_value: {{ env "NOMAD_UPSTREAM_PORT_[[ .my.xds_upstream ]]" }}
    circuit_breakers:
      thresholds:
      - priority: "DEFAULT"
        max_connections: 1000000000
        max_pending_requests: 1000000000
        max_requests: 1000000000
        max_retries: 1000000000
        retry_budget:
          budget_percent:
            value: 25.0
        track_remaining: true
      - priority: "HIGH"
        max_connections: 1000000000
        max_pending_requests: 1000000000
        max_requests: 1000000000
        max_retries: 1000000000
        retry_budget:
          budget_percent:
            value: 25.0
        track_remaining: true
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        upstream_http_protocol_options:
          auto_sni: true
        common_http_protocol_options:
          idle_timeout: 1s
        explicit_http_config:
          http2_protocol_options:
            max_concurrent_streams: 100
```

Server nomad version

Nomad v1.8.3
BuildDate 2024-08-13T07:37:30Z
Revision 63b636e5cbaca312cf6ea63e040f445f05f00478

Server consul version

Consul v1.19.1
Revision 9f62fb41
Build Date 2024-07-11T14:47:27Z

Client nomad version

Nomad v1.5.6
BuildDate 2023-05-19T18:26:13Z
Revision 8af70885c02ab921dedbdf6bc406a1e886866f80

Client consul version

Consul v1.14.7
Revision d97acc0a
Build Date 2023-05-16T01:36:41Z