Open aniket-z opened 9 months ago
Hi, I think the thing you want to achieve can be done with Kuma by using https://kuma.io/docs/2.6.x/policies/meshloadbalancingstrategy/#disable-cross-zone-traffic-and-prioritize-traffic-the-dataplanes-on-the-same-node-and-availability-zone
First, you need to add a tag to the dataplanes in the same availability zone, e.g.:
type: Dataplane
mesh: default
name: {{ name }}
networking:
  address: {{ address }}
  inbound:
    - port: 8000
      servicePort: 80
      tags:
        kuma.io/service: backend
        kuma.io/protocol: HTTP
        kuma.io/availability-zone: zone1
The kuma.io/availability-zone: zone1 value here is just an example; give dataplanes in the same location the same value, and use a different value for other locations.
When your dataplanes are tagged, you need to create a policy:
type: MeshLoadBalancingStrategy
name: local-zone-affinity-backend
mesh: mesh-1
spec:
  targetRef:
    kind: Mesh
  to:
    - targetRef:
        kind: MeshService
        name: backend
      default:
        localityAwareness:
          localZone:
            affinityTags:
              - key: kuma.io/availability-zone
                weight: 1000
In this case, most of the requests will be routed to the dataplanes with the same value of the kuma.io/availability-zone tag.
@lukidzi This does not seem to take the number of dataplanes of the source and destination services into account while routing traffic. If the number of dataplanes is imbalanced across availability zones in either the source or the destination service, the requests per second reaching each dataplane of the destination service will be imbalanced too.
For example: source-service: 4 tasks in az-1a and 1 task in az-1b; destination-service: 1 task in az-1a and 4 tasks in az-1b. Assuming all dataplanes are healthy, wouldn't the single task of destination-service in az-1a receive much more traffic than each task of destination-service in az-1b?
We run our workload on spot instances in AWS, so we can't ensure that the tasks of all services are equally balanced across availability zones at all times. We would still want all tasks of a given service to receive uniform traffic (similar requests per second) so that they also have similar CPU % and similar response times.
That's true: with
source-service: 4 tasks in az-1a and 1 task in az-1b
destination-service: 1 task in az-1a and 4 tasks in az-1b
more traffic is routed to the local AZ, though you can tune this with the weight. What you want to achieve sounds like the default behavior, where traffic is routed equally to all instances.
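To make the imbalance concrete, here is a rough model of how weight-based locality affinity distributes traffic in that scenario. This is my own sketch, not Kuma's actual implementation; it assumes each source proxy gives same-zone endpoints the configured weight of 1000 and every other endpoint a baseline weight of 1.

```python
def per_task_load(src_counts, dst_counts, affinity_weight=1000, base_weight=1):
    """Requests per destination task per zone, assuming each source task emits
    1 unit of traffic and weights same-zone endpoints with affinity_weight and
    all other endpoints with base_weight (simplified model, not Kuma's code)."""
    load = {zone: 0.0 for zone in dst_counts}
    for src_zone, n_src in src_counts.items():
        # total endpoint weight per zone, as seen by sources in src_zone
        weights = {zone: (affinity_weight if zone == src_zone else base_weight) * n
                   for zone, n in dst_counts.items()}
        total = sum(weights.values())
        for zone in dst_counts:
            load[zone] += n_src * weights[zone] / total
    # divide by the task count to get the load on each individual task
    return {zone: load[zone] / dst_counts[zone] for zone in dst_counts}

# source: 4 tasks in az-1a, 1 in az-1b; destination: 1 in az-1a, 4 in az-1b
print(per_task_load({"az-1a": 4, "az-1b": 1}, {"az-1a": 1, "az-1b": 4}))
```

With these numbers the single destination task in az-1a ends up with roughly 16x the per-task load of each task in az-1b, which matches the imbalance described above.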
Edit:
I did more testing around zone-aware routing. I think it makes sense to add this as a second option of localZone load balancing: one option is the weight-based one we already have, and the other could be zone-aware, which is a bit more complicated.
How does zone-aware LB work in Envoy? Some conditions need to be fulfilled to make it work. Among them, we can configure the minimum number of upstream endpoints required before zone-aware routing kicks in:
common_lb_config:
  zone_aware_lb_config:
    min_cluster_size: x
My testing config:
admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address: { address: 0.0.0.0, port_value: 9903 }
node:
  locality:
    zone: zone_c
cluster_manager:
  local_cluster_name: local_cluster
static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address: { address: 0.0.0.0, port_value: 9002 }
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                codec_type: AUTO
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: local_service
                      domains: ["*"]
                      routes:
                        - match:
                            prefix: '/'
                          route:
                            cluster: server
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
    - name: local_cluster
      connect_timeout: 0.25s
      type: STATIC
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: local_cluster
        endpoints:
          - locality:
              zone: 'zone_a'
            lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: 127.0.0.1
                      port_value: 8000
          - locality:
              zone: 'zone_b'
            lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: 127.0.0.1
                      port_value: 8003
          - locality:
              zone: 'zone_c'
            lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: 127.0.0.1
                      port_value: 8004
    - name: server
      connect_timeout: 0.25s
      type: STRICT_DNS
      lb_policy: ROUND_ROBIN
      common_lb_config:
        zone_aware_lb_config:
          min_cluster_size: 2
      load_assignment:
        cluster_name: some_service
        endpoints:
          - locality:
              zone: 'zone_a'
            lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: envoy_server_a_1
                      port_value: 8001
          - locality:
              zone: 'zone_b'
            lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: envoy_server_b_1
                      port_value: 8002
              - endpoint:
                  address:
                    socket_address:
                      address: envoy_server_b_2
                      port_value: 8003
          - locality:
              zone: 'zone_c'
            lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: envoy_server_c_1
                      port_value: 8004
              - endpoint:
                  address:
                    socket_address:
                      address: envoy_server_c_2
                      port_value: 8005
              - endpoint:
                  address:
                    socket_address:
                      address: envoy_server_c_3
                      port_value: 8006
Depending on the number of instances of the local_cluster in the current zone, traffic might be routed only within the zone or cross-zone. E.g. when our service has 4 instances in zone-c and there are 2 instances of the destination service, we would go cross-zone, but if there are 3 instances in the destination we would stay in the same zone.
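A rough sketch of that decision logic (my reading of Envoy's documented behavior, not a copy of its implementation):

```python
def local_traffic_fraction(local_counts, upstream_counts, zone, min_cluster_size):
    """Approximate fraction of traffic from `zone` that stays in-zone under
    Envoy zone-aware routing. Returns None when zone-aware routing is disabled
    because the upstream cluster is smaller than min_cluster_size."""
    total_upstream = sum(upstream_counts.values())
    if total_upstream < min_cluster_size:
        return None  # falls back to regular (zone-unaware) load balancing
    local_pct = local_counts.get(zone, 0) / sum(local_counts.values())
    upstream_pct = upstream_counts.get(zone, 0) / total_upstream
    # enough upstream capacity in our zone: keep all traffic local;
    # otherwise keep only a proportional share and spill the rest cross-zone
    return 1.0 if upstream_pct >= local_pct else upstream_pct / local_pct

# testing config above: local_cluster has 1 proxy per zone, the "server"
# cluster has 1/2/3 endpoints in zone_a/zone_b/zone_c, min_cluster_size: 2
print(local_traffic_fraction({"zone_a": 1, "zone_b": 1, "zone_c": 1},
                             {"zone_a": 1, "zone_b": 2, "zone_c": 3},
                             "zone_c", 2))  # → 1.0, traffic stays in zone_c
```

Since zone_c holds 50% of the upstream endpoints but only a third of the local cluster, the local zone has spare capacity and traffic stays in-zone in this example.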
We could implement:
type: MeshLoadBalancingStrategy
name: local-zone-affinity-backend
mesh: mesh-1
spec:
  targetRef:
    kind: Mesh
  to:
    - targetRef:
        kind: MeshService
        name: backend
      default:
        localityAwareness:
          localZone:
            type: Weighted | ZoneAware
            zoneAware:
              zoneIdentifier: topology.kubernetes.io/zone # not sure if this is possible because node info is bootstrap config
              subZoneIdentifier: my-label.k8s.io/node
              minClusterSize: 3
            affinityTags:
              - key: kuma.io/availability-zone
                weight: 1000
Not sure if setting locality based on dynamic configuration is possible, because node info is bootstrap configuration and has to be set at init, so we might need to set these variables in the kuma-cp config and configure them at bootstrap request.
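For reference, this is the shape of the static bootstrap fields involved (the values are placeholders); Envoy's locality lives under `node` in the bootstrap config, which is why it would have to be known at proxy init rather than delivered via xDS:

```yaml
# Envoy bootstrap (static); locality is fixed for the lifetime of the proxy
node:
  locality:
    zone: zone-1      # e.g. derived from topology.kubernetes.io/zone
    sub_zone: node-a  # e.g. derived from a node label
```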
Triage: @aniket-z would you be interested in contributing this?
This issue was inactive for 90 days. It will be reviewed in the next triage meeting and might be closed. If you think this issue is still relevant, please comment on it or attend the next triage meeting.
Description
Our current setup has the following components:
Now, we want to use Envoy's zone-aware routing feature to reduce our inter-AZ network cost.
We are aware that Kuma supports locality-aware routing, but there it seems routing is configured using Envoy's priority-based load balancing rather than Envoy's zone-aware routing feature.
Consider a scenario where the source service calls the destination service, the source service has 4 tasks in zone A and 1 task in zone B, and the destination service has 1 task in zone A and 4 tasks in zone B. If we use Kuma's locality-aware routing, from what I have understood, traffic would not be the same across all tasks of the destination service (and thus CPU % would not be the same for all tasks). With Envoy's zone-aware routing, the throughput per destination task (and thus the CPU %) would be the same, because it takes the per-zone task counts of both the source and destination services into account. Please correct me if I have misunderstood Envoy's zone-aware routing or Kuma's locality-aware routing and the problem I have described is not valid.
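To sanity-check this intuition, here is a rough model (my own sketch of Envoy's documented zone-aware behavior, not its actual implementation) that keeps a proportional share of traffic local and spills the remainder to zones with spare upstream capacity:

```python
def per_task_load_zone_aware(src_counts, dst_counts):
    """Requests per destination task under a simplified zone-aware model:
    each source task emits 1 unit; a zone keeps min(1, upstream%/local%) of
    its traffic local and spills the rest to zones with residual capacity."""
    total_src = sum(src_counts.values())
    total_dst = sum(dst_counts.values())
    load = {zone: 0.0 for zone in dst_counts}
    for src_zone, n_src in src_counts.items():
        local_pct = n_src / total_src
        upstream_pct = dst_counts.get(src_zone, 0) / total_dst
        if upstream_pct >= local_pct:
            load[src_zone] += n_src  # enough local capacity: stay in-zone
            continue
        keep = upstream_pct / local_pct
        load[src_zone] += n_src * keep
        # spill the remainder to other zones, proportional to their residual
        # capacity (upstream share minus local share, where positive)
        residual = {z: max(0.0, dst_counts[z] / total_dst - src_counts.get(z, 0) / total_src)
                    for z in dst_counts if z != src_zone}
        total_residual = sum(residual.values())
        for z, r in residual.items():
            load[z] += n_src * (1 - keep) * r / total_residual
    return {zone: load[zone] / dst_counts[zone] for zone in dst_counts}

# source: 4 tasks in zone A, 1 in zone B; destination: 1 in A, 4 in B
print(per_task_load_zone_aware({"A": 4, "B": 1}, {"A": 1, "B": 4}))
```

With these numbers every destination task ends up with the same per-task load, which is the uniformity described above, in contrast to the weight-based locality affinity.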
Is there any way we could use Envoy's zone-aware routing feature with Kuma? We don't want to change kuma.io/zone, as that would require us to set up Kuma's zone ingress and egress, and we don't want to introduce an extra hop (and thus extra cost and latency) and an extra component into our system. Please suggest how we should proceed here.