shotover / shotover-proxy

L7 data-layer proxy
https://docs.shotover.io
Apache License 2.0

KafkaSinkCluster: rack aware routing for fetch requests #1637

Closed: rukai closed this 1 month ago

rukai commented 1 month ago

Progress towards: https://github.com/shotover/shotover-proxy/issues/1526

With this PR, shotover should now route every request to the correct rack. To catch misconfigurations, or even shotover bugs, a new shotover_out_of_rack_requests_count metric counts requests that are routed outside shotover's rack. This matches the shotover_out_of_rack_requests_count metric used by CassandraSinkCluster.
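As an illustration of what the metric counts, here is a minimal sketch, assuming the check happens after a destination broker has been chosen and both racks are known as strings. The function name record_destination is hypothetical, and the AtomicU64 is only a stand-in for whatever counter type shotover actually registers under shotover_out_of_rack_requests_count.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Stand-in for the shotover_out_of_rack_requests_count metric.
/// Hypothetical: shotover registers its counter through its own metrics setup.
static OUT_OF_RACK_REQUESTS: AtomicU64 = AtomicU64::new(0);

/// Hypothetical hook, called once per request after its destination is chosen.
/// Increments the counter only when the destination broker is in a different rack.
fn record_destination(shotover_rack: &str, destination_rack: &str) {
    if destination_rack != shotover_rack {
        OUT_OF_RACK_REQUESTS.fetch_add(1, Ordering::Relaxed);
    }
}
```

If routing is correct the counter stays at zero, so any nonzero value points at a misconfigured rack or a routing bug.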

A fetch request can be routed to any replica of the partition; some replicas may be in shotover's rack and others outside it. Previously, shotover sent each fetch request to a random replica. With this PR, shotover always sends a fetch request to a replica in its own rack, and falls back to any replica only when no in-rack replica exists. To make this routing cheap at runtime, shotover's stored list of partition replica nodes is split into two fields: shotover_rack_replica_nodes and external_rack_replica_nodes.

Every other request type has exactly one possible destination broker. For these request types, shotover rewrites the metadata response so that the client sends each request to the shotover instance in the same rack as the destination broker, ensuring that no cross-rack routing occurs. For example, MetadataResponse::controller_id is set to the shotover node in the controller broker's rack.
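The controller_id rewrite can be sketched like this, assuming shotover knows each broker's rack and the rack of every shotover node it advertises to clients. ShotoverNode, rewrite_controller_id and the HashMap of broker racks are hypothetical illustrations; the linked source shows the actual implementation.

```rust
use std::collections::HashMap;

/// Hypothetical broker id type.
type BrokerId = i32;

/// Hypothetical description of a shotover node as advertised in metadata responses.
struct ShotoverNode {
    broker_id: BrokerId,
    rack: String,
}

/// Map the real controller id to the shotover node in the controller's rack,
/// so the client's controller requests never cross racks.
/// Returns None if the controller's rack is unknown or has no shotover node.
fn rewrite_controller_id(
    controller_id: BrokerId,
    broker_racks: &HashMap<BrokerId, String>,
    shotover_nodes: &[ShotoverNode],
) -> Option<BrokerId> {
    let controller_rack = broker_racks.get(&controller_id)?;
    shotover_nodes
        .iter()
        .find(|node| &node.rack == controller_rack)
        .map(|node| node.broker_id)
}
```

Because the client then addresses the shotover in the controller's rack, that shotover forwards the request to the controller without leaving the rack.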

https://github.com/shotover/shotover-proxy/blob/6fb74a23e43de76ba26c2f67d94fc35166526947/shotover/src/transforms/kafka/sink_cluster/mod.rs#L1515-L1525

TODO in follow up PR:

codspeed-hq[bot] commented 1 month ago

CodSpeed Performance Report

Merging #1637 will not alter performance

Comparing rukai:kafka_rack_aware_routing (309e5c8) with main (65b280b)

Summary

✅ 37 untouched benchmarks