Closed: justinweng-instaclustr closed this 1 month ago
Comparing `justinweng-instaclustr:handle-offline-shotover-nodes` (b1e7742) with `main` (2b11e0c)
❌ 1 regression
✅ 38 untouched benchmarks
:warning: Please fix the performance issues or acknowledge them on CodSpeed.
| | Benchmark | main | justinweng-instaclustr:handle-offline-shotover-nodes | Change |
|---|---|---|---|---|
| ❌ | `encode_system.local_result_v5_no_compression` | 93.1 µs | 105.6 µs | -11.83% |
The regressed benchmark `encode_system.local_result_v5_no_compression` is a Cassandra benchmark, so the reported regression is noise.
After `ShotoverNodeState` was added to `ShotoverNode` in https://github.com/shotover/shotover-proxy/pull/1758, we need a task that detects down shotover nodes and sets `ShotoverNodeState` accordingly.

This PR adds a background task, `check_shotover_peers`, that loops over the peer shotover nodes and tries to open a TCP connection to each one. If the connection cannot be established within `connect_timeout_ms`, the peer node is marked as down.

- `connect_timeout_ms` is the same configuration used when creating a connection to a destination Kafka broker.
- The task waits for (`check_shotover_peers_delay_ms` + random(-`check_shotover_peers_delay_ms`/10, `check_shotover_peers_delay_ms`/10)) before moving to the next peer shotover node.
- `start_shotover_peers_check` is called when the instance of `KafkaSinkClusterBuilder` is being created and hence is called exactly once.
- `check_shotover_peers` is not invoked at all if there are no peer shotover nodes (i.e., there is only 1 shotover node in the cluster).
- `check_shotover_peers` is restarted if the creation of the random number generator fails.

The next PR will change metadata rewrites to exclude down shotover nodes.
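
For readers who want a concrete picture of the peer check described above, here is a minimal sketch of the probe and the jittered delay. It is not the code from this PR: `NodeState`, `probe_peer`, `jittered_delay`, and `check_peers_once` are hypothetical names, the variant names of the real `ShotoverNodeState` are assumed, and the sketch uses blocking standard-library calls for brevity, whereas the real task runs in the background and may be structured quite differently. The random sample is passed in by the caller so that the delay arithmetic can be checked deterministically.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Hypothetical stand-in for `ShotoverNodeState`; the variant names are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeState {
    Up,
    Down,
}

/// Probe one peer by opening a TCP connection within `connect_timeout`.
/// Any failure (refused, unreachable, timed out) is reported as `Down`.
pub fn probe_peer(addr: SocketAddr, connect_timeout: Duration) -> NodeState {
    match TcpStream::connect_timeout(&addr, connect_timeout) {
        // The stream is dropped right away; only reachability matters here.
        Ok(_stream) => NodeState::Up,
        Err(_) => NodeState::Down,
    }
}

/// Delay before the next probe: `base_ms` plus jitter in
/// [-base_ms/10, +base_ms/10]. `unit` is a caller-supplied random sample
/// in [0.0, 1.0] (an assumption of this sketch, to keep it deterministic).
pub fn jittered_delay(base_ms: u64, unit: f64) -> Duration {
    let spread = (base_ms / 10) as f64;
    // Map unit from [0, 1] onto [-spread, +spread].
    let jitter = (unit.clamp(0.0, 1.0) * 2.0 - 1.0) * spread;
    Duration::from_millis((base_ms as f64 + jitter).round() as u64)
}

/// One pass over the peers, sleeping a jittered delay after each probe.
/// A background task would call this (or its async equivalent) in a loop.
pub fn check_peers_once(
    peers: &[SocketAddr],
    connect_timeout: Duration,
    base_delay_ms: u64,
    mut next_unit: impl FnMut() -> f64,
) -> Vec<(SocketAddr, NodeState)> {
    let mut states = Vec::with_capacity(peers.len());
    for &addr in peers {
        states.push((addr, probe_peer(addr, connect_timeout)));
        std::thread::sleep(jittered_delay(base_delay_ms, next_unit()));
    }
    states
}
```

The jitter keeps multiple shotover nodes from probing each other in lockstep. With a base delay of 1000 ms, for example, the wait between probes falls between 900 ms and 1100 ms.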