tari-project / tari

The Tari protocol
https://tari.com
BSD 3-Clause "New" or "Revised" License
354 stars 219 forks source link

feat: verify active base node peer connections and disconnect if stale #6655

Closed hansieodendaal closed 3 weeks ago

hansieodendaal commented 3 weeks ago

Description

Added check connections to the p2p services (MonitorPeersService). All active connections are pinged on a set (slowish) interval (10 times slower than the auto ping metadata interval). The nodes that do not respond timeously on three consecutive iterations with a corresponding pong are disconnected. This will help keep the list of active connections (lazily) up to date.

Edit: Fixed an error in the liveness service where misbehaving ping peers were never disconnected. The liveness service and monitor peers service work hand in hand. Liveness selects 8 randomly connected peers to obtain metadata from and will disconnect any of those that misbehave after 1 minute (2x ping intervals). The monitor peers service assesses all connected peers at a much slower pace and disconnects non-responsive peers after 15 minutes (10 x 3 ping intervals).

Motivation and Context

See #6516

How Has This Been Tested?

Performed system-level testing. From the log below we can see that 5 of 41 active peer connections did not respond with a ping. Peer e19e1454a1e0519866297960ad was disconnected because it did not respond three times in a row,

2024-10-29 15:12:07.664466900 [minotari::base_node::monitor_peers] TRACE Found 5 of 41 outbound base node peer 
  connections that did not respond to pings
2024-10-29 15:12:07.664619800 [minotari::base_node::monitor_peers] TRACE Peer e2fa82050c2f7579febafb7e08 
  stats - (iteration, connected, responsive) [(3, true, true), (4, true, true), (5, true, false)]
2024-10-29 15:12:07.664683300 [minotari::base_node::monitor_peers] DEBUG Disconnecting e19e1454a1e0519866297960ad 
  as the peer is no longer responsive - (iteration, connected, responsive) [(2, true, true), (3, true, false), 
  (4, true, false), (5, true, false)]
2024-10-29 15:12:07.665853300 [minotari::base_node::monitor_peers] TRACE Peer 6ea597117476676d5ddcb18153 
  stats - (iteration, connected, responsive) [(1, true, true), (2, true, true), (3, true, true), (4, true, true), 
  (5, true, false)]
2024-10-29 15:12:07.665965500 [minotari::base_node::monitor_peers] TRACE Peer a671f812efe5ab14cbb3c1f9f4 
  stats - (iteration, connected, responsive) [(2, true, true), (3, true, true), (4, true, true), 
  (5, true, false)]
2024-10-29 15:12:07.665997800 [minotari::base_node::monitor_peers] TRACE Peer e336b264e02f611cf4fbf51f22 
  stats - (iteration, connected, responsive) [(2, true, true), (3, true, true), (4, true, true), 
  (5, true, false)]

What process can a PR reviewer use to test or verify this change?

Breaking Changes

github-actions[bot] commented 3 weeks ago

Test Results (CI)

    3 files    129 suites   38m 37s ⏱️ 1 344 tests 1 344 ✅ 0 💤 0 ❌ 4 030 runs  4 030 ✅ 0 💤 0 ❌

Results for commit d95e35c1.

:recycle: This comment has been updated with latest results.

github-actions[bot] commented 3 weeks ago

Test Results (Integration tests)

36 tests  +36   36 ✅ +36   14m 17s ⏱️ + 14m 17s 11 suites +11    0 💤 ± 0   2 files   + 2    0 ❌ ± 0 

Results for commit d95e35c1. ± Comparison against base commit 47b48770.

:recycle: This comment has been updated with latest results.