mit-dci / opencbdc-tx

A transaction processor for a hypothetical, general-purpose, central bank digital currency

Make cross-sentinel connection failures non-fatal #177

Closed wadagso-gertjaap closed 2 years ago

wadagso-gertjaap commented 2 years ago

Follow-up on #135, #167 and #168.

This sequence of updates causes the connection between sentinels, which is used to request attestations from other sentinels, to fail silently. The false return value from tcp_client::init is now treated as a warning and the sentinel keeps running, but the connection does not work because the handler thread never gets started. As a result, RPC calls from sentinel to sentinel fail at runtime.

The first sentinel in the startup sequence could end up with no working connections and get stuck in endless retry loops.

Sentinel-to-sentinel communication is a case where cluster_connect(endpoints, false) should be used even with a single endpoint, since it's fine for one or more sentinels to be temporarily unreachable (we'll just use another). But because m_server_endpoints.size() <= 1 is passed as the error_fatal parameter to cluster_connect, that behavior is currently impossible.

Sentinel-to-sentinel communication uses clusters of one because the sentinel has to control which other sentinels are called, so that it never gets more than one attestation from the same sentinel. So we can't just build a cluster of all other sentinels and use send_to_one().
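
To illustrate why one client per peer is useful, here is a minimal sketch, not the sentinel's actual code. It assumes each peer sentinel is modeled as a callable that returns an attestation or nothing when the peer is unreachable. The names peer_call and gather_attestations and the start parameter are hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model: one callable per peer sentinel. It returns an
// attestation, or std::nullopt when that peer cannot be reached.
using peer_call = std::function<std::optional<std::string>()>;

// Visit each peer at most once, starting at index `start`, and stop as
// soon as `threshold` attestations have been collected. Because every
// peer has its own client, no peer can contribute two attestations.
auto gather_attestations(const std::vector<peer_call>& peers,
                         std::size_t threshold,
                         std::size_t start) -> std::vector<std::string> {
    std::vector<std::string> attestations;
    for(std::size_t i = 0;
        i < peers.size() && attestations.size() < threshold;
        i++) {
        const auto idx = (start + i) % peers.size();
        if(auto att = peers[idx]()) {
            attestations.push_back(*att);
        }
        // An unreachable peer is skipped; the next one is tried instead.
    }
    return attestations;
}
```

With a single cluster and send_to_one(), the caller would not know which peer answered, so the "at most once per peer" guarantee in the loop above would be lost.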

My suggestion, through this PR, is to add an optional boolean overload to init() that allows you to set the error_fatal parameter of cluster_connect manually.
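
The suggestion might look roughly like the sketch below. It is a self-contained mock, not the repository's code: sketch_network and sketch_client are hypothetical stand-ins, no sockets are opened, and the optional argument is modeled with std::optional<bool> as one possible way to make the flag optional. Only the names init, cluster_connect, error_fatal and the m_server_endpoints.size() <= 1 default come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the network layer. Only the error_fatal
// semantics described in this PR are modeled.
struct sketch_network {
    std::vector<std::string> reachable; // endpoints that accept connections
    std::size_t connected{0};

    // Tries every endpoint. With error_fatal set, one failed connection
    // fails the whole call; otherwise failures are tolerated.
    auto cluster_connect(const std::vector<std::string>& endpoints,
                         bool error_fatal) -> bool {
        connected = 0;
        for(const auto& ep : endpoints) {
            bool up = false;
            for(const auto& r : reachable) {
                if(r == ep) {
                    up = true;
                }
            }
            if(up) {
                connected++;
            } else if(error_fatal) {
                return false;
            }
        }
        return true;
    }
};

// Hypothetical stand-in for the RPC client.
struct sketch_client {
    sketch_network& net;
    std::vector<std::string> server_endpoints;
    bool handler_started{false};

    // When error_fatal is omitted, keep the size-based default;
    // when given, the caller's choice wins.
    auto init(std::optional<bool> error_fatal = std::nullopt) -> bool {
        assert(!server_endpoints.empty());
        const bool fatal
            = error_fatal.value_or(server_endpoints.size() <= 1);
        if(!net.cluster_connect(server_endpoints, fatal)) {
            return false; // handler is never started on this path
        }
        handler_started = true;
        return true;
    }
};
```

In this sketch, a sentinel connecting to a single unreachable peer would call init(false): the call succeeds and the handler is started, so the connection can be used once the peer comes up. Calling init() with no argument keeps the existing behavior for all other users.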

pr4u4t commented 2 years ago

I've also noticed that +1