The sequence of updates in #135, #167 and #168 now causes the sentinel-to-sentinel connection, which is used to request attestations from other sentinels, to fail silently. Although a `false` return value from `tcp_client::init` is now treated as a warning and the sentinel keeps running, the connection does not work because the handler thread is never started. As a result, RPC calls from sentinel to sentinel fail at runtime.
The first sentinel in the startup sequence could potentially have no working connections and get stuck in endless retry loops.
Sentinel-to-sentinel communication is actually a case where `cluster_connect(endpoints, false)` should be used even with a single endpoint, since it's fine for one or more sentinels to be temporarily unreachable (we'll just use another). But because `m_server_endpoints.size() <= 1` is passed as the second argument to `cluster_connect`, that behavior is impossible.
Sentinel-to-sentinel communication uses clusters of 1 because the sentinel has to control which other sentinels are called, to avoid getting more than one attestation from the same sentinel. So we can't just build a cluster of all other sentinels and use `send_to_one()`.
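To illustrate why one client per remote sentinel is needed, here is a minimal sketch of gathering attestations from distinct peers. Everything in it (`peer`, `attest`, `gather_attestations`, the string-based attestation) is hypothetical and only models the idea; it is not the sentinel's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a client connected to exactly one remote
// sentinel (a "cluster of 1").
struct peer {
    std::string id;
    bool reachable;

    // Returns an attestation tagged with the peer id, or nullopt if the
    // peer cannot be reached.
    std::optional<std::string> attest(const std::string& tx) const {
        if(!reachable) {
            return std::nullopt;
        }
        return id + ":" + tx;
    }
};

// Asks each peer at most once, starting at index `start`, until `needed`
// attestations are collected. Unreachable peers are skipped, so the result
// never contains two attestations from the same sentinel.
std::vector<std::string> gather_attestations(const std::vector<peer>& peers,
                                             const std::string& tx,
                                             std::size_t needed,
                                             std::size_t start) {
    std::vector<std::string> out;
    for(std::size_t i = 0; i < peers.size() && out.size() < needed; i++) {
        const auto& p = peers[(start + i) % peers.size()];
        if(auto att = p.attest(tx)) {
            out.push_back(*att);
        }
    }
    return out;
}
```

Because the caller picks the peer for every request, a temporarily unreachable sentinel is simply skipped. A `send_to_one()` over a single shared cluster would not give the caller that control.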
My suggestion, through this PR, is to add an optional boolean overload to `init()` that allows you to set the `error_fatal` parameter of `cluster_connect` manually.
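The idea can be sketched roughly as follows. This is a simplified model, not the real `tcp_client`: `mock_network`, `sketch_client`, `handler_running()` and the string endpoints are hypothetical, and the fail-fast/skip semantics of `cluster_connect` are an assumption based on the description above.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical network layer. Only the call shape
// cluster_connect(endpoints, error_fatal) comes from the description.
struct mock_network {
    std::set<std::string> reachable; // endpoints that accept connections

    // Assumed semantics: with error_fatal set, the first failed connection
    // aborts and returns false; otherwise failed endpoints are skipped.
    bool cluster_connect(const std::vector<std::string>& endpoints,
                         bool error_fatal) const {
        for(const auto& ep : endpoints) {
            if(reachable.count(ep) == 0 && error_fatal) {
                return false;
            }
        }
        return true;
    }
};

// Hypothetical, simplified client.
class sketch_client {
  public:
    sketch_client(mock_network net, std::vector<std::string> endpoints)
        : m_net(std::move(net)),
          m_server_endpoints(std::move(endpoints)) {}

    // Behavior as described: errors are fatal for clusters of at most 1.
    bool init() {
        return init(m_server_endpoints.size() <= 1);
    }

    // Suggested overload: the caller chooses error_fatal explicitly.
    bool init(bool error_fatal) {
        if(!m_net.cluster_connect(m_server_endpoints, error_fatal)) {
            return false; // the handler is never started on this path
        }
        m_handler_running = true; // stands in for starting the handler thread
        return true;
    }

    bool handler_running() const {
        return m_handler_running;
    }

  private:
    mock_network m_net;
    std::vector<std::string> m_server_endpoints;
    bool m_handler_running{false};
};
```

In this model, a sentinel-to-sentinel client would call `init(false)`, so the handler is started even when its single peer is down at startup, while other callers keep the current default through `init()`.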
Follow-up on #135, #167 and #168.