This PR fixes #863 and introduces the following changes:
Add the FailoverTests2 test class, which reproduces the tests from FailoverTests but runs against mock clusters instead of actual clusters. This lets us tweak the behavior of the clusters and reproduce a cluster that does not send the members view event within a reasonable time.
Modify HeartBeat to ensure that cancellation tokens are properly propagated down to the ping messages, so that pings abort correctly if the cluster goes down.
Modify RetryStrategy to expose a cancellation token corresponding to the cluster connection timeout. The token can be used even before any retry takes place to detect that the timeout has been reached (for instance, while waiting for the members view event).
Modify ClusterConnections to ensure that, should the connection open but the members view event never be received, a timeout (based on the retry strategy timeout) is correctly detected, the connection is correctly torn down, and the next cluster (if any) is tried.
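The timeout-then-next-cluster behavior described above can be sketched roughly as follows. This is a minimal illustration, not the actual ClusterConnections code: FakeCluster, WaitForMembersViewAsync, TearDown and FailoverSketch are hypothetical names, and the sketch assumes a fresh timeout per cluster attempt, whereas the real change derives its token from the retry strategy.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a cluster connection; not the client's real type.
public sealed class FakeCluster
{
    public string Name { get; }
    public bool SendsMembersView { get; }
    public bool TornDown { get; private set; }

    public FakeCluster(string name, bool sendsMembersView)
    {
        Name = name;
        SendsMembersView = sendsMembersView;
    }

    // Completes shortly when the cluster sends its members view; otherwise
    // waits until the supplied token is canceled.
    public async Task WaitForMembersViewAsync(CancellationToken token)
    {
        if (SendsMembersView)
        {
            await Task.Delay(10, token);
            return;
        }
        await Task.Delay(Timeout.Infinite, token);
    }

    public void TearDown() => TornDown = true;
}

public static class FailoverSketch
{
    // Tries each cluster in order. Returns the first cluster that delivers a
    // members view before the timeout, or null if none does.
    public static async Task<FakeCluster> ConnectAsync(
        IEnumerable<FakeCluster> clusters, TimeSpan timeout)
    {
        foreach (var cluster in clusters)
        {
            // Assumption: one timeout per attempt, for illustration only.
            using var timeoutSource = new CancellationTokenSource(timeout);
            try
            {
                await cluster.WaitForMembersViewAsync(timeoutSource.Token);
                return cluster;
            }
            catch (OperationCanceledException)
            {
                // The members view never arrived: tear down, try the next one.
                cluster.TearDown();
            }
        }
        return null;
    }
}
```

With a cluster list of one silent cluster followed by one healthy cluster, ConnectAsync is expected to tear down the silent cluster once the timeout elapses and return the healthy one.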
The test FailoverTests2.TestClientCanFailoverWhenNoInitialMembershipEvent fails if executed without the above changes, and succeeds with them. It connects the client to a first cluster, then stops that cluster, triggering failover to a second cluster that does not send members view events, which in turn triggers failover to a third cluster, which succeeds.
In addition, RetryStrategy was updated to ensure that the maximum cluster connection timeout does not exceed what fits into a TimeSpan value and can be passed to a CancellationTokenSource, as some users (and one of our tests) may think that long.MaxValue is a safe way to indicate "the longest timeout possible". We now simply and transparently trim their value to "the longest timeout we can support".
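As an illustration of the trimming, here is a minimal sketch. It assumes the timeout is expressed in milliseconds and that int.MaxValue milliseconds (about 24.8 days) is used as the upper bound, since that is a delay CancellationTokenSource accepts. The TimeoutClamp class and its members are hypothetical, and the bound actually chosen in RetryStrategy may differ.

```csharp
using System;
using System.Threading;

public static class TimeoutClamp
{
    // Assumption: int.MaxValue milliseconds is the longest delay supported
    // here; the real RetryStrategy may use a different limit.
    public static readonly TimeSpan MaxSupported =
        TimeSpan.FromMilliseconds(int.MaxValue);

    // Converts a millisecond timeout to a TimeSpan, silently trimming values
    // (such as long.MaxValue) that exceed the supported maximum.
    public static TimeSpan Clamp(long timeoutMilliseconds)
    {
        if (timeoutMilliseconds < 0)
            throw new ArgumentOutOfRangeException(nameof(timeoutMilliseconds));

        var clamped = Math.Min(timeoutMilliseconds, (long)int.MaxValue);
        return TimeSpan.FromMilliseconds(clamped);
    }

    // Builds a CancellationTokenSource that cancels after the clamped timeout.
    public static CancellationTokenSource CreateSource(long timeoutMilliseconds)
        => new CancellationTokenSource(Clamp(timeoutMilliseconds));
}
```

Without the trimming, long.MaxValue milliseconds does not fit into a TimeSpan at all, so the conversion would be expected to throw before a CancellationTokenSource is even created. With it, TimeoutClamp.CreateSource(long.MaxValue) behaves like any other long timeout.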
Changes to other files are cosmetic or there to improve testing and logging.