Description

This PR addresses issue #70. It adds a retry policy for actor invocations, documents environment.invokeReferences(), and adds a test verifying that actors with multiple replicas can still be invoked after a server failure.

Code Changes

Test SurviveReplicaFailure: A new test that validates high availability and fault tolerance when a server fails. It spawns actors with multiple replicas, simulates a server failure by killing one of the servers, and verifies that actor invocations still succeed despite the lost replica.
environment.invokeReferences(): The method's documentation is updated to describe its purpose and behavior. It selects a server to invoke based on the provided references and the create flag, and returns the index of the invoked reference, the response as an io.ReadCloser, and an error if the invocation failed.
types.ActorOptions.RetryPolicy: A new RetryPolicy field in the ActorOptions struct specifies whether a failed actor invocation is retried. The possible values are "retry_never" (no retries) and "retry_if_replica_available" (retry on another available replica).

Additional Notes

The test scenarios and steps are described in the comments of the test function.
Together, these changes target high availability and fault tolerance: the retry policy lets a failed invocation fall back to another replica, and the new test verifies that invocations keep succeeding after a server is lost.
Documentation updates are included alongside the new test.

richardartoul/nola #81: Implement Retry Policy and High-Availability Tests