rabbitmq / ra

A Raft implementation for Erlang and Elixir that strives to be efficient and make it easier to use multiple Raft clusters in a single system.

Improve Ra server resilience when log infrastructure encounters faults #428

Closed kjnilsson closed 2 months ago

kjnilsson commented 2 months ago

Various improvements to data safety when log infrastructure processes encounter faults.

In particular, there are many improvements and fixes to the server -> WAL resend protocol, including:

There is also a new per-system setting that controls which kind of server recovery takes place when a Ra system starts or restarts. There are three options:

  1. undefined: do not restart any Ra servers
  2. registered: restart all locally registered servers for the system
  3. mfa: call a custom function (module, function, arguments) that performs the restart

This allows dynamically started Ra servers to be restarted if the Ra system crashes and restarts.
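
As a rough illustration of how the three options differ, here is a minimal Python model of the dispatch. It is not Ra's code or API: `recover_servers`, the `restart` callback, and the use of `None` for `undefined` and a `(function, args)` pair for an MFA are all assumptions made for this sketch.

```python
from typing import Callable, Iterable, List, Tuple, Union

# Hypothetical model of the recovery options described above.
# None stands in for `undefined`, the string "registered" for `registered`,
# and a (function, args) pair for an Erlang MFA.
Mfa = Tuple[Callable[..., Iterable[str]], tuple]
Strategy = Union[None, str, Mfa]


def recover_servers(
    strategy: Strategy,
    registered: Iterable[str],
    restart: Callable[[str], None],
) -> List[str]:
    """Return the names of the servers restarted under the given strategy."""
    if strategy is None:
        # undefined: no server is restarted automatically.
        return []
    if strategy == "registered":
        # registered: restart every locally registered server of the system.
        restarted = []
        for name in registered:
            restart(name)
            restarted.append(name)
        return restarted
    if isinstance(strategy, tuple) and len(strategy) == 2 and callable(strategy[0]):
        # mfa: delegate the whole restart to a user-supplied function.
        func, args = strategy
        return list(func(*args))
    raise ValueError(f"unknown recovery strategy: {strategy!r}")
```

In this model only the `registered` branch consults the list of registered servers. The custom-function branch leaves the decision of what to restart entirely to the caller, which is what makes it suitable for dynamically started servers.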

This PR also improves code coverage and includes some refactoring.

Fixes: https://github.com/rabbitmq/ra/issues/416

pjk25 commented 2 months ago

@kjnilsson you can rebase now that #431 is merged