Various improvements to data safety when log infrastructure processes encounter faults.
In particular there are many improvements and fixes relating to the server -> wal resend protocol including:
A fix for a bug in ra_log_cache that caused most triggered resends to result in a ra process crash.
Fewer messages are dropped, thanks to use of the gen_statem postpone feature.
Ra leaders would previously just exit with wal_down. They now enter the same await_condition state, although with a shorter timeout, after which they begin a leader transfer process.
Improved detection and availability when a command is lost on the way to the wal and no further commands are sent.
There is also a new feature to configure, on a per-system basis, what kind of server recovery should take place when a ra system starts or restarts. There are 3 options:
undefined: do not restart any ra servers
registered: restart all locally registered servers for the system
mfa: call a custom function that performs the restart
This feature allows dynamically started ra servers to be restarted should the ra system crash and restart.
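As a rough illustration of the three options, the fragment below shows how each might appear in a ra system configuration map. This is a sketch under assumptions, not a reference: the key name server_recovery_strategy and the {Module, Function, Args} tuple shape for the mfa option are assumed here, and my_recovery and restart_servers are hypothetical names. Check the ra_system configuration type for the exact spelling before relying on it.

```erlang
%% Assumed key name: server_recovery_strategy (verify against ra_system).

%% Option 1: do not restart any ra servers.
NoRecovery = #{server_recovery_strategy => undefined},

%% Option 2: restart all locally registered servers for the system.
Registered = #{server_recovery_strategy => registered},

%% Option 3: call a custom function that performs the restart.
%% my_recovery:restart_servers is a hypothetical module and function.
Custom = #{server_recovery_strategy => {my_recovery, restart_servers, []}}.
```

Any one of these maps would be merged into the rest of the system configuration before the ra system is started.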
Also improvements to code coverage and refactoring.
Fixes: https://github.com/rabbitmq/ra/issues/416