outbrain / orchestrator

MySQL replication topology manager/visualizer

Rate limit orchestrator recoveries even across topologies #206

Open shlomi-noach opened 8 years ago

shlomi-noach commented 8 years ago

Have a max-recoveries-per-hour limit or similar, applied even across clusters. When there are just too many things breaking concurrently, we may wish to get a human involved.

sjmudd commented 8 years ago

For situations like this a brake is good.

Consequently I'd be tempted to have a stored setting, "globalAutomaticRecoveryDisabled", which is read every few seconds and which the running/active node takes into consideration. The GUI should also have a way to change this setting ("GlobalAutomaticRecovery: Disabled/Enabled"), which updates this table, and there should be an appropriate CLI entry to query/enable/disable this behaviour, perhaps with a hook to notify people of the change in state.
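A rough sketch of how the active node might consume such a setting, assuming it is stored as a single boolean in the shared backend. `SettingStore`, `memoryStore` and `RecoveryGate` are hypothetical names used only for illustration; the in-memory store stands in for the real table, and the callback stands in for the notification hook.

```go
package main

import (
	"fmt"
	"sync"
)

// SettingStore stands in for the shared table holding the global flag.
// Hypothetical interface; not orchestrator's actual API.
type SettingStore interface {
	GlobalRecoveryDisabled() (bool, error)
	SetGlobalRecoveryDisabled(disabled bool) error
}

// memoryStore is an in-memory stand-in for the backend table.
type memoryStore struct {
	mu       sync.RWMutex
	disabled bool
}

func (s *memoryStore) GlobalRecoveryDisabled() (bool, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.disabled, nil
}

func (s *memoryStore) SetGlobalRecoveryDisabled(disabled bool) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.disabled = disabled
	return nil
}

// RecoveryGate caches the flag on the active node and calls onChange
// (the "hook") whenever the stored value flips.
type RecoveryGate struct {
	store    SettingStore
	onChange func(disabled bool)
	mu       sync.Mutex
	disabled bool
}

func NewRecoveryGate(store SettingStore, onChange func(disabled bool)) *RecoveryGate {
	return &RecoveryGate{store: store, onChange: onChange}
}

// Refresh re-reads the setting; the active node would call this every few
// seconds, for example from a time.Ticker loop.
func (g *RecoveryGate) Refresh() error {
	disabled, err := g.store.GlobalRecoveryDisabled()
	if err != nil {
		return err
	}
	g.mu.Lock()
	changed := disabled != g.disabled
	g.disabled = disabled
	g.mu.Unlock()
	if changed && g.onChange != nil {
		g.onChange(disabled)
	}
	return nil
}

// RecoveryAllowed is what the recovery code path would check before acting.
func (g *RecoveryGate) RecoveryAllowed() bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	return !g.disabled
}

func main() {
	store := &memoryStore{}
	gate := NewRecoveryGate(store, func(disabled bool) {
		fmt.Printf("hook: globalAutomaticRecoveryDisabled is now %v\n", disabled)
	})

	_ = gate.Refresh()
	fmt.Println("recovery allowed:", gate.RecoveryAllowed())

	// The GUI or CLI would perform this write against the shared table.
	_ = store.SetGlobalRecoveryDisabled(true)
	_ = gate.Refresh()
	fmt.Println("recovery allowed:", gate.RecoveryAllowed())
}
```

Polling keeps the recovery path itself free of database reads, at the cost of the brake taking up to one poll interval to apply.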

This is a long list of things I would like to see. It may not seem useful to have all of this, but a global failure such as a data center (DC) failure may make this sort of brake quite useful.