openark / orchestrator

MySQL replication topology management and HA

Selection criteria for replacement Intermediate Masters #283

Closed · sjmudd closed this 7 years ago

sjmudd commented 7 years ago

https://github.com/github/orchestrator/pull/279 and https://github.com/github/orchestrator/pull/281 contain comments relating to the incorrect selection of a new intermediate master (IM) after another IM failed. https://github.com/github/orchestrator/pull/281 resolved that specific problem.

However, while checking the code I questioned the ordering used when selecting a replacement intermediate master, and suggested the following order:

  1. search for an IsCandidate server in the same DC & env
  2. search for an IsCandidate server somewhere else
  3. search for a sibling in the same DC & env [*]
  4. search for any remaining valid sibling [*]

Current behaviour is similar, but has steps 2 and 3 reversed; a sketch of the proposed ordering appears below.
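
To make the discussion concrete, here is a minimal, hypothetical sketch of the proposed priority cascade. The names (`Instance`, `IsCandidate`, `chooseReplacementIM`, and so on) are illustrative only and do not reflect orchestrator's actual internals; the point is purely the priority of the four searches, and swapping the second and third predicates gives the current behaviour.

```go
// Hypothetical sketch only: these names do not match orchestrator's code.
type Instance struct {
    Key           string
    IsCandidate   bool
    DataCenter    string
    Environment   string
    PromotionRule string // e.g. "must_not"; used in the filter sketch further below
}

// chooseReplacementIM scans the failed IM's siblings in the proposed
// priority order and returns the first match. Swapping steps 2 and 3
// yields the current behaviour.
func chooseReplacementIM(failed *Instance, siblings []*Instance) *Instance {
    sameDCEnv := func(s *Instance) bool {
        return s.DataCenter == failed.DataCenter && s.Environment == failed.Environment
    }
    steps := []func(*Instance) bool{
        func(s *Instance) bool { return s.IsCandidate && sameDCEnv(s) }, // 1. candidate, same DC & env
        func(s *Instance) bool { return s.IsCandidate },                 // 2. candidate, somewhere else
        sameDCEnv,                                                       // 3. sibling, same DC & env
        func(s *Instance) bool { return true },                          // 4. any remaining valid sibling
    }
    for _, matches := range steps {
        for _, s := range siblings {
            if matches(s) {
                return s
            }
        }
    }
    return nil // no valid replacement among the siblings
}
```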

My argument for this new behaviour is that if I have taken the trouble to choose candidate masters, I would prefer orchestrator to use them rather than "any odd" box. Shlomi points out a possible consequence: if the only candidate server is in another datacentre, the failed server's slaves will be moved under the new IM in that other datacentre, triggering cross-DC replication (and potentially extra latency).

This is a valid concern.

I would therefore argue that, ideally, you would configure orchestrator to be aware of a candidate in the same DC as the master (master failover will likely try to use this box first, which is good), but also of candidate masters in any other datacentres. If you have an IM in a secondary datacentre, you could configure it as a candidate master, and you could configure additional candidates as well.

I feel that if I'm going to make a conscious effort to tell orchestrator about candidate masters, then I'd prefer orchestrator to use them in preference to other boxes. Previously, orchestrator would pick the "best" or "most appropriate" replacement, and generally that's good enough for most people. It's now possible to exclude servers using the "must_not" mechanism, so anyone configuring an explicit preference really would prefer those boxes to be used as masters. Perhaps it's less important to use them as intermediate masters, but I don't think doing so is bad.
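
As an aside, the "must_not" exclusion can be pictured as a filter applied before the cascade above. This is a minimal sketch reusing the hypothetical `Instance` type from the previous snippet, with the `PromotionRule` field standing in for orchestrator's actual mechanism:

```go
// eligibleSiblings drops servers the operator has explicitly excluded,
// before the priority cascade in chooseReplacementIM runs.
func eligibleSiblings(siblings []*Instance) []*Instance {
    eligible := make([]*Instance, 0, len(siblings))
    for _, s := range siblings {
        if s.PromotionRule == "must_not" {
            continue // explicitly excluded from promotion
        }
        eligible = append(eligible, s)
    }
    return eligible
}
```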

One of the good things about orchestrator is that even if it does this "cross-DC replication thing", moving slaves back under a better box is quite easy to do from the GUI, or the CLI if you prefer. And if you previously knew that this other box was OK, why wasn't it configured as a candidate in the first place?

So these comments are here to discuss this topic, as maybe I'm missing something. As mentioned, I'd favour the order shown above.

shlomi-noach commented 7 years ago

> I would therefore argue that, ideally, you would configure orchestrator to be aware of a candidate in the same DC as the master (master failover will likely try to use this box first, which is good), but also of candidate masters in any other datacentres.

I agree with the above. But also consider: what if the candidate IM in DC1 was the one to fail in the first place?

> I feel that if I'm going to make a conscious effort to tell orchestrator about candidate masters, then I'd prefer orchestrator to use them in preference to other boxes. Previously, orchestrator would pick the "best" or "most appropriate" replacement, and generally that's good enough for most people.

Makes sense.

> One of the good things about orchestrator is that even if it does this "cross-DC replication thing", moving slaves back under a better box is quite easy to do from the GUI, or the CLI if you prefer.

True.

Giving this further thought, I don't feel strongly about this.

It boils down to: is avoiding inter-DC replication more important than honouring a candidate? Both your arguments and mine rely on the assumption (or fact) that since this is an intermediate master it's not a big deal either way, because a human can always change the setup at leisure afterwards.

I'm happy to make the change as requested. I'm just wondering whether someone else will come up with a counter scenario and argue that we should change back. Consider that many companies have many small setups, as small as "two replicas in this DC and two in another DC", so they don't have much room for assigning candidates and the like. I'm merely guessing we'd see such a setup, where the current logic fits well and the suggested logic does not.

Obviously we cannot satisfy everyone, and I'd rather not have this behavior configurable.