cmusser opened this issue 3 years ago
`orchestrator` does not handle the reprovisioning of the old primary; that is left to the user. TL;DR this is outside the scope of `orchestrator`, and would require an agent running on the host, likely with root access and with a rather intimate knowledge of your setup/infrastructure.
There have been two suggestions for making `orchestrator` coordinate backup/restores; both would have roughly tripled the codebase and are, I'm afraid, not within the capacity of this project.
Ok, that is good to know. I wasn't actually sure what Orchestrator would do in the scenario where the primary server vanishes. The fact that it promoted a replica without intervention is very helpful. Making the code 3x bigger to completely recover doesn't seem worth it, for sure. Too many site-specific aspects to deal with.
When the downed server started back up, Orchestrator put it into its own cluster and we were wondering why that was. We figured that since that server existed in the little metadata database we created for Orch (the one used by the `Detect` config directives), it would return to the cluster, but in a non-replicating and downtimed state. It is downtimed, but off in a separate cluster, as if Orchestrator retained no knowledge of it. One thing I did see was an `ack-cluster-recovery` command. Would issuing that command before restarting the dead server allow it to be recognized as part of the cluster?
@shlomi-noach we hit this issue as well. Can you explain why the restarted master is seen as a different cluster when we re-discover it, even though `DetectClusterAliasQuery` is the same? Perhaps it will help us find the right solution.
@cmusser did you find a way to resolve this issue?
> we hit this issue as well.
@liortamari can you first explain what is the issue you're hitting? The original comment illustrated a scenario, but there was no real issue, other than the user expecting the old primary to now replicate from the new primary.
I think what @shlomi-noach is saying here is:
That's my understanding of it anyway.
@shlomi-noach thank you. The issue I am trying to understand how best to resolve is a scenario where a master restarted due to an error. For example, I have a cluster with 2 instances:
```
$ orchestrator-client -c all-instances
mysql-misc-a:3306
mysql-misc-b:3306

$ orchestrator-client -c clusters-alias
mysql-misc-a:3306,mysql-misc

$ orchestrator-client -c all-clusters-masters
mysql-misc-a:3306
```
When the master mysql-misc-a is restarted, the slave mysql-misc-b is promoted to master as expected. And now the orchestrator state shows 2 cluster aliases:
```
$ orchestrator-client -c clusters-alias
mysql-misc-a:3306,mysql-misc-a:3306
mysql-misc-b:3306,mysql-misc

$ orchestrator-client -c all-instances
mysql-misc-a:3306
mysql-misc-b:3306

$ orchestrator-client -c all-clusters-masters
mysql-misc-b:3306
```
My question: is there a command I can run to tell orchestrator to take the old master mysql-misc-a as a replication slave under the new master? Or must I configure replication outside of orchestrator's scope, as @cmusser suggested earlier? In general, I was surprised that after the restart I have 2 cluster aliases, because according to the `DetectClusterAliasQuery` it should be the same cluster. So I was hoping to better understand the best way to remedy this, preferably by using orchestrator-client only.
@liortamari thank you for elaborating.
@cmusser is correct about (1). Why does this happen? MySQL-wise, there's actually no such notion as a "cluster". MySQL does not care about clusters (in async/semisync replication), only about one server replicating from another. So this is metadata we decorate your topology with, and that's done via `DetectClusterAliasQuery`. So far so good.
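For context, a minimal sketch of how such an alias is typically wired up; the `meta.cluster` table and `alias` column below are placeholders, not the exact names from this thread's setup:

```shell
# A tiny metadata table, present on every server in the cluster
# (names here are illustrative placeholders).
mysql -e "CREATE DATABASE IF NOT EXISTS meta;
          CREATE TABLE IF NOT EXISTS meta.cluster (alias VARCHAR(128) NOT NULL PRIMARY KEY);
          INSERT IGNORE INTO meta.cluster (alias) VALUES ('mysql-misc');"

# The directive in orchestrator.conf.json that reads it; orchestrator runs this query
# on each instance and uses the result as a human-friendly name for that instance's cluster:
#   "DetectClusterAliasQuery": "SELECT alias FROM meta.cluster"
```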
Now, a primary failed, promotion took place. 5 minutes or 5 hours later the primary comes back to life. What happens now? MySQL has no insights. It's down to `orchestrator` to make the best of the situation. Here's what it knows:

- `n` instances, all claiming to be in the `mysql-misc` cluster
- `n-1` of those instances are connected in a replication graph (something `orchestrator` is able to identify)
- `1` instance is not connected with the rest of them
- `2` servers which act as primaries; `orchestrator`-wise, both claim to be the head of `mysql-misc`

This is why you see two clusters.
Now `orchestrator` needs to figure out which is the "real" cluster, which it does by having marked the failed `primary` as `lost-in-recovery` (and with an internal downtime/tag). That's how `orchestrator` decides in the event of a post-failover scenario. There can be other scenarios, where multiple clusters all pretend to be the same one, and `orchestrator` would choose the largest, by way of heuristic. But that's orthogonal to our discussion.
Anyway. If the old `primary` has transactions not present in the new cluster, then there is nothing `orchestrator` can do. There's just no way to make it a happy replica in the cluster. You will have to e.g. restore the server from backup. Also, it's imperative that `orchestrator` doesn't do anything here, because your business is likely to want to salvage those lost records.
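One way to check that condition, assuming GTID-based replication (which this thread does not confirm) and reusing the host names from the example above:

```shell
# Sketch: does the old primary (mysql-misc-a) hold transactions the new primary
# (mysql-misc-b) never received? Assumes GTID replication and passwordless client auth.
OLD_GTIDS=$(mysql -h mysql-misc-a -N -e "SELECT REPLACE(@@GLOBAL.gtid_executed, '\n', '')")
NEW_GTIDS=$(mysql -h mysql-misc-b -N -e "SELECT REPLACE(@@GLOBAL.gtid_executed, '\n', '')")

# GTID_SUBSET(a, b) returns 1 when every transaction in a is also present in b.
# 0 means the old primary has writes the new cluster lacks and cannot simply be repointed.
mysql -h mysql-misc-b -N -e "SELECT GTID_SUBSET('$OLD_GTIDS', '$NEW_GTIDS') AS old_is_subset"
```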
Now, as I mentioned earlier, there is one exception. If:

- the old `primary` is found again, and
- you have either provided `ReplicationCredentialsQuery`, or have granted `SELECT` privileges on `mysql.slave_master_info` and have configured `master_info_repository = 'TABLE'`, so that `orchestrator` is able to get some idea of how to configure a server as a replica,

then it's possible to reconfigure it as a replica and connect it back to the cluster. It's a bit complex, because what happens if that old primary returns after 5 hours? Do we keep tracking forever? Anyway, it's an idea.
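For reference, a minimal sketch of the MySQL-side prerequisites mentioned above; the `orc_topology` user and the `meta.credentials` table are placeholders, not names from this thread:

```shell
# In my.cnf on every server, so replication connection metadata lives in a table:
#   [mysqld]
#   master_info_repository = TABLE

# Let the user orchestrator connects as (placeholder name 'orc_topology') read that table:
mysql -e "GRANT SELECT ON mysql.slave_master_info TO 'orc_topology'@'%'"

# Or, alternatively, tell orchestrator how to fetch replication credentials itself,
# in orchestrator.conf.json (the meta.credentials table here is hypothetical):
#   "ReplicationCredentialsQuery": "SELECT repl_user, repl_password FROM meta.credentials"
```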
@shlomi-noach thank you. Conditions 1-4 are met in my test. I would think the most important thing to do, in case a dead master reappears, is to mark it read-only, right? And I see orchestrator does that. So it seems to me orchestrator does need to keep that tracking forever, even if only for the `read_only`. Is that correct?
@liortamari "forever" is a strong word. Even if orchestrator
does keep checking till the old server reappears, it will do so in intervals. There will be a period of time where that old seerver would still be writable, before orchestrator
turns it read_only
. And this is nice to have, but please consider, what problem does this solve?
This does not solve split brains -- it narrows the split brain time a bit. This does not restore the server into your topology. This does not normalize/align the data on the reappearing server.
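A common operational mitigation for that writable window, outside of `orchestrator` itself, is to have every server start read-only and only flip the flag on whichever server is deliberately promoted. A sketch:

```shell
# In my.cnf on every server, so a crashed primary that restarts cannot take writes
# until someone (or some tooling) decides it should:
#   [mysqld]
#   read_only       = 1
#   super_read_only = 1   # MySQL 5.7+; also blocks users with SUPER

# Whatever promotes a server to primary (a human, or a post-failover hook)
# then makes that one server writable:
mysql -h mysql-misc-b -e "SET GLOBAL super_read_only = 0; SET GLOBAL read_only = 0"
```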
I think the discussion is digressing. The intent was to "tell orchestrator to take the old master mysql-misc-a as replication slave under the new master".
To avoid split brains, and to ensure that the old primary never has more transactions than the newly promoted server, you must use semi-sync and pay the price in commit latency. If you want the same for cross-region, pay the price of cross-region latency.
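A minimal sketch of enabling semi-sync on this thread's hosts (an illustration, not the actual configuration discussed here; plugin names are the stock MySQL ones, adjust for your version). Note that the plugin's timeout lets the primary fall back to asynchronous replication, which weakens the no-lost-transactions guarantee unless it is set very high:

```shell
# On the primary (mysql-misc-b after the failover):
mysql -h mysql-misc-b -e "INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
                          SET GLOBAL rpl_semi_sync_master_enabled = 1;
                          SET GLOBAL rpl_semi_sync_master_timeout = 10000;"  # ms before async fallback

# On each replica:
mysql -h mysql-misc-a -e "INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
                          SET GLOBAL rpl_semi_sync_slave_enabled = 1;
                          STOP SLAVE IO_THREAD; START SLAVE IO_THREAD;"  # pick up the new setting
```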
I'm not against `orchestrator` setting `read_only=1` on reappearing servers. Just pointing out that this does not solve any of the questions above.
@shlomi-noach thanks for the explanation.
I mentioned the read_only because I noticed that orchestrator did set `read_only=1` on the reappearing master, so I thought that was part of the logic. I now understand it is not.
Hi,
I have a topology with a master and two replicas, and I'm testing master server failures. The recovery process isn't doing what I'd expect. When I shut down MySQL on the master, one of the replicas becomes the master, which is good. But when I start the old master back up, I'd like it to reappear in the cluster and, ideally, begin replicating from the new master.
My Orchestrator version:
I started with this initial topology:
Next I stopped the MySQL server on dbtest01. The topology now looks like:
There is still just the one cluster (named test-cluster) in the web portal.
I restart the MySQL server on dbtest01. I notice that dbtest01 appears as its own cluster in the web console. The topology commands now show:
As it stands, the topology has split into two separate clusters, which are:
Is that what is supposed to happen? What are the steps for getting `dbtest01` back in action as a replica? I did that manually by snapshotting the new master with `xtrabackup`, restoring that on `dbtest01` and restarting replication, but I'd hoped the manual steps wouldn't be needed. I attached the config and the contents of the metadata table that Orchestrator uses.
meta.txt orchestrator.txt
I can post logs as needed. But I think I'm not understanding something about how this is supposed to work.
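A rough sketch of the manual rejoin steps described above, assuming GTID-based replication; the replication user, paths, and the `dbtest02` host name (standing in for whichever replica was promoted) are placeholders:

```shell
# 1. On the new master (dbtest02 here), take and prepare a snapshot, then copy it over:
xtrabackup --backup --target-dir=/backups/rejoin
xtrabackup --prepare --target-dir=/backups/rejoin
rsync -a /backups/rejoin/ dbtest01:/backups/rejoin/

# 2. On dbtest01, with mysqld stopped and the old datadir cleared, restore the snapshot:
xtrabackup --copy-back --target-dir=/backups/rejoin
chown -R mysql:mysql /var/lib/mysql
systemctl start mysql
# (with GTID you may also need to RESET MASTER and seed gtid_purged
#  from xtrabackup_binlog_info before the next step)

# 3. Point dbtest01 at the new master and let orchestrator re-discover it:
mysql -h dbtest01 -e "CHANGE MASTER TO MASTER_HOST='dbtest02', MASTER_USER='repl',
                      MASTER_PASSWORD='...', MASTER_AUTO_POSITION=1; START SLAVE;"
orchestrator-client -c discover -i dbtest01:3306
```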