Failover recovery, graceful takeover don't work w/binlog

jlevene commented 5 years ago

We set up a test environment with 1 master and 2 slaves. (We were trying, obviously without success, to have a setup where Orchestrator would work "out of the box".)

Here's the Orchestrator setup file: orchestrator.conf.json.txt

We're using ProxySQL to send SELECTs (without update) to the slaves and everything else to the master.

Originally, failovers did not work at all. We added read_only=1 to the MySQL config files, and added the pre-failover hook recommended by Percona and at least failover started to work. We didn't do anything else to tell Orchestrator about ProxySQL. (According to the Percona article, the post-failover hook they give is no longer needed.)

In the config:

"PreGracefulTakeoverProcesses": [
     "/tmp/prefailover.sh"
  ],

and the /tmp/prefailover.sh script (here, 10.42.42.42 is the VIP of keepalived for the 2 ProxySQL instances):

#!/bin/bash

# Variable exposed by Orchestrator
OldMaster=$ORC_FAILED_HOST
PROXYSQL_HOST="10.42.42.42"

# stop accepting connections to old master
(
echo 'UPDATE mysql_servers SET STATUS="OFFLINE_SOFT" WHERE hostname="'"$OldMaster"'";'
echo "LOAD MYSQL SERVERS TO RUNTIME;"
) | mysql -vvv -uivan -p**** -h ${PROXYSQL_HOST} -P6032

# wait while connections are still active and we are in the grace period
CONNUSED=`mysql -uivan -p**** -h ${PROXYSQL_HOST} -P6032 -e 'SELECT IFNULL(SUM(ConnUsed),0) FROM stats_mysql_connection_pool WHERE status="OFFLINE_SOFT" AND srv_host="'"$OldMaster"'"' -B -N 2> /dev/null`
TRIES=0
while [ $CONNUSED -ne 0 -a $TRIES -ne 20 ]
do
  CONNUSED=`mysql -uivan -p**** -h ${PROXYSQL_HOST} -P6032 -e 'SELECT IFNULL(SUM(ConnUsed),0) FROM stats_mysql_connection_pool WHERE status="OFFLINE_SOFT" AND srv_host="'"$OldMaster"'"' -B -N 2> /dev/null`
  TRIES=$(($TRIES+1))
  if [ $CONNUSED -ne "0" ]; then
    sleep 0.05
  fi
done
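
The polling loop in the script above can be factored into a small helper. This is only a sketch: `wait_until_zero` is a hypothetical name, not part of the Percona hook, and it assumes the polled command prints a single integer on stdout.

```shell
#!/bin/bash
# wait_until_zero (hypothetical helper): run COMMAND repeatedly until it
# prints 0, giving up after MAX_TRIES polls.
# Usage: wait_until_zero MAX_TRIES SLEEP_SECONDS COMMAND [ARGS...]
wait_until_zero() {
  local max_tries=$1
  local pause=$2
  shift 2
  local tries=0
  local value
  while [ "$tries" -lt "$max_tries" ]; do
    # An empty or failed reading counts as "still busy", not as zero.
    value=$("$@" 2> /dev/null) || value=""
    if [ "$value" = "0" ]; then
      return 0
    fi
    tries=$((tries + 1))
    sleep "$pause"
  done
  return 1
}
```

With this in place, the ConnUsed query from the script would be passed as COMMAND, for example `wait_until_zero 20 0.05 mysql ... -B -N -e '...'`. Unlike the original loop, a failed query is not mistaken for "no connections left".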

Now, if we kill the master, Orchestrator will eventually (5 minutes) promote a slave and get everything working again. When the former master is brought back up, Orchestrator never brings it back into replication; it has to be made a slave manually.

When we try to do a graceful master takeover to a slave from the CLI, it refuses, saying: ERROR Relocating 1 replicas of stg1wpplatmysql04:3306 below stg1wpplatmysql03:3306 turns to be too complex; please do it manually.

When we try with the GUI (by dragging the slave "on top of" the master), it also refuses, saying: Desginated instance stg1wpplatgarbd02:3306 cannot take over all of its siblings. Error: 2019-03-01 12:13:19 ERROR Relocating 1 replicas of stg1wpplatmysql04:3306 below stg1wpplatgarbd02:3306 turns to be too complex; please do it manually. We also get the following in the log:

Mar  1 10:39:28 stg1wpplatdbmgr01 orchestrator: 2019-03-01 10:39:28 INFO moveReplicasViaGTID: Will move 1 replicas below stg1wpplatmysql03:3306 via GTID

However, we're not using GTID. When we query the Orchestrator with the API, it reports:

# curl -s http://localhost:3000/api/problems | jq
[
  {
    "Key": {
      "Hostname": "stg1wpplatmysql04",
      "Port": 3306
    },
    "InstanceAlias": "",
    "Uptime": 78686,
    "ServerID": 2,
    "ServerUUID": "1dcecf18-3b05-11e9-8732-0050568411f5",
    "Version": "5.7.23-23-57-log",
    "VersionComment": "Percona XtraDB Cluster (GPL), Release rel23, Revision f5578f0, WSREP version 31.31, wsrep_31.31",
    "FlavorName": "Percona",
    "ReadOnly": false,
    "Binlog_format": "ROW",
    "BinlogRowImage": "FULL",
    "LogBinEnabled": true,
    "LogSlaveUpdatesEnabled": false,
    "SelfBinlogCoordinates": {
      "LogFile": "mysql-bin.000007",
      "LogPos": 1628009,
      "Type": 0
    },
...

I don't know whether this is a documentation issue or a bug, or a combination. (I'm very suspicious there's some configuration that would fix this if we only knew how to do it, which would make it a doc issue, I suppose.)

tomkrouper commented 5 years ago

We don't use ProxySQL, so I can't really speak to its involvement in your issues, but could you attempt a failover with debugging set on and share the logs?

jlevene commented 5 years ago

Hi @tomkrouper, I tried again from the GUI with the same results. Here are the logs:


Mar  1 16:32:33 stg1wpplatdbmgr01 orchestrator: 2019-03-01 16:32:33 INFO GracefulMasterTakeover: designated master instructed to be stg1wpplatgarbd02:3306
Mar  1 16:32:33 stg1wpplatdbmgr01 orchestrator: 2019-03-01 16:32:33 INFO GracefulMasterTakeover: Will let stg1wpplatgarbd02:3306 take over its siblings
Mar  1 16:32:33 stg1wpplatdbmgr01 orchestrator: 2019-03-01 16:32:33 INFO moveReplicasViaGTID: Will move 1 replicas below stg1wpplatgarbd02:3306 via GTID
Mar  1 16:32:33 stg1wpplatdbmgr01 orchestrator: 2019-03-01 16:32:33 ERROR Relocating 1 replicas of stg1wpplatmysql04:3306 below stg1wpplatgarbd02:3306 turns to be too complex; please do it manually
Mar  1 16:32:33 stg1wpplatdbmgr01 orchestrator: [martini] Completed 500 Internal Server Error in 22.112263ms

liuqian1990 commented 5 years ago

@jlevene Will the configuration of mysql

shlomi-noach commented 5 years ago

> turns to be too complex

means you probably don't use GTID, don't use Pseudo-GTID, don't use binlog servers -- so orchestrator is not sure how to correlate the binary logs of two servers. It takes at least one of them to run a failover.
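
The "at least one of them" rule can be written down as a small check. This is a simplified sketch: `correlation_method` is a hypothetical helper, and its three inputs are values you would have to gather yourself (for example, `SELECT @@global.gtid_mode` on the master). It ignores further requirements such as replicas using auto-positioning.

```shell
# correlation_method (hypothetical helper): report which mechanism, if any,
# would let orchestrator correlate binary logs across servers.
#   $1  value of @@global.gtid_mode on the master (e.g. ON, OFF)
#   $2  "true" if Pseudo-GTID entries are injected on the master
#   $3  "true" if replication goes through binlog servers
correlation_method() {
  if [ "$1" = "ON" ]; then
    echo "GTID"
  elif [ "$2" = "true" ]; then
    echo "Pseudo-GTID"
  elif [ "$3" = "true" ]; then
    echo "binlog servers"
  else
    # None of the three: relocating replicas is "too complex".
    echo "none"
    return 1
  fi
}
```

For the setup described in this issue, `correlation_method OFF false false` prints `none` and returns a non-zero status.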

> I don't know whether this is a documentation issue or a bug, or a combination.

Good question. I'm pretty sure somewhere in the docs it says you need to have one of the three aforementioned options. Maybe not.

jlevene commented 5 years ago

Thanks for your response @shlomi-noach.

You are right about documentation, it says:

MySQL Configuration

Your MySQL topologies must fulfill some requirements in order to support failovers. Those requirements largely depend on the types of topologies/configuration you use.

BinlogServers: promotable servers must have log_bin enabled.

I have log-bin enabled in all my servers and Orchestrator is reading the binlog format as ROW as you can see below:

stg1wpplatmysql04:3306   [0s,ok,5.7.23-23-57-log,rw,ROW]
+ stg1wpplatgarbd02:3306 [0s,ok,5.7.24-27-log,ro,ROW]
+ stg1wpplatmysql03:3306 [0s,ok,5.7.24-27-log,ro,ROW]

I would think that a replication problem or MySQL misconfiguration would also affect recovery from a DeadMaster event, but that's not the case: Orchestrator does promote a slave to master when the master server dies.

Will recover from DeadMaster on stg1wpplatgarbd02:3306
Recovered from DeadMaster on stg1wpplatgarbd02:3306. Failed: stg1wpplatgarbd02:3306; Promoted: stg1wpplatmysql04:3306
(for all types) Recovered from DeadMaster on stg1wpplatgarbd02:3306. Failed: stg1wpplatgarbd02:3306; Successor: stg1wpplatmysql04:3306

jlevene commented 5 years ago

Just to be extra clear: it promotes a slave to master correctly (if slowly) when a master dies. It just refuses to do a graceful promotion of a slave.

shlomi-noach commented 5 years ago

Thank you. As I see it, the behavior is correct and consistent with documentation. There may be an option to generate a structure analysis that says "orchestrator cannot failover this cluster".

jlevene commented 5 years ago

Thank you Shlomi for your prompt answer. I tried a couple of times, and cannot "get my head around" your answer.

If it is unable to promote a slave when the cluster is using binlog for cluster consistency, that's correct behavior? How could we fix it so our Orchestrator would work like all the demos? If we changed to GTID or Pseudo-GTID, would Orchestrator then be able to control the cluster?

The way it is now, we can't use Orchestrator to take a Master out of rotation to upgrade it. I would bet 100 NIS it's supposed to be able to do that (probably from the GUI).

I'm really hoping you can recommend something that will make our setup work like all the demos, because right now it does about half of the actions correctly. (It also doesn't bring a failed Master back into the cluster [as a slave, or otherwise] after it comes back up.)

Sorry: I realize I asked 3 different Q's in the 2nd paragraph.

shlomi-noach commented 5 years ago

> If it is unable to promote a slave when the cluster is using binlog for cluster consistency,

Now it's me who cannot get my head around. "using binlog for cluster consistency" is a sentence that does not make sense. Binlogs are the basic essential mechanism for replication.

The way MySQL implements binlogs is that every server has its own binary logs, with different names, different coordinates. It is a very complicated task to be able to correlate binary logs from different servers.

That's what GTID and Pseudo-GTID are for. They make binlog correlation possible, among other things.

> The way it is now, we can't use Orchestrator to take a Master out of rotation to upgrade it. I would bet 100 NIS it's supposed to be able to do that (probably from the GUI).

I'm happy to take your 100NIS. Please contribute them to a good cause and attach the receipt here.

> I'm really hoping you can recommend something that will make our setup work like all the demos,

I did. I recommend that you use GTID or Pseudo-GTID.

Pseudo-GTID is super easy. Try this: https://github.com/github/orchestrator/blob/master/docs/configuration-discovery-pseudo-gtid.md#automated-pseudo-gtid-injection
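
As far as the linked section describes it, automated injection is switched on by setting `"AutoPseudoGTID": true` in the orchestrator configuration file; the page also lists an additional grant for the orchestrator user, so check it against your version. A minimal sketch for verifying that a config file has the key enabled follows. `has_auto_pseudo_gtid` is a hypothetical helper, and the grep is a rough text match, not a JSON parser.

```shell
# has_auto_pseudo_gtid (hypothetical helper): succeed if the given
# orchestrator config file sets "AutoPseudoGTID" to true.
has_auto_pseudo_gtid() {
  grep -Eq '"AutoPseudoGTID"[[:space:]]*:[[:space:]]*true' "$1"
}
```

Usage would look like `has_auto_pseudo_gtid /path/to/orchestrator.conf.json && echo enabled`, where the path is whatever your installation uses.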

> Now, if we kill the master, Orchestrator will eventually (5 minutes) promote a slave and get everything working again.

5 minutes is an astounding time. We're looking at 5-10 seconds till failover kicks in, and a total of ~15sec for full recovery. Did you consider https://github.com/github/orchestrator/blob/master/docs/configuration-failure-detection.md#mysql-configuration ?
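
The linked section recommends, to the best of my reading, a short `slave_net_timeout` and an aggressive reconnect policy on each replica. The sketch below only prints the statements so they can be reviewed before being piped to `mysql`. `replica_detection_sql` is a hypothetical helper, the values 4, 1 and 86400 are the ones that page suggests as far as I recall, and the STOP SLAVE / START SLAVE wrapper is an assumption for MySQL 5.7-style syntax.

```shell
# replica_detection_sql (hypothetical helper): print the statements to run
# on each replica. $1 is slave_net_timeout in seconds (defaults to 4).
replica_detection_sql() {
  local net_timeout=${1:-4}
  cat <<EOF
SET GLOBAL slave_net_timeout = ${net_timeout};
STOP SLAVE;
CHANGE MASTER TO MASTER_CONNECT_RETRY = 1, MASTER_RETRY_COUNT = 86400;
START SLAVE;
EOF
}
```

Running `replica_detection_sql` with no argument prints the four statements with a timeout of 4 seconds. Verify the exact values against the documentation for your orchestrator and MySQL versions before applying them.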

> When the former master is brought back up, Orchestrator never brings it back into replication; it has to be made a slave manually.

This is expected behavior. It is also documented, I'm pretty sure. orchestrator does not have the mechanics to run backup/restores on your systems.

> Relocating 1 replicas of stg1wpplatmysql04:3306 below stg1wpplatmysql03:3306 turns to be too complex; please do it manually.

Would you mind pasting the existing topology? orchestrator-client -c topology -alias <yourcluster>

shlomi-noach commented 5 years ago

Friendly ping

> Would you mind pasting the existing topology? orchestrator-client -c topology -alias <yourcluster>

jlevene commented 5 years ago

It works great with Pseudo-GTID. Many thanks!

The topology is just 1 master with 2 slaves replicating from it, which is actually what we'll do in production (we have lots of tiny clusters like that, each serving a few web servers, in self-contained "pods" to better contain attacks, which we get a lot of).

The output is:

stg1wpplatgarbd02:3306   [0s,ok,5.7.24-27-log,rw,ROW,>>,P-GTID]
+ stg1wpplatmysql03:3306 [0s,ok,5.7.24-27-log,ro,ROW,>>,P-GTID]
- stg1wpplatmysql04:3306 [null,nonreplicating,5.7.23-23-57-log,ro,ROW,>>,P-GTID]
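
In this output, stg1wpplatmysql04 is flagged `nonreplicating`, which matches the earlier point that a returning server has to be re-attached by hand. A small sketch for listing such instances from topology output follows. `nonreplicating_instances` is a hypothetical helper that assumes the line layout shown above (an optional marker, then host:port, then the bracketed status).

```shell
# nonreplicating_instances (hypothetical helper): read topology output on
# stdin and print host:port for every line flagged "nonreplicating".
nonreplicating_instances() {
  awk '/nonreplicating/ {
    # The host:port token is the first field of the form name:digits.
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^[A-Za-z0-9._-]+:[0-9]+$/) { print $i; break }
    }
  }'
}
```

It would be fed from the topology command quoted earlier in the thread, for example `orchestrator-client -c topology -alias <yourcluster> | nonreplicating_instances`.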

I'm closing this now. I was delaying to try to get the receipt, which I'll have to send to Shlomi another way, in a few days.

shlomi-noach commented 5 years ago

I'm surprised to see I never followed up here: in the above https://github.com/github/orchestrator/issues/824#issuecomment-469780512 @jlevene made a bet for something, which turned out to be wrong. @jlevene made good on his bet, and sent me a private message with proof of his donation to a charity organization. Thank you @jlevene!

openark / orchestrator

Failover recovery, graceful takeover don't work w/binlog #824