Open sjmudd opened 6 years ago
Thank you for opening this Issue. While I have no immediate insight into what went wrong:
...One of the slaves was a delayed replica with a configured 24-hour delay.
It sounds as if this should be the smoking gun. However, orchestrator
specifically targets delayed replicas in a special way and relocates them offline:
From the screenshot you posted it seems there were no postponed functions, which means the delayed replica was not run asynchronously.
Which reminds me I added a newer logic to relocate replicas and wanted to retire the older logic (right now configurable by config.Config.PseudoGTIDPreferIndependentMultiMatch
).
Please confirm you're using PseudoGTIDPreferIndependentMultiMatch = false
(or none at all, as I suspect), which means you're using the "older" method. Nothing bad about it, just so that I know where to look.
...could have an extra parameter, topologyRecovery, which can be used for optional audit logging if topologyRecovery != nil? This would allow the same routines to be used for normal relocation purposes without special logging, but under recovery processing extra logging could be emitted. Does this sound like a reasonable idea?
Absolutely, this would be the way to go in temrs of visibility/logging.
hmmm. What's your PostponeReplicaRecoveryOnLagMinutes
or PostponeSlaveRecoveryOnLagMinutes
?
I think I should get rid of this condition and just hard code to "always postpone sql delayed servers".
Thanks for your feedback.
PseudoGTIDPreferIndependentMultiMatch
so I'll adjust the configuration. Is this enabled by default in current code? No, looks like it is not. My suggestion would be to adjust the default giving people some time to see how that affects things? I'll adjust my configuration as suggested."PostponeSlaveRecoveryOnLagMinutes": 120,
So thanks for the pointer regarding the MultiMatch thing. It's hard even for me to follow all the changes you are making.
I had another failure and logging showed:
2018-04-26 12:13:22 | Completed PreFailoverProcesses hook 1 of 1 in 2.910507384s
2018-04-26 12:13:22 | done running PreFailoverProcesses hooks
2018-04-26 12:13:22 | RecoverDeadMaster: will recover old-master.example.com:3306
2018-04-26 12:13:22 | RecoverDeadMaster: masterRecoveryType=MasterRecoveryPseudoGTID
2018-04-26 12:13:22 | RecoverDeadMaster: regrouping replicas via Pseudo-GTID
2018-04-26 12:28:29 | RecoverDeadMaster: 1 postponed functions
2018-04-26 12:28:29 | promoted replica: new-master.example.com:3306
2018-04-26 12:28:29 | checking if should replace promoted replica with a better candidate
2018-04-26 12:28:29 | + checking if promoted replica is the ideal candidate
2018-04-26 12:28:29 | promoted replica new-master.example.com:3306 is the ideal candidate
2018-04-26 12:28:30 | RecoverDeadMaster: successfully promoted new-master.example.com:3306
This was using
$ sudo grep PseudoGTIDPreferIndependentMultiMatch /etc/orchestrator.conf.json
"PseudoGTIDPreferIndependentMultiMatch": true,
Not shown in the logging but there were 40 slaves that needed to be relocated so some time is expected to do that but no confirmation of the details is seen here. So it does look like adding extra logging to find the "blockage" would be good. I clearly need to move this replication chain to use GTID as that will speed up the whole process.
Are you able to relocate-replicas
, at your own good time, a large set of replicas that include a delayed server, and report back how long that took?
Logging is confusing. Below shows the server using GTID and then says it isn't.
gtid_mode
on this server is currently OFF_PERMISSIVE
.
I was testing how long it would take to move a group of slaves with no delayed slave and was then going to test with delayed slaves. However, I note that logging surprises me. It seems to indicate that GTID is configured and then says it isn't.
I know that some of the GTID code has been updated but just in case this may show the same behaviour in current code I'm including it for reference.:
[ Names have been anonymised but you get the idea ]
$ sudo orchestrator --verbose -c relocate-slaves -i server-1015 -d server-1026
2018-05-04 17:09:54 INFO starting orchestrator, version: 3.0.1.8, git commit: 208e53de7aa0f0c0a4f6192942f3c3c0e73e6cec
2018-05-04 17:09:54 INFO Read config: /etc/orchestrator.conf.json
2018-05-04 17:09:54 INFO Connecting to backend: maxConnections: 1000, maxIdleConns: 250
2018-05-04 17:09:54 INFO Will move 34 replicas below server-1026:3306 via GTID <<== wrong
2018-05-04 17:09:54 INFO Will match 34 replicas below server-1026:3306 via Pseudo-GTID, independently
2018-05-04 17:09:55 INFO Will match server-1002:3306 below server-1026:3306
2018-05-04 17:09:55 INFO auditType:begin-maintenance instance:server-1002:3306 cluster:server-2053:3306 message:maintenanceToken: 20549, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:55 INFO Stopped slave on server-1002:3306, Self::0, Exec:binlog.013031:21754893
2018-05-04 17:09:55 INFO Will match server-1061:3306 below server-1026:3306
2018-05-04 17:09:55 INFO Will match server-1213:3306 below server-1026:3306
2018-05-04 17:09:55 INFO Will match server-1107:3306 below server-1026:3306
2018-05-04 17:09:55 INFO Will match server-1200:3306 below server-1026:3306
2018-05-04 17:09:55 INFO Will match server-1138:3306 below server-1026:3306
2018-05-04 17:09:55 INFO Will match server-1154:3306 below server-1026:3306
2018-05-04 17:09:55 INFO Will match server-1086:3306 below server-1026:3306
2018-05-04 17:09:55 INFO Will match server-1040:3306 below server-1026:3306
2018-05-04 17:09:55 INFO Will match server-1018:3306 below server-1026:3306
2018-05-04 17:09:55 INFO Will match server-1103:3306 below server-1026:3306
2018-05-04 17:09:55 INFO auditType:begin-maintenance instance:server-1138:3306 cluster:server-2053:3306 message:maintenanceToken: 20554, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:55 INFO auditType:begin-maintenance instance:server-1061:3306 cluster:server-2053:3306 message:maintenanceToken: 20550, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:55 INFO auditType:begin-maintenance instance:server-1213:3306 cluster:server-2053:3306 message:maintenanceToken: 20551, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:55 INFO auditType:begin-maintenance instance:server-1107:3306 cluster:server-2053:3306 message:maintenanceToken: 20552, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:55 INFO Will match server-1160:3306 below server-1026:3306
2018-05-04 17:09:55 INFO auditType:begin-maintenance instance:server-1154:3306 cluster:server-2053:3306 message:maintenanceToken: 20555, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:55 INFO auditType:begin-maintenance instance:server-1200:3306 cluster:server-2053:3306 message:maintenanceToken: 20553, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:55 INFO auditType:begin-maintenance instance:server-1018:3306 cluster:server-2053:3306 message:maintenanceToken: 20558, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:55 INFO auditType:begin-maintenance instance:server-1086:3306 cluster:server-2053:3306 message:maintenanceToken: 20556, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:55 INFO auditType:begin-maintenance instance:server-1103:3306 cluster:server-2053:3306 message:maintenanceToken: 20559, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:55 INFO auditType:begin-maintenance instance:server-1040:3306 cluster:server-2053:3306 message:maintenanceToken: 20557, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:55 INFO auditType:begin-maintenance instance:server-1160:3306 cluster:server-2053:3306 message:maintenanceToken: 20560, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1091:3306 below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1019:3306 below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1025:3306 below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1035:3306 below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1145:3306 below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1149:3306 below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1108:3306 below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1067:3306 below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1019:3306 cluster:server-2053:3306 message:maintenanceToken: 20562, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1091:3306 cluster:server-2053:3306 message:maintenanceToken: 20561, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1145:3306 cluster:server-2053:3306 message:maintenanceToken: 20565, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1172:3306 below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1063:3306 below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1071:3306 below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1151:3306 below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1025:3306 cluster:server-2053:3306 message:maintenanceToken: 20563, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1166:3306 below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1108:3306 cluster:server-2053:3306 message:maintenanceToken: 20567, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1035:3306 cluster:server-2053:3306 message:maintenanceToken: 20564, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1149:3306 cluster:server-2053:3306 message:maintenanceToken: 20566, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1172:3306 cluster:server-2053:3306 message:maintenanceToken: 20569, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1028:3306 below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1110:3306 below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1071:3306 cluster:server-2053:3306 message:maintenanceToken: 20571, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1165:3306 below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1063:3306 cluster:server-2053:3306 message:maintenanceToken: 20570, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1151:3306 cluster:server-2053:3306 message:maintenanceToken: 20572, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1067:3306 cluster:server-2053:3306 message:maintenanceToken: 20568, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1110:3306 cluster:server-2053:3306 message:maintenanceToken: 20575, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1166:3306 cluster:server-2053:3306 message:maintenanceToken: 20573, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1165:3306 cluster:server-2053:3306 message:maintenanceToken: 20576, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1028:3306 cluster:server-2053:3306 message:maintenanceToken: 20574, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1101:3306 below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1101:3306 cluster:server-2053:3306 message:maintenanceToken: 20577, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1204:3306 below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1204:3306 cluster:server-2053:3306 message:maintenanceToken: 20578, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1111:3306 below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1111:3306 cluster:server-2053:3306 message:maintenanceToken: 20579, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1162:3306 below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1162:3306 cluster:server-2053:3306 message:maintenanceToken: 20580, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1022:3306 below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1022:3306 cluster:server-2053:3306 message:maintenanceToken: 20581, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO Stopped slave on server-1154:3306, Self::0, Exec:binlog.013031:21886888
2018-05-04 17:09:56 INFO Stopped slave on server-1061:3306, Self::0, Exec:binlog.013031:21886888
2018-05-04 17:09:56 INFO Stopped slave on server-1107:3306, Self::0, Exec:binlog.013031:21886888
2018-05-04 17:09:56 INFO Will match server-1147:3306 below server-1026:3306
2018-05-04 17:09:56 INFO Stopped slave on server-1103:3306, Self::0, Exec:binlog.013031:21886888
2018-05-04 17:09:56 INFO Stopped slave on server-1040:3306, Self::0, Exec:binlog.013031:21889835
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1147:3306 cluster:server-2053:3306 message:maintenanceToken: 20582, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO Stopped slave on server-1200:3306, Self::0, Exec:binlog.013031:21889835
2018-05-04 17:09:56 INFO Stopped slave on server-1018:3306, Self::0, Exec:binlog.013031:21886888
2018-05-04 17:09:56 INFO Stopped slave on server-1086:3306, Self::0, Exec:binlog.013031:21888376
2018-05-04 17:09:56 INFO Stopped slave on server-1138:3306, Self::0, Exec:binlog.013031:21889835
2018-05-04 17:09:56 INFO Stopped slave on server-1160:3306, Self::0, Exec:binlog.013031:21886888
2018-05-04 17:09:56 INFO Stopped slave on server-1213:3306, Self::0, Exec:binlog.013031:21889835
2018-05-04 17:09:56 INFO Stopped slave on server-1019:3306, Self::0, Exec:binlog.013031:21893904
2018-05-04 17:09:56 INFO Stopped slave on server-1151:3306, Self::0, Exec:binlog.013031:21894515
2018-05-04 17:09:56 INFO Stopped slave on server-1145:3306, Self::0, Exec:binlog.013031:21894515
2018-05-04 17:09:56 INFO Stopped slave on server-1035:3306, Self::0, Exec:binlog.013031:21894515
2018-05-04 17:09:56 INFO Stopped slave on server-1067:3306, Self::0, Exec:binlog.013031:21895971
2018-05-04 17:09:56 INFO Stopped slave on server-1108:3306, Self::0, Exec:binlog.013031:21895971
2018-05-04 17:09:56 INFO Stopped slave on server-1063:3306, Self::0, Exec:binlog.013031:21895971
2018-05-04 17:09:56 INFO Stopped slave on server-1149:3306, Self::0, Exec:binlog.013031:21895971
2018-05-04 17:09:56 INFO Stopped slave on server-1071:3306, Self::0, Exec:binlog.013031:21896363
2018-05-04 17:09:56 INFO Stopped slave on server-1101:3306, Self::0, Exec:binlog.013031:21898591
2018-05-04 17:09:56 INFO Stopped slave on server-1172:3306, Self::0, Exec:binlog.013031:21895971
2018-05-04 17:09:56 INFO Stopped slave on server-1025:3306, Self::0, Exec:binlog.013031:21895971
2018-05-04 17:09:56 INFO Stopped slave on server-1204:3306, Self::0, Exec:binlog.013031:21898591
2018-05-04 17:09:56 INFO Stopped slave on server-1162:3306, Self::0, Exec:binlog.013031:21898591
2018-05-04 17:09:56 INFO Stopped slave on server-1028:3306, Self::0, Exec:binlog.013031:21896780
2018-05-04 17:09:56 INFO Stopped slave on server-1110:3306, Self::0, Exec:binlog.013031:21898591
2018-05-04 17:09:56 INFO Stopped slave on server-1091:3306, Self::0, Exec:binlog.013031:21895971
2018-05-04 17:09:56 INFO Stopped slave on server-1166:3306, Self::0, Exec:binlog.013031:21896780
2018-05-04 17:09:56 INFO Stopped slave on server-1165:3306, Self::0, Exec:binlog.013031:21898591
2018-05-04 17:09:56 INFO Stopped slave on server-1111:3306, Self::0, Exec:binlog.013031:21902689
2018-05-04 17:09:56 INFO Stopped slave on server-1022:3306, Self::0, Exec:binlog.013031:21898591
2018-05-04 17:09:56 INFO Stopped slave on server-1147:3306, Self::0, Exec:binlog.013031:21922683
2018-05-04 17:09:58 INFO ChangeMasterTo: Changed master on server-1002:3306 to: server-1026:3306, binlog.012939:7102012. GTID: false
2018-05-04 17:09:59 INFO Started slave on server-1002:3306
2018-05-04 17:09:59 INFO ChangeMasterTo: Changed master on server-1160:3306 to: server-1026:3306, binlog.012939:7234007. GTID: false
2018-05-04 17:09:59 INFO auditType:match-below instance:server-1002:3306 cluster:server-2053:3306 message:matched server-1002:3306 below server-1026:3306
2018-05-04 17:09:59 INFO ChangeMasterTo: Changed master on server-1067:3306 to: server-1026:3306, binlog.012939:7243090. GTID: false
2018-05-04 17:09:59 INFO auditType:end-maintenance instance:server-1002:3306 cluster:server-2053:3306 message:maintenanceToken: 20549
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1040:3306 to: server-1026:3306, binlog.012939:7236954. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1018:3306 to: server-1026:3306, binlog.012939:7234007. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1107:3306 to: server-1026:3306, binlog.012939:7234007. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1200:3306 to: server-1026:3306, binlog.012939:7236954. GTID: false
2018-05-04 17:10:00 INFO Started slave on server-1160:3306
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1151:3306 to: server-1026:3306, binlog.012939:7241634. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1061:3306 to: server-1026:3306, binlog.012939:7234007. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1086:3306 to: server-1026:3306, binlog.012939:7235495. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1022:3306 to: server-1026:3306, binlog.012939:7245710. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1108:3306 to: server-1026:3306, binlog.012939:7243090. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1103:3306 to: server-1026:3306, binlog.012939:7234007. GTID: false
2018-05-04 17:10:00 INFO Started slave on server-1067:3306
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1145:3306 to: server-1026:3306, binlog.012939:7241634. GTID: false
2018-05-04 17:10:00 INFO Started slave on server-1040:3306
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1138:3306 to: server-1026:3306, binlog.012939:7236954. GTID: false
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1160:3306 cluster:server-2053:3306 message:matched server-1160:3306 below server-1026:3306
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1101:3306 to: server-1026:3306, binlog.012939:7245710. GTID: false
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1160:3306 cluster:server-2053:3306 message:maintenanceToken: 20560
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1154:3306 to: server-1026:3306, binlog.012939:7234007. GTID: false
2018-05-04 17:10:00 INFO Started slave on server-1107:3306
2018-05-04 17:10:00 INFO Started slave on server-1018:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1067:3306 cluster:server-2053:3306 message:matched server-1067:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1067:3306 cluster:server-2053:3306 message:maintenanceToken: 20568
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1040:3306 cluster:server-2053:3306 message:matched server-1040:3306 below server-1026:3306
2018-05-04 17:10:00 INFO Started slave on server-1200:3306
2018-05-04 17:10:00 INFO Started slave on server-1151:3306
2018-05-04 17:10:00 INFO Started slave on server-1108:3306
2018-05-04 17:10:00 INFO Started slave on server-1022:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1040:3306 cluster:server-2053:3306 message:maintenanceToken: 20557
2018-05-04 17:10:00 INFO Started slave on server-1061:3306
2018-05-04 17:10:00 INFO Started slave on server-1103:3306
2018-05-04 17:10:00 INFO Started slave on server-1086:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1107:3306 cluster:server-2053:3306 message:matched server-1107:3306 below server-1026:3306
2018-05-04 17:10:00 INFO Started slave on server-1145:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1018:3306 cluster:server-2053:3306 message:matched server-1018:3306 below server-1026:3306
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1025:3306 to: server-1026:3306, binlog.012939:7243090. GTID: false
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1107:3306 cluster:server-2053:3306 message:maintenanceToken: 20552
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1063:3306 to: server-1026:3306, binlog.012939:7243090. GTID: false
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1018:3306 cluster:server-2053:3306 message:maintenanceToken: 20558
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1110:3306 to: server-1026:3306, binlog.012939:7245710. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1165:3306 to: server-1026:3306, binlog.012939:7245710. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1162:3306 to: server-1026:3306, binlog.012939:7245710. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1019:3306 to: server-1026:3306, binlog.012939:7241023. GTID: false
2018-05-04 17:10:00 INFO Started slave on server-1154:3306
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1166:3306 to: server-1026:3306, binlog.012939:7243899. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1028:3306 to: server-1026:3306, binlog.012939:7243899. GTID: false
2018-05-04 17:10:00 INFO Started slave on server-1138:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1151:3306 cluster:server-2053:3306 message:matched server-1151:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1151:3306 cluster:server-2053:3306 message:maintenanceToken: 20572
2018-05-04 17:10:00 INFO Started slave on server-1101:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1200:3306 cluster:server-2053:3306 message:matched server-1200:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1061:3306 cluster:server-2053:3306 message:matched server-1061:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1086:3306 cluster:server-2053:3306 message:matched server-1086:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1022:3306 cluster:server-2053:3306 message:matched server-1022:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1103:3306 cluster:server-2053:3306 message:matched server-1103:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1200:3306 cluster:server-2053:3306 message:maintenanceToken: 20553
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1108:3306 cluster:server-2053:3306 message:matched server-1108:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1086:3306 cluster:server-2053:3306 message:maintenanceToken: 20556
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1022:3306 cluster:server-2053:3306 message:maintenanceToken: 20581
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1061:3306 cluster:server-2053:3306 message:maintenanceToken: 20550
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1108:3306 cluster:server-2053:3306 message:maintenanceToken: 20567
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1103:3306 cluster:server-2053:3306 message:maintenanceToken: 20559
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1145:3306 cluster:server-2053:3306 message:matched server-1145:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1145:3306 cluster:server-2053:3306 message:maintenanceToken: 20565
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1154:3306 cluster:server-2053:3306 message:matched server-1154:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1154:3306 cluster:server-2053:3306 message:maintenanceToken: 20555
2018-05-04 17:10:00 INFO Started slave on server-1019:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1101:3306 cluster:server-2053:3306 message:matched server-1101:3306 below server-1026:3306
2018-05-04 17:10:00 INFO Started slave on server-1166:3306
2018-05-04 17:10:00 INFO Started slave on server-1025:3306
2018-05-04 17:10:00 INFO Started slave on server-1165:3306
2018-05-04 17:10:00 INFO Started slave on server-1162:3306
2018-05-04 17:10:00 INFO Started slave on server-1110:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1138:3306 cluster:server-2053:3306 message:matched server-1138:3306 below server-1026:3306
2018-05-04 17:10:00 INFO Started slave on server-1028:3306
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1035:3306 to: server-1026:3306, binlog.012939:7241634. GTID: false
2018-05-04 17:10:00 INFO Started slave on server-1063:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1101:3306 cluster:server-2053:3306 message:maintenanceToken: 20577
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1138:3306 cluster:server-2053:3306 message:maintenanceToken: 20554
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1213:3306 to: server-1026:3306, binlog.012939:7236954. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1091:3306 to: server-1026:3306, binlog.012939:7243090. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1149:3306 to: server-1026:3306, binlog.012939:7243090. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1172:3306 to: server-1026:3306, binlog.012939:7243090. GTID: false
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1019:3306 cluster:server-2053:3306 message:matched server-1019:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1019:3306 cluster:server-2053:3306 message:maintenanceToken: 20562
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1162:3306 cluster:server-2053:3306 message:matched server-1162:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1166:3306 cluster:server-2053:3306 message:matched server-1166:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1166:3306 cluster:server-2053:3306 message:maintenanceToken: 20573
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1162:3306 cluster:server-2053:3306 message:maintenanceToken: 20580
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1110:3306 cluster:server-2053:3306 message:matched server-1110:3306 below server-1026:3306
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1204:3306 to: server-1026:3306, binlog.012939:7245710. GTID: false
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1110:3306 cluster:server-2053:3306 message:maintenanceToken: 20575
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1165:3306 cluster:server-2053:3306 message:matched server-1165:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1165:3306 cluster:server-2053:3306 message:maintenanceToken: 20576
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1028:3306 cluster:server-2053:3306 message:matched server-1028:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1028:3306 cluster:server-2053:3306 message:maintenanceToken: 20574
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1025:3306 cluster:server-2053:3306 message:matched server-1025:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1025:3306 cluster:server-2053:3306 message:maintenanceToken: 20563
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1063:3306 cluster:server-2053:3306 message:matched server-1063:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1063:3306 cluster:server-2053:3306 message:maintenanceToken: 20570
2018-05-04 17:10:00 INFO Started slave on server-1035:3306
2018-05-04 17:10:00 INFO Started slave on server-1149:3306
2018-05-04 17:10:00 INFO Started slave on server-1172:3306
2018-05-04 17:10:00 INFO Started slave on server-1091:3306
2018-05-04 17:10:00 INFO Started slave on server-1213:3306
2018-05-04 17:10:00 INFO Started slave on server-1204:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1035:3306 cluster:server-2053:3306 message:matched server-1035:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1035:3306 cluster:server-2053:3306 message:maintenanceToken: 20564
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1149:3306 cluster:server-2053:3306 message:matched server-1149:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1149:3306 cluster:server-2053:3306 message:maintenanceToken: 20566
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1172:3306 cluster:server-2053:3306 message:matched server-1172:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1172:3306 cluster:server-2053:3306 message:maintenanceToken: 20569
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1204:3306 cluster:server-2053:3306 message:matched server-1204:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1091:3306 cluster:server-2053:3306 message:matched server-1091:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1204:3306 cluster:server-2053:3306 message:maintenanceToken: 20578
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1091:3306 cluster:server-2053:3306 message:maintenanceToken: 20561
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1213:3306 cluster:server-2053:3306 message:matched server-1213:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1213:3306 cluster:server-2053:3306 message:maintenanceToken: 20551
2018-05-04 17:10:01 INFO ChangeMasterTo: Changed master on server-1111:3306 to: server-1026:3306, binlog.012939:7249808. GTID: false
2018-05-04 17:10:01 INFO ChangeMasterTo: Changed master on server-1071:3306 to: server-1026:3306, binlog.012939:7243482. GTID: false
2018-05-04 17:10:01 INFO Started slave on server-1111:3306
2018-05-04 17:10:01 INFO auditType:match-below instance:server-1111:3306 cluster:server-2053:3306 message:matched server-1111:3306 below server-1026:3306
2018-05-04 17:10:01 INFO auditType:end-maintenance instance:server-1111:3306 cluster:server-2053:3306 message:maintenanceToken: 20579
2018-05-04 17:10:01 INFO Started slave on server-1071:3306
2018-05-04 17:10:01 INFO auditType:match-below instance:server-1071:3306 cluster:server-2053:3306 message:matched server-1071:3306 below server-1026:3306
2018-05-04 17:10:01 INFO auditType:end-maintenance instance:server-1071:3306 cluster:server-2053:3306 message:maintenanceToken: 20571
2018-05-04 17:10:01 INFO ChangeMasterTo: Changed master on server-1147:3306 to: server-1026:3306, binlog.012939:7269802. GTID: false
2018-05-04 17:10:01 INFO Started slave on server-1147:3306
2018-05-04 17:10:01 INFO auditType:match-below instance:server-1147:3306 cluster:server-2053:3306 message:matched server-1147:3306 below server-1026:3306
2018-05-04 17:10:01 INFO auditType:end-maintenance instance:server-1147:3306 cluster:server-2053:3306 message:maintenanceToken: 20582
2018-05-04 17:10:01 INFO auditType:multi-match-below-independent instance:server-1026:3306 cluster:server-2053:3306 message:matched 34/34 replicas below server-1026:3306 via Pseudo-GTID
2018-05-04 17:10:01 INFO auditType:relocate-replicas instance:server-1015:3306 cluster:server-2053:3306 message:relocated 34 replicas of server-1015:3306 below server-1026:3306
server-1002:3306
server-1160:3306
server-1067:3306
server-1040:3306
server-1107:3306
server-1018:3306
server-1151:3306
server-1200:3306
server-1086:3306
server-1022:3306
server-1061:3306
server-1108:3306
server-1103:3306
server-1145:3306
server-1154:3306
server-1101:3306
server-1138:3306
server-1019:3306
server-1166:3306
server-1162:3306
server-1110:3306
server-1165:3306
server-1028:3306
server-1025:3306
server-1063:3306
server-1035:3306
server-1149:3306
server-1172:3306
server-1204:3306
server-1091:3306
server-1213:3306
server-1111:3306
server-1071:3306
server-1147:3306
$
So I see on line 6 a switch from using GTID to moving to pseudo-GTID.
gtid_mode = OFF_PERMISSIVE
.So time to relocate 34 slaves is about 7 seconds.
From memory just relocating a delayed slave (24h) took some minutes and has often failed. However, as usual trying it now it succeeds (moving a single server) much more quickly:
$ time sudo orchestrator --verbose -c relocate -i delay-2002 -d server-2050
2018-05-04 17:53:18 INFO starting orchestrator, version: 3.0.1.8, git commit: 208e53de7aa0f0c0a4f6192942f3c3c0e73e6cec
2018-05-04 17:53:18 INFO Read config: /etc/orchestrator.conf.json
2018-05-04 17:53:18 INFO Connecting to backend: maxConnections: 1000, maxIdleConns: 250
2018-05-04 17:53:18 INFO Will match delay-2002:3306 below server-2050:3306
2018-05-04 17:53:18 INFO auditType:begin-maintenance instance:delay-2002:3306 cluster:server-2053:3306 message:maintenanceToken: 20621, owner: root, reason: match below server-2050:3306
2018-05-04 17:53:18 INFO Stopped slave on delay-2002:3306, Self::0, Exec:binlog.011186:80995296
2018-05-04 17:53:41 INFO ChangeMasterTo: Changed master on delay-2002:3306 to: server-2050:3306, binlog.011299:58782124. GTID: false
2018-05-04 17:53:41 INFO Started slave on delay-2002:3306
2018-05-04 17:53:41 INFO auditType:match-below instance:delay-2002:3306 cluster:server-2053:3306 message:matched delay-2002:3306 below server-2050:3306
2018-05-04 17:53:41 INFO auditType:end-maintenance instance:delay-2002:3306 cluster:server-2053:3306 message:maintenanceToken: 20621
2018-05-04 17:53:41 INFO auditType:relocate-below instance:delay-2002:3306 cluster:server-2053:3306 message:relocated delay-2002:3306 below server-2050:3306
delay-2002:3306<server-2050:3306
real 0m22.980s
user 0m13.818s
sys 0m1.428s
$ time sudo orchestrator --verbose -c relocate -i delay-2002 -d server-2057
2018-05-04 17:56:44 INFO starting orchestrator, version: 3.0.1.8, git commit: 208e53de7aa0f0c0a4f6192942f3c3c0e73e6cec
2018-05-04 17:56:44 INFO Read config: /etc/orchestrator.conf.json
2018-05-04 17:56:44 INFO Connecting to backend: maxConnections: 1000, maxIdleConns: 250
2018-05-04 17:56:44 INFO Will match delay-2002:3306 below server-2057:3306
2018-05-04 17:56:44 INFO auditType:begin-maintenance instance:delay-2002:3306 cluster:server-2053:3306 message:maintenanceToken: 20622, owner: root, reason: match below server-2057:3306
2018-05-04 17:56:44 INFO Stopped slave on delay-2002:3306, Self::0, Exec:binlog.011300:63790012
2018-05-04 17:57:08 INFO ChangeMasterTo: Changed master on delay-2002:3306 to: server-2057:3306, binlog.011187:85842087. GTID: false
2018-05-04 17:57:08 INFO Started slave on delay-2002:3306
2018-05-04 17:57:08 INFO auditType:match-below instance:delay-2002:3306 cluster:server-2053:3306 message:matched delay-2002:3306 below server-2057:3306
2018-05-04 17:57:08 INFO auditType:end-maintenance instance:delay-2002:3306 cluster:server-2053:3306 message:maintenanceToken: 20622
2018-05-04 17:57:08 INFO auditType:relocate-below instance:delay-2002:3306 cluster:server-2053:3306 message:relocated delay-2002:3306 below server-2057:3306
delay-2002:3306<server-2057:3306
real 0m24.296s
user 0m13.899s
sys 0m1.547s
Taking 23 and 24 seconds respectively.
And of course testing this with nothing broken shows it works fine ....
$ sudo orchestrator --verbose -c relocate-slaves -i server-1070 -d server-1053
2018-05-04 18:06:53 INFO starting orchestrator, version: 3.0.1.8, git commit: 208e53de7aa0f0c0a4f6192942f3c3c0e73e6cec
2018-05-04 18:06:53 INFO Read config: /etc/orchestrator.conf.json
2018-05-04 18:06:53 INFO Connecting to backend: maxConnections: 1000, maxIdleConns: 250
2018-05-04 18:06:53 INFO Will move 27 replicas below server-1053:3306 via GTID
2018-05-04 18:06:53 INFO Will match 27 replicas below server-1053:3306 via Pseudo-GTID, independently
2018-05-04 18:06:54 INFO Will match server-1217:3306 below server-1053:3306
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:server-1217:3306 cluster:server-2053:3306 message:maintenanceToken: 20623, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:54 INFO Will match server-1169:3306 below server-1053:3306
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:server-1169:3306 cluster:server-2053:3306 message:maintenanceToken: 20624, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:54 INFO Will match server-1187:3306 below server-1053:3306
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:server-1187:3306 cluster:server-2053:3306 message:maintenanceToken: 20625, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:54 INFO Will match server-1207:3306 below server-1053:3306
2018-05-04 18:06:54 INFO Will match server-1041:3306 below server-1053:3306
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:server-1207:3306 cluster:server-2053:3306 message:maintenanceToken: 20626, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:server-1041:3306 cluster:server-2053:3306 message:maintenanceToken: 20627, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:54 INFO Will match server-1117:3306 below server-1053:3306
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:server-1117:3306 cluster:server-2053:3306 message:maintenanceToken: 20628, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:54 INFO Will match server-1198:3306 below server-1053:3306
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:server-1198:3306 cluster:server-2053:3306 message:maintenanceToken: 20629, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:54 INFO Stopped slave on server-1217:3306, Self::0, Exec:binlog.012579:22476581
2018-05-04 18:06:54 INFO Stopped slave on server-1169:3306, Self::0, Exec:binlog.012579:22492743
2018-05-04 18:06:54 INFO Stopped slave on server-1187:3306, Self::0, Exec:binlog.012579:22498180
2018-05-04 18:06:54 INFO Will match server-1153:3306 below server-1053:3306
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:server-1153:3306 cluster:server-2053:3306 message:maintenanceToken: 20630, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:54 INFO Will match delay-1002:3306 below server-1053:3306
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:delay-1002:3306 cluster:server-2053:3306 message:maintenanceToken: 20631, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:54 INFO Stopped slave on server-1207:3306, Self::0, Exec:binlog.012579:22516872
2018-05-04 18:06:54 INFO Will match server-1065:3306 below server-1053:3306
2018-05-04 18:06:54 INFO Stopped slave on server-1117:3306, Self::0, Exec:binlog.012579:22529134
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:server-1065:3306 cluster:server-2053:3306 message:maintenanceToken: 20632, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:54 INFO Will match server-1210:3306 below server-1053:3306
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:server-1210:3306 cluster:server-2053:3306 message:maintenanceToken: 20633, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:54 INFO Stopped slave on server-1041:3306, Self::0, Exec:binlog.012579:22529134
2018-05-04 18:06:54 INFO Will match server-1066:3306 below server-1053:3306
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:server-1066:3306 cluster:server-2053:3306 message:maintenanceToken: 20634, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:54 INFO Will match server-1050:3306 below server-1053:3306
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:server-1050:3306 cluster:server-2053:3306 message:maintenanceToken: 20635, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:54 INFO Stopped slave on server-1198:3306, Self::0, Exec:binlog.012579:22542113
2018-05-04 18:06:54 INFO Will match server-1021:3306 below server-1053:3306
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:server-1021:3306 cluster:server-2053:3306 message:maintenanceToken: 20636, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:55 INFO Will match server-1148:3306 below server-1053:3306
2018-05-04 18:06:55 INFO auditType:begin-maintenance instance:server-1148:3306 cluster:server-2053:3306 message:maintenanceToken: 20637, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:55 INFO Stopped slave on server-1153:3306, Self::0, Exec:binlog.012579:22560807
2018-05-04 18:06:55 INFO Will match server-1017:3306 below server-1053:3306
2018-05-04 18:06:55 INFO Will match server-1102:3306 below server-1053:3306
2018-05-04 18:06:55 INFO auditType:begin-maintenance instance:server-1017:3306 cluster:server-2053:3306 message:maintenanceToken: 20638, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:55 INFO auditType:begin-maintenance instance:server-1102:3306 cluster:server-2053:3306 message:maintenanceToken: 20639, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:55 INFO Stopped slave on delay-1002:3306, Self::0, Exec:binlog.012259:63192257
2018-05-04 18:06:55 INFO Stopped slave on server-1065:3306, Self::0, Exec:binlog.012579:22573344
2018-05-04 18:06:55 INFO Stopped slave on server-1066:3306, Self::0, Exec:binlog.012579:22580426
2018-05-04 18:06:55 INFO Will match server-1020:3306 below server-1053:3306
2018-05-04 18:06:55 INFO Will match server-1164:3306 below server-1053:3306
2018-05-04 18:06:55 INFO auditType:begin-maintenance instance:server-1020:3306 cluster:server-2053:3306 message:maintenanceToken: 20640, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:55 INFO auditType:begin-maintenance instance:server-1164:3306 cluster:server-2053:3306 message:maintenanceToken: 20641, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:55 INFO Will match server-1211:3306 below server-1053:3306
2018-05-04 18:06:55 INFO auditType:begin-maintenance instance:server-1211:3306 cluster:server-2053:3306 message:maintenanceToken: 20642, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:55 INFO Will match server-1181:3306 below server-1053:3306
2018-05-04 18:06:55 INFO Stopped slave on server-1210:3306, Self::0, Exec:binlog.012579:22590570
2018-05-04 18:06:55 INFO auditType:begin-maintenance instance:server-1181:3306 cluster:server-2053:3306 message:maintenanceToken: 20643, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:55 INFO Will match server-1134:3306 below server-1053:3306
2018-05-04 18:06:55 INFO Will match server-1133:3306 below server-1053:3306
2018-05-04 18:06:55 INFO auditType:begin-maintenance instance:server-1134:3306 cluster:server-2053:3306 message:maintenanceToken: 20644, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:55 INFO auditType:begin-maintenance instance:server-1133:3306 cluster:server-2053:3306 message:maintenanceToken: 20645, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:55 INFO Stopped slave on server-1050:3306, Self::0, Exec:binlog.012579:22599282
2018-05-04 18:06:55 INFO Will match server-1079:3306 below server-1053:3306
2018-05-04 18:06:55 INFO auditType:begin-maintenance instance:server-1079:3306 cluster:server-2053:3306 message:maintenanceToken: 20646, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:55 INFO Stopped slave on server-1021:3306, Self::0, Exec:binlog.012579:22625656
2018-05-04 18:06:55 INFO Stopped slave on server-1102:3306, Self::0, Exec:binlog.012579:22633478
2018-05-04 18:06:55 INFO Will match server-1121:3306 below server-1053:3306
2018-05-04 18:06:55 INFO auditType:begin-maintenance instance:server-1121:3306 cluster:server-2053:3306 message:maintenanceToken: 20647, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:55 INFO Stopped slave on server-1148:3306, Self::0, Exec:binlog.012579:22633478
2018-05-04 18:06:55 INFO Stopped slave on server-1017:3306, Self::0, Exec:binlog.012579:22642053
2018-05-04 18:06:55 INFO Stopped slave on server-1164:3306, Self::0, Exec:binlog.012579:22646216
2018-05-04 18:06:55 INFO Stopped slave on server-1181:3306, Self::0, Exec:binlog.012579:22679198
2018-05-04 18:06:55 INFO Will match server-1155:3306 below server-1053:3306
2018-05-04 18:06:55 INFO Stopped slave on server-1020:3306, Self::0, Exec:binlog.012579:22646216
2018-05-04 18:06:55 INFO auditType:begin-maintenance instance:server-1155:3306 cluster:server-2053:3306 message:maintenanceToken: 20648, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:55 INFO Stopped slave on server-1211:3306, Self::0, Exec:binlog.012579:22646216
2018-05-04 18:06:55 INFO Stopped slave on server-1134:3306, Self::0, Exec:binlog.012579:22711095
2018-05-04 18:06:55 INFO Stopped slave on server-1079:3306, Self::0, Exec:binlog.012579:22716263
2018-05-04 18:06:55 INFO Stopped slave on server-1133:3306, Self::0, Exec:binlog.012579:22714366
2018-05-04 18:06:55 INFO Stopped slave on server-1121:3306, Self::0, Exec:binlog.012579:22719073
2018-05-04 18:06:55 INFO Stopped slave on server-1155:3306, Self::0, Exec:binlog.012579:22817863
2018-05-04 18:06:55 INFO Will match server-1208:3306 below server-1053:3306
2018-05-04 18:06:55 INFO auditType:begin-maintenance instance:server-1208:3306 cluster:server-2053:3306 message:maintenanceToken: 20649, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:56 INFO Stopped slave on server-1208:3306, Self::0, Exec:binlog.012579:23307973
2018-05-04 18:06:56 INFO ChangeMasterTo: Changed master on server-1169:3306 to: server-1053:3306, binlog.013040:58059824. GTID: false
2018-05-04 18:06:56 INFO ChangeMasterTo: Changed master on server-1117:3306 to: server-1053:3306, binlog.013040:58096215. GTID: false
2018-05-04 18:06:56 INFO Started slave on server-1169:3306
2018-05-04 18:06:56 INFO auditType:match-below instance:server-1169:3306 cluster:server-2053:3306 message:matched server-1169:3306 below server-1053:3306
2018-05-04 18:06:56 INFO auditType:end-maintenance instance:server-1169:3306 cluster:server-2053:3306 message:maintenanceToken: 20624
2018-05-04 18:06:56 INFO Started slave on server-1117:3306
2018-05-04 18:06:56 INFO ChangeMasterTo: Changed master on server-1155:3306 to: server-1053:3306, binlog.013040:58384944. GTID: false
2018-05-04 18:06:56 INFO auditType:match-below instance:server-1117:3306 cluster:server-2053:3306 message:matched server-1117:3306 below server-1053:3306
2018-05-04 18:06:56 INFO auditType:end-maintenance instance:server-1117:3306 cluster:server-2053:3306 message:maintenanceToken: 20628
2018-05-04 18:06:56 INFO Started slave on server-1155:3306
2018-05-04 18:06:56 INFO auditType:match-below instance:server-1155:3306 cluster:server-2053:3306 message:matched server-1155:3306 below server-1053:3306
2018-05-04 18:06:56 INFO auditType:end-maintenance instance:server-1155:3306 cluster:server-2053:3306 message:maintenanceToken: 20648
2018-05-04 18:06:57 INFO ChangeMasterTo: Changed master on server-1187:3306 to: server-1053:3306, binlog.013040:58065261. GTID: false
2018-05-04 18:06:57 INFO ChangeMasterTo: Changed master on server-1208:3306 to: server-1053:3306, binlog.013040:58875054. GTID: false
2018-05-04 18:06:57 INFO Started slave on server-1187:3306
2018-05-04 18:06:57 INFO Started slave on server-1208:3306
2018-05-04 18:06:57 INFO auditType:match-below instance:server-1187:3306 cluster:server-2053:3306 message:matched server-1187:3306 below server-1053:3306
2018-05-04 18:06:57 INFO auditType:end-maintenance instance:server-1187:3306 cluster:server-2053:3306 message:maintenanceToken: 20625
2018-05-04 18:06:57 INFO auditType:match-below instance:server-1208:3306 cluster:server-2053:3306 message:matched server-1208:3306 below server-1053:3306
2018-05-04 18:06:57 INFO auditType:end-maintenance instance:server-1208:3306 cluster:server-2053:3306 message:maintenanceToken: 20649
2018-05-04 18:06:58 INFO ChangeMasterTo: Changed master on server-1041:3306 to: server-1053:3306, binlog.013040:58096215. GTID: false
2018-05-04 18:06:58 INFO Started slave on server-1041:3306
2018-05-04 18:06:58 INFO ChangeMasterTo: Changed master on server-1217:3306 to: server-1053:3306, binlog.013040:58043662. GTID: false
2018-05-04 18:06:58 INFO auditType:match-below instance:server-1041:3306 cluster:server-2053:3306 message:matched server-1041:3306 below server-1053:3306
2018-05-04 18:06:58 INFO auditType:end-maintenance instance:server-1041:3306 cluster:server-2053:3306 message:maintenanceToken: 20627
2018-05-04 18:06:58 INFO Started slave on server-1217:3306
2018-05-04 18:06:58 INFO ChangeMasterTo: Changed master on server-1207:3306 to: server-1053:3306, binlog.013040:58083953. GTID: false
2018-05-04 18:06:58 INFO auditType:match-below instance:server-1217:3306 cluster:server-2053:3306 message:matched server-1217:3306 below server-1053:3306
2018-05-04 18:06:58 INFO auditType:end-maintenance instance:server-1217:3306 cluster:server-2053:3306 message:maintenanceToken: 20623
2018-05-04 18:06:58 INFO ChangeMasterTo: Changed master on server-1065:3306 to: server-1053:3306, binlog.013040:58140425. GTID: false
2018-05-04 18:06:58 INFO Started slave on server-1207:3306
2018-05-04 18:06:59 INFO Started slave on server-1065:3306
2018-05-04 18:06:59 INFO auditType:match-below instance:server-1207:3306 cluster:server-2053:3306 message:matched server-1207:3306 below server-1053:3306
2018-05-04 18:06:59 INFO auditType:end-maintenance instance:server-1207:3306 cluster:server-2053:3306 message:maintenanceToken: 20626
2018-05-04 18:06:59 INFO auditType:match-below instance:server-1065:3306 cluster:server-2053:3306 message:matched server-1065:3306 below server-1053:3306
2018-05-04 18:06:59 INFO auditType:end-maintenance instance:server-1065:3306 cluster:server-2053:3306 message:maintenanceToken: 20632
2018-05-04 18:06:59 INFO ChangeMasterTo: Changed master on server-1134:3306 to: server-1053:3306, binlog.013040:58278176. GTID: false
2018-05-04 18:06:59 INFO ChangeMasterTo: Changed master on server-1050:3306 to: server-1053:3306, binlog.013040:58166363. GTID: false
2018-05-04 18:06:59 INFO ChangeMasterTo: Changed master on server-1211:3306 to: server-1053:3306, binlog.013040:58213297. GTID: false
2018-05-04 18:06:59 INFO ChangeMasterTo: Changed master on server-1102:3306 to: server-1053:3306, binlog.013040:58200559. GTID: false
2018-05-04 18:06:59 INFO ChangeMasterTo: Changed master on server-1198:3306 to: server-1053:3306, binlog.013040:58109194. GTID: false
2018-05-04 18:06:59 INFO ChangeMasterTo: Changed master on server-1133:3306 to: server-1053:3306, binlog.013040:58281447. GTID: false
2018-05-04 18:06:59 INFO Started slave on server-1134:3306
2018-05-04 18:06:59 INFO Started slave on server-1050:3306
2018-05-04 18:06:59 INFO Started slave on server-1211:3306
2018-05-04 18:06:59 INFO auditType:match-below instance:server-1050:3306 cluster:server-2053:3306 message:matched server-1050:3306 below server-1053:3306
2018-05-04 18:06:59 INFO Started slave on server-1102:3306
2018-05-04 18:06:59 INFO auditType:end-maintenance instance:server-1050:3306 cluster:server-2053:3306 message:maintenanceToken: 20635
2018-05-04 18:06:59 INFO auditType:match-below instance:server-1134:3306 cluster:server-2053:3306 message:matched server-1134:3306 below server-1053:3306
2018-05-04 18:06:59 INFO Started slave on server-1133:3306
2018-05-04 18:06:59 INFO auditType:end-maintenance instance:server-1134:3306 cluster:server-2053:3306 message:maintenanceToken: 20644
2018-05-04 18:06:59 INFO Started slave on server-1198:3306
2018-05-04 18:06:59 INFO auditType:match-below instance:server-1211:3306 cluster:server-2053:3306 message:matched server-1211:3306 below server-1053:3306
2018-05-04 18:06:59 INFO auditType:match-below instance:server-1102:3306 cluster:server-2053:3306 message:matched server-1102:3306 below server-1053:3306
2018-05-04 18:06:59 INFO auditType:end-maintenance instance:server-1211:3306 cluster:server-2053:3306 message:maintenanceToken: 20642
2018-05-04 18:06:59 INFO auditType:end-maintenance instance:server-1102:3306 cluster:server-2053:3306 message:maintenanceToken: 20639
2018-05-04 18:06:59 INFO auditType:match-below instance:server-1133:3306 cluster:server-2053:3306 message:matched server-1133:3306 below server-1053:3306
2018-05-04 18:06:59 INFO auditType:end-maintenance instance:server-1133:3306 cluster:server-2053:3306 message:maintenanceToken: 20645
2018-05-04 18:06:59 INFO auditType:match-below instance:server-1198:3306 cluster:server-2053:3306 message:matched server-1198:3306 below server-1053:3306
2018-05-04 18:06:59 INFO auditType:end-maintenance instance:server-1198:3306 cluster:server-2053:3306 message:maintenanceToken: 20629
2018-05-04 18:07:00 INFO ChangeMasterTo: Changed master on server-1121:3306 to: server-1053:3306, binlog.013040:58286154. GTID: false
2018-05-04 18:07:00 INFO ChangeMasterTo: Changed master on server-1079:3306 to: server-1053:3306, binlog.013040:58283344. GTID: false
2018-05-04 18:07:00 INFO ChangeMasterTo: Changed master on server-1148:3306 to: server-1053:3306, binlog.013040:58200559. GTID: false
2018-05-04 18:07:00 INFO Started slave on server-1121:3306
2018-05-04 18:07:00 INFO Started slave on server-1079:3306
2018-05-04 18:07:00 INFO auditType:match-below instance:server-1121:3306 cluster:server-2053:3306 message:matched server-1121:3306 below server-1053:3306
2018-05-04 18:07:00 INFO auditType:end-maintenance instance:server-1121:3306 cluster:server-2053:3306 message:maintenanceToken: 20647
2018-05-04 18:07:00 INFO Started slave on server-1148:3306
2018-05-04 18:07:00 INFO ChangeMasterTo: Changed master on server-1153:3306 to: server-1053:3306, binlog.013040:58127888. GTID: false
2018-05-04 18:07:00 INFO ChangeMasterTo: Changed master on server-1210:3306 to: server-1053:3306, binlog.013040:58157651. GTID: false
2018-05-04 18:07:00 INFO auditType:match-below instance:server-1079:3306 cluster:server-2053:3306 message:matched server-1079:3306 below server-1053:3306
2018-05-04 18:07:00 INFO auditType:end-maintenance instance:server-1079:3306 cluster:server-2053:3306 message:maintenanceToken: 20646
2018-05-04 18:07:00 INFO ChangeMasterTo: Changed master on server-1017:3306 to: server-1053:3306, binlog.013040:58209134. GTID: false
2018-05-04 18:07:00 INFO ChangeMasterTo: Changed master on server-1020:3306 to: server-1053:3306, binlog.013040:58213297. GTID: false
2018-05-04 18:07:00 INFO auditType:match-below instance:server-1148:3306 cluster:server-2053:3306 message:matched server-1148:3306 below server-1053:3306
2018-05-04 18:07:00 INFO auditType:end-maintenance instance:server-1148:3306 cluster:server-2053:3306 message:maintenanceToken: 20637
2018-05-04 18:07:00 INFO Started slave on server-1210:3306
2018-05-04 18:07:00 INFO Started slave on server-1153:3306
2018-05-04 18:07:00 INFO ChangeMasterTo: Changed master on server-1164:3306 to: server-1053:3306, binlog.013040:58213297. GTID: false
2018-05-04 18:07:00 INFO ChangeMasterTo: Changed master on server-1021:3306 to: server-1053:3306, binlog.013040:58192737. GTID: false
2018-05-04 18:07:00 INFO Started slave on server-1020:3306
2018-05-04 18:07:00 INFO Started slave on server-1017:3306
2018-05-04 18:07:00 INFO ChangeMasterTo: Changed master on server-1066:3306 to: server-1053:3306, binlog.013040:58147507. GTID: false
2018-05-04 18:07:00 INFO auditType:match-below instance:server-1153:3306 cluster:server-2053:3306 message:matched server-1153:3306 below server-1053:3306
2018-05-04 18:07:00 INFO auditType:match-below instance:server-1017:3306 cluster:server-2053:3306 message:matched server-1017:3306 below server-1053:3306
2018-05-04 18:07:00 INFO auditType:end-maintenance instance:server-1153:3306 cluster:server-2053:3306 message:maintenanceToken: 20630
2018-05-04 18:07:00 INFO auditType:end-maintenance instance:server-1017:3306 cluster:server-2053:3306 message:maintenanceToken: 20638
2018-05-04 18:07:00 INFO auditType:match-below instance:server-1210:3306 cluster:server-2053:3306 message:matched server-1210:3306 below server-1053:3306
2018-05-04 18:07:00 INFO auditType:end-maintenance instance:server-1210:3306 cluster:server-2053:3306 message:maintenanceToken: 20633
2018-05-04 18:07:00 INFO Started slave on server-1164:3306
2018-05-04 18:07:00 INFO auditType:match-below instance:server-1020:3306 cluster:server-2053:3306 message:matched server-1020:3306 below server-1053:3306
2018-05-04 18:07:00 INFO auditType:end-maintenance instance:server-1020:3306 cluster:server-2053:3306 message:maintenanceToken: 20640
2018-05-04 18:07:00 INFO Started slave on server-1021:3306
2018-05-04 18:07:00 INFO Started slave on server-1066:3306
2018-05-04 18:07:01 INFO auditType:match-below instance:server-1164:3306 cluster:server-2053:3306 message:matched server-1164:3306 below server-1053:3306
2018-05-04 18:07:01 INFO auditType:end-maintenance instance:server-1164:3306 cluster:server-2053:3306 message:maintenanceToken: 20641
2018-05-04 18:07:01 INFO auditType:match-below instance:server-1021:3306 cluster:server-2053:3306 message:matched server-1021:3306 below server-1053:3306
2018-05-04 18:07:01 INFO auditType:end-maintenance instance:server-1021:3306 cluster:server-2053:3306 message:maintenanceToken: 20636
2018-05-04 18:07:01 INFO auditType:match-below instance:server-1066:3306 cluster:server-2053:3306 message:matched server-1066:3306 below server-1053:3306
2018-05-04 18:07:01 INFO auditType:end-maintenance instance:server-1066:3306 cluster:server-2053:3306 message:maintenanceToken: 20634
2018-05-04 18:07:01 INFO ChangeMasterTo: Changed master on server-1181:3306 to: server-1053:3306, binlog.013040:58246279. GTID: false
2018-05-04 18:07:01 INFO Started slave on server-1181:3306
2018-05-04 18:07:01 INFO auditType:match-below instance:server-1181:3306 cluster:server-2053:3306 message:matched server-1181:3306 below server-1053:3306
2018-05-04 18:07:01 INFO auditType:end-maintenance instance:server-1181:3306 cluster:server-2053:3306 message:maintenanceToken: 20643
2018-05-04 18:07:41 INFO ChangeMasterTo: Changed master on delay-1002:3306 to: server-1053:3306, binlog.012719:75090003. GTID: false
2018-05-04 18:07:41 INFO Started slave on delay-1002:3306
2018-05-04 18:07:41 INFO auditType:match-below instance:delay-1002:3306 cluster:server-2053:3306 message:matched delay-1002:3306 below server-1053:3306
2018-05-04 18:07:41 INFO auditType:end-maintenance instance:delay-1002:3306 cluster:server-2053:3306 message:maintenanceToken: 20631
2018-05-04 18:07:41 INFO auditType:multi-match-below-independent instance:server-1053:3306 cluster:server-2053:3306 message:matched 27/27 replicas below server-1053:3306 via Pseudo-GTID
2018-05-04 18:07:41 INFO auditType:relocate-replicas instance:server-1070:3306 cluster:server-2053:3306 message:relocated 27 replicas of server-1070:3306 below server-1053:3306
server-1169:3306
server-1117:3306
server-1155:3306
server-1187:3306
server-1208:3306
server-1041:3306
server-1217:3306
server-1207:3306
server-1065:3306
server-1050:3306
server-1134:3306
server-1211:3306
server-1102:3306
server-1133:3306
server-1198:3306
server-1121:3306
server-1079:3306
server-1148:3306
server-1153:3306
server-1017:3306
server-1210:3306
server-1020:3306
server-1164:3306
server-1021:3306
server-1066:3306
server-1181:3306
delay-1002:3306
$
So 48 seconds.
Thank you for testing!
GTID
issues. Definitely on my roadmap to significantly improve support.@sjmudd thank you for posting this issue. I'd like to resume the conversation regarding your proposed feature requests with a contribution I've got a green light to implement: configurable log output formatting.
We are currently having the same pain you've described in the original message, namely:
Since we use Splunk to search logs, we'd prefer to have the data formatted as JSON, but can't right now. Do you or @shlomi-noach have any objections to a PR for this feature? Any guidance on implementation details I should be mindful of? Thanks!
Please be advised that v3.0.12
, https://github.com/github/orchestrator/releases/tag/v3.0.12, released just yesterday, includes much purging of logs, as per #555 .
@shlomi-noach thanks for the quick response and releasing a new version with that PR. After looking more closely at those changes, they do not include the things we're filtering for. For example, when we want to isolate for actions taken during RecoverDeadCoMaster
here are the filters we apply:
log != "*WARNING discoverInstance*"
log != "*ERROR ReadTopologyInstance*"
log != "*WARNING: NamedStopwatch.Stop*"
log != "*ERROR dial tcp*" log != "*martini*"
log != "*http: proxy error: context canceled*"
log != "*INFO auditType:register-candidate*"
log != "*ERROR FlushNontrivialResolveCacheToDatabase()*"
log != "*waiting on a discoveryQueue*"
log != "*instance.SlaveLagSeconds < 0*"
log != "*INFO auditType*"
log != "*Discovered host*"
log != "*outdated keys*"
log != "*invalid connection*"
log != "*unexpected EOF*"
log != "*i/o timeout*"
log != "*resolve*"
To be clear, these filters may not all be needed for this particular use case, but these are "go to" filters when trying to eliminate noise. Our thinking is that if we were to use JSON logging output, we could then add metadata about the error that would make filtering more reasonable.
Thoughts?
Noting that most actions taken during recovery are especially audited, not just in the standard log, but in recovery log. This is found in /api/topology-recovery-steps/...
(or though the web interface).
These have been refined and still are, and my feeling is that they provide an overall good picture of the recovery process. These logs do not include lower-level actions such as change master to
etc.
I'm hesitant to switch logging into JSON format, unless you propose a global variable that converts all logs to JSON, which is a reasonable approach.
These logs do not include lower-level actions such as change master to etc.
We feel at this point in our Orchestrator integration, we need these lower-level logs. That's what's motivating us.
you propose a global variable that converts all logs to JSON, which is a reasonable approach
My intention would be to make log formatting configurable, yes. Nothing would need to change for existing users, or the defaults.
Does that sound OK in principal? Any thoughts on implementation details that come to mind that you care about?
Any thoughts on implementation details that come to mind that you care about?
As I progressed I found that https://github.com/github/orchestrator/blob/master/vendor/github.com/openark/golib/log/log.go is not 100% compatible with the standard library logger (e.g. Errore), thus making any kind of drop-in replacement not immediately possible.
How would you feel about a PR that eliminates the use of these non-standard logging functions?
I've made a naive first attempt at how something like this might be accomplished: https://github.com/github/orchestrator/compare/4f6ea856ce43abcd3951c22a8f2459cc86c5db63...bbuchalter:extract-non-standard-log-functions
See commit message for details and discussion. Thanks for your engagement!
Could you please elaborate on the log
part? How does the logger you suggest in your above commit differ from the openark
logger, and what is the significance of the change?
@shlomi-noach forgive me for not being more explicit about the benefit of the change. I expect there are a lot of demands your time. Thanks for the quick responses!
The key feature of that library that's relevant to our discussion so far is mentioned here: https://github.com/sirupsen/logrus#formatters. In short, it provides many more options (out-of-the-box and custom) for formatting logs including JSON, fluentd, and logstash.
There are other benefits, as described in the README:
io.Writer
s (e.g. STDOUT
or a file instead of STDERR
.)Does this make clear the differences and my motivation? What other questions do you have?
I'll try to preemptively answer one of my own questions which I raised in my commit message: how can we keep the Criticale
(1 use), Errore
(39 uses), and Fatale
(6 uses) functions provided by https://github.com/openark/golib/blob/master/log/log.go while also switching to logrus?
Here's my reading of the implementation's features and how we would map them to logrus:
openark-logger feature | logrus support | work required |
---|---|---|
log levels | https://github.com/sirupsen/logrus#level-logging | Logrus doesn't support NOTICE or CRITICAL . Remap NOTICE messages to INFO . Remap CRITICAL messages to ERROR . |
Include a stack trace if a flag is set | Not present | Pass the stacktrace in as a field associated with the log message. |
Log formatting | https://github.com/sirupsen/logrus#formatters | Would need to decide if the text formatter provided by logrus would be too big of a change. Logrus's text formatter has a few nice features like color coding when printing to TTY, and logfmt compatibility otherwise. If too big of a change, then we could write a custom formatter for backwards compatibility. |
syslog integration | Available as an optional hook | Not sure how, but it seems like the pieces are in place to continue support. |
Thanks again for your time. Hope this RFC produces a path towards implementation that we're both excited about!
@shlomi-noach wanted to ping this issue because I want to avoid wasted effort if you don't agree with my thinking above. Thanks for your time!
@bbuchalter your input is much appreciated! I'm sorry for the long delay. Could we, as first step, rewrite the openark/log.go
code (inplace is fine, I'll extract that later) to use logerror
or whichever implementation you see fit, such that there are no code changes under the go/
library?
Thanks for the reply @shlomi-noach. Let me see if I understand correctly what you're proposing.
Instead of making any changes to the go/
library, we'll continue to use openark/log.go
to act as the interface, but change it's underlying implementation such that it can be powered (and configured) with the more flexible logrus library I discussed above.
Did I understand you correctly?
@bbuchalter that is correct, if possible, please.
Background
Partial Screenshot from
/web/audit-recovery-steps/
A recent failover took 8 minutes to run, which is "quite a long time". 32 slaves were involved. One of the slaves was a delayed replica with a configured 24-hour delay. Orchestrator was configured on this chain to use pseudo-gtid.
FR: Improve audit logging to provide more detail
During those 8 minutes there was no visible signs of what was going on.
/web/audit-recovery-steps/
output to provide more detail than currently shown. Some of that is in the debug logging but having it in the audit logging does no harm. (we’re doing a failover after all, so logging things that are not done is also fine)FR: Come up with a better way to follow what's going on
To find the detail of what was going on you could find the active orchestrator instance and look at the log files. They are quite detailed. However, for me this is quite unworkable. The installation I manage has over 1000 hosts and with instance discovery taking place every 5 seconds there is a lot of “noise” in the logging during this 8 minute period. Better visibility or what orchestrator is doing during recovery would be good.
Audit
in the GUI which takes us directly to the last recovery process so we can see the steps that have been executed so far. A simple page refresh would allow us to follow changes. So while rather "brute force" it would allow me to quickly jump to and see “what orchestrator is doing from the GUI.Audit
->Latest Recovery Steps
Would this be possible?
Improve logging of pseudo-GTID relocations as they may be slow.
Given the issue I saw it looks like calls to
RegroupReplicasPseudoGTIDIncludingSubReplicasOfBinlogServers
and similar calls could have an extra parameter,topologyRecovery
, which can be used for optional audit logging iftopologyRecovery != nil
? This would allow the same routines to be used for normal relocation purposes without special logging, but under recovery processing extra logging could be emitted. Does this sound like a reasonable idea?The comments here are to reference some issues seen and some thoughts on possible workarounds or improvements. Thoughts from you on this would also be most welcome.
I m not sure if I will have time now to look at this but thought it would be good to reference the issue so it does not get forgotten.