openark / orchestrator

MySQL replication topology management and HA
Apache License 2.0
5.61k stars 927 forks source link

FR: Improve audit logging and provide quick link to Audit, Latest Recovery Steps #472

Open sjmudd opened 6 years ago

sjmudd commented 6 years ago

Background

Partial Screenshot from /web/audit-recovery-steps/

screen shot 2018-04-17 at 19 23 59

A recent failover took 8 minutes to run, which is "quite a long time". 32 slaves were involved. One of the slaves was a delayed replica with a configured 24-hour delay. Orchestrator was configured on this chain to use pseudo-gtid.

FR: Improve audit logging to provide more detail

During those 8 minutes there was no visible signs of what was going on.

FR: Come up with a better way to follow what's going on

To find the detail of what was going on you could find the active orchestrator instance and look at the log files. They are quite detailed. However, for me this is quite unworkable. The installation I manage has over 1000 hosts and with instance discovery taking place every 5 seconds there is a lot of “noise” in the logging during this 8 minute period. Better visibility or what orchestrator is doing during recovery would be good.

Would this be possible?

Improve logging of pseudo-GTID relocations as they may be slow.

Given the issue I saw it looks like calls to RegroupReplicasPseudoGTIDIncludingSubReplicasOfBinlogServers and similar calls could have an extra parameter, topologyRecovery, which can be used for optional audit logging if topologyRecovery != nil? This would allow the same routines to be used for normal relocation purposes without special logging, but under recovery processing extra logging could be emitted. Does this sound like a reasonable idea?

The comments here are to reference some issues seen and some thoughts on possible workarounds or improvements. Thoughts from you on this would also be most welcome.

I m not sure if I will have time now to look at this but thought it would be good to reference the issue so it does not get forgotten.

shlomi-noach commented 6 years ago

Thank you for opening this Issue. While I have no immediate insight into what went wrong:

...One of the slaves was a delayed replica with a configured 24-hour delay.

It sounds as if this should be the smoking gun. However, orchestrator specifically targets delayed replicas in a special way and relocates them offline:

https://github.com/github/orchestrator/blob/3a57610ecd05c9fddd27432a9a8a77ad80d9e7ce/go/inst/instance_topology.go#L1778-L1783

https://github.com/github/orchestrator/blob/3a57610ecd05c9fddd27432a9a8a77ad80d9e7ce/go/inst/instance_topology.go#L1896-L1905

From the screenshot you posted it seems there were no postponed functions, which means the delayed replica was not run asynchronously.

Which reminds me I added a newer logic to relocate replicas and wanted to retire the older logic (right now configurable by config.Config.PseudoGTIDPreferIndependentMultiMatch).

Please confirm you're using PseudoGTIDPreferIndependentMultiMatch = false (or none at all, as I suspect), which means you're using the "older" method. Nothing bad about it, just so that I know where to look.

...could have an extra parameter, topologyRecovery, which can be used for optional audit logging if topologyRecovery != nil? This would allow the same routines to be used for normal relocation purposes without special logging, but under recovery processing extra logging could be emitted. Does this sound like a reasonable idea?

Absolutely, this would be the way to go in temrs of visibility/logging.

shlomi-noach commented 6 years ago

hmmm. What's your PostponeReplicaRecoveryOnLagMinutes or PostponeSlaveRecoveryOnLagMinutes?

I think I should get rid of this condition and just hard code to "always postpone sql delayed servers".

sjmudd commented 6 years ago

Thanks for your feedback.

So thanks for the pointer regarding the MultiMatch thing. It's hard even for me to follow all the changes you are making.

sjmudd commented 6 years ago

I had another failure and logging showed:

2018-04-26 12:13:22 | Completed PreFailoverProcesses hook 1 of 1 in 2.910507384s
2018-04-26 12:13:22 | done running PreFailoverProcesses hooks
2018-04-26 12:13:22 | RecoverDeadMaster: will recover old-master.example.com:3306
2018-04-26 12:13:22 | RecoverDeadMaster: masterRecoveryType=MasterRecoveryPseudoGTID
2018-04-26 12:13:22 | RecoverDeadMaster: regrouping replicas via Pseudo-GTID
2018-04-26 12:28:29 | RecoverDeadMaster: 1 postponed functions
2018-04-26 12:28:29 | promoted replica: new-master.example.com:3306
2018-04-26 12:28:29 | checking if should replace promoted replica with a better candidate
2018-04-26 12:28:29 | + checking if promoted replica is the ideal candidate
2018-04-26 12:28:29 | promoted replica new-master.example.com:3306 is the ideal candidate
2018-04-26 12:28:30 | RecoverDeadMaster: successfully promoted new-master.example.com:3306

This was using

$ sudo grep PseudoGTIDPreferIndependentMultiMatch /etc/orchestrator.conf.json
  "PseudoGTIDPreferIndependentMultiMatch": true,

Not shown in the logging but there were 40 slaves that needed to be relocated so some time is expected to do that but no confirmation of the details is seen here. So it does look like adding extra logging to find the "blockage" would be good. I clearly need to move this replication chain to use GTID as that will speed up the whole process.

shlomi-noach commented 6 years ago

Are you able to relocate-replicas, at your own good time, a large set of replicas that include a delayed server, and report back how long that took?

sjmudd commented 6 years ago

Logging is confusing. Below shows the server using GTID and then says it isn't. gtid_mode on this server is currently OFF_PERMISSIVE.

I was testing how long it would take to move a group of slaves with no delayed slave and was then going to test with delayed slaves. However, I note that logging surprises me. It seems to indicate that GTID is configured and then says it isn't.

I know that some of the GTID code has been updated but just in case this may show the same behaviour in current code I'm including it for reference.:

[ Names have been anonymised but you get the idea ]

$ sudo orchestrator --verbose -c relocate-slaves -i server-1015 -d server-1026
2018-05-04 17:09:54 INFO starting orchestrator, version: 3.0.1.8, git commit: 208e53de7aa0f0c0a4f6192942f3c3c0e73e6cec
2018-05-04 17:09:54 INFO Read config: /etc/orchestrator.conf.json
2018-05-04 17:09:54 INFO Connecting to backend: maxConnections: 1000, maxIdleConns: 250
2018-05-04 17:09:54 INFO Will move 34 replicas below server-1026:3306 via GTID <<== wrong
2018-05-04 17:09:54 INFO Will match 34 replicas below server-1026:3306 via Pseudo-GTID, independently
2018-05-04 17:09:55 INFO Will match server-1002:3306 below server-1026:3306
2018-05-04 17:09:55 INFO auditType:begin-maintenance instance:server-1002:3306 cluster:server-2053:3306 message:maintenanceToken: 20549, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:55 INFO Stopped slave on server-1002:3306, Self::0, Exec:binlog.013031:21754893
2018-05-04 17:09:55 INFO Will match server-1061:3306 below server-1026:3306
2018-05-04 17:09:55 INFO Will match server-1213:3306 below server-1026:3306
2018-05-04 17:09:55 INFO Will match server-1107:3306 below server-1026:3306
2018-05-04 17:09:55 INFO Will match server-1200:3306 below server-1026:3306
2018-05-04 17:09:55 INFO Will match server-1138:3306 below server-1026:3306
2018-05-04 17:09:55 INFO Will match server-1154:3306 below server-1026:3306
2018-05-04 17:09:55 INFO Will match server-1086:3306 below server-1026:3306
2018-05-04 17:09:55 INFO Will match server-1040:3306 below server-1026:3306
2018-05-04 17:09:55 INFO Will match server-1018:3306 below server-1026:3306
2018-05-04 17:09:55 INFO Will match server-1103:3306 below server-1026:3306
2018-05-04 17:09:55 INFO auditType:begin-maintenance instance:server-1138:3306 cluster:server-2053:3306 message:maintenanceToken: 20554, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:55 INFO auditType:begin-maintenance instance:server-1061:3306 cluster:server-2053:3306 message:maintenanceToken: 20550, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:55 INFO auditType:begin-maintenance instance:server-1213:3306 cluster:server-2053:3306 message:maintenanceToken: 20551, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:55 INFO auditType:begin-maintenance instance:server-1107:3306 cluster:server-2053:3306 message:maintenanceToken: 20552, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:55 INFO Will match server-1160:3306 below server-1026:3306
2018-05-04 17:09:55 INFO auditType:begin-maintenance instance:server-1154:3306 cluster:server-2053:3306 message:maintenanceToken: 20555, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:55 INFO auditType:begin-maintenance instance:server-1200:3306 cluster:server-2053:3306 message:maintenanceToken: 20553, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:55 INFO auditType:begin-maintenance instance:server-1018:3306 cluster:server-2053:3306 message:maintenanceToken: 20558, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:55 INFO auditType:begin-maintenance instance:server-1086:3306 cluster:server-2053:3306 message:maintenanceToken: 20556, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:55 INFO auditType:begin-maintenance instance:server-1103:3306 cluster:server-2053:3306 message:maintenanceToken: 20559, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:55 INFO auditType:begin-maintenance instance:server-1040:3306 cluster:server-2053:3306 message:maintenanceToken: 20557, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:55 INFO auditType:begin-maintenance instance:server-1160:3306 cluster:server-2053:3306 message:maintenanceToken: 20560, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1091:3306 below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1019:3306 below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1025:3306 below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1035:3306 below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1145:3306 below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1149:3306 below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1108:3306 below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1067:3306 below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1019:3306 cluster:server-2053:3306 message:maintenanceToken: 20562, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1091:3306 cluster:server-2053:3306 message:maintenanceToken: 20561, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1145:3306 cluster:server-2053:3306 message:maintenanceToken: 20565, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1172:3306 below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1063:3306 below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1071:3306 below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1151:3306 below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1025:3306 cluster:server-2053:3306 message:maintenanceToken: 20563, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1166:3306 below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1108:3306 cluster:server-2053:3306 message:maintenanceToken: 20567, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1035:3306 cluster:server-2053:3306 message:maintenanceToken: 20564, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1149:3306 cluster:server-2053:3306 message:maintenanceToken: 20566, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1172:3306 cluster:server-2053:3306 message:maintenanceToken: 20569, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1028:3306 below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1110:3306 below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1071:3306 cluster:server-2053:3306 message:maintenanceToken: 20571, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1165:3306 below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1063:3306 cluster:server-2053:3306 message:maintenanceToken: 20570, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1151:3306 cluster:server-2053:3306 message:maintenanceToken: 20572, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1067:3306 cluster:server-2053:3306 message:maintenanceToken: 20568, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1110:3306 cluster:server-2053:3306 message:maintenanceToken: 20575, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1166:3306 cluster:server-2053:3306 message:maintenanceToken: 20573, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1165:3306 cluster:server-2053:3306 message:maintenanceToken: 20576, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1028:3306 cluster:server-2053:3306 message:maintenanceToken: 20574, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1101:3306 below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1101:3306 cluster:server-2053:3306 message:maintenanceToken: 20577, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1204:3306 below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1204:3306 cluster:server-2053:3306 message:maintenanceToken: 20578, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1111:3306 below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1111:3306 cluster:server-2053:3306 message:maintenanceToken: 20579, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1162:3306 below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1162:3306 cluster:server-2053:3306 message:maintenanceToken: 20580, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO Will match server-1022:3306 below server-1026:3306
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1022:3306 cluster:server-2053:3306 message:maintenanceToken: 20581, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO Stopped slave on server-1154:3306, Self::0, Exec:binlog.013031:21886888
2018-05-04 17:09:56 INFO Stopped slave on server-1061:3306, Self::0, Exec:binlog.013031:21886888
2018-05-04 17:09:56 INFO Stopped slave on server-1107:3306, Self::0, Exec:binlog.013031:21886888
2018-05-04 17:09:56 INFO Will match server-1147:3306 below server-1026:3306
2018-05-04 17:09:56 INFO Stopped slave on server-1103:3306, Self::0, Exec:binlog.013031:21886888
2018-05-04 17:09:56 INFO Stopped slave on server-1040:3306, Self::0, Exec:binlog.013031:21889835
2018-05-04 17:09:56 INFO auditType:begin-maintenance instance:server-1147:3306 cluster:server-2053:3306 message:maintenanceToken: 20582, owner: root, reason: match below server-1026:3306
2018-05-04 17:09:56 INFO Stopped slave on server-1200:3306, Self::0, Exec:binlog.013031:21889835
2018-05-04 17:09:56 INFO Stopped slave on server-1018:3306, Self::0, Exec:binlog.013031:21886888
2018-05-04 17:09:56 INFO Stopped slave on server-1086:3306, Self::0, Exec:binlog.013031:21888376
2018-05-04 17:09:56 INFO Stopped slave on server-1138:3306, Self::0, Exec:binlog.013031:21889835
2018-05-04 17:09:56 INFO Stopped slave on server-1160:3306, Self::0, Exec:binlog.013031:21886888
2018-05-04 17:09:56 INFO Stopped slave on server-1213:3306, Self::0, Exec:binlog.013031:21889835
2018-05-04 17:09:56 INFO Stopped slave on server-1019:3306, Self::0, Exec:binlog.013031:21893904
2018-05-04 17:09:56 INFO Stopped slave on server-1151:3306, Self::0, Exec:binlog.013031:21894515
2018-05-04 17:09:56 INFO Stopped slave on server-1145:3306, Self::0, Exec:binlog.013031:21894515
2018-05-04 17:09:56 INFO Stopped slave on server-1035:3306, Self::0, Exec:binlog.013031:21894515
2018-05-04 17:09:56 INFO Stopped slave on server-1067:3306, Self::0, Exec:binlog.013031:21895971
2018-05-04 17:09:56 INFO Stopped slave on server-1108:3306, Self::0, Exec:binlog.013031:21895971
2018-05-04 17:09:56 INFO Stopped slave on server-1063:3306, Self::0, Exec:binlog.013031:21895971
2018-05-04 17:09:56 INFO Stopped slave on server-1149:3306, Self::0, Exec:binlog.013031:21895971
2018-05-04 17:09:56 INFO Stopped slave on server-1071:3306, Self::0, Exec:binlog.013031:21896363
2018-05-04 17:09:56 INFO Stopped slave on server-1101:3306, Self::0, Exec:binlog.013031:21898591
2018-05-04 17:09:56 INFO Stopped slave on server-1172:3306, Self::0, Exec:binlog.013031:21895971
2018-05-04 17:09:56 INFO Stopped slave on server-1025:3306, Self::0, Exec:binlog.013031:21895971
2018-05-04 17:09:56 INFO Stopped slave on server-1204:3306, Self::0, Exec:binlog.013031:21898591
2018-05-04 17:09:56 INFO Stopped slave on server-1162:3306, Self::0, Exec:binlog.013031:21898591
2018-05-04 17:09:56 INFO Stopped slave on server-1028:3306, Self::0, Exec:binlog.013031:21896780
2018-05-04 17:09:56 INFO Stopped slave on server-1110:3306, Self::0, Exec:binlog.013031:21898591
2018-05-04 17:09:56 INFO Stopped slave on server-1091:3306, Self::0, Exec:binlog.013031:21895971
2018-05-04 17:09:56 INFO Stopped slave on server-1166:3306, Self::0, Exec:binlog.013031:21896780
2018-05-04 17:09:56 INFO Stopped slave on server-1165:3306, Self::0, Exec:binlog.013031:21898591
2018-05-04 17:09:56 INFO Stopped slave on server-1111:3306, Self::0, Exec:binlog.013031:21902689
2018-05-04 17:09:56 INFO Stopped slave on server-1022:3306, Self::0, Exec:binlog.013031:21898591
2018-05-04 17:09:56 INFO Stopped slave on server-1147:3306, Self::0, Exec:binlog.013031:21922683
2018-05-04 17:09:58 INFO ChangeMasterTo: Changed master on server-1002:3306 to: server-1026:3306, binlog.012939:7102012. GTID: false
2018-05-04 17:09:59 INFO Started slave on server-1002:3306
2018-05-04 17:09:59 INFO ChangeMasterTo: Changed master on server-1160:3306 to: server-1026:3306, binlog.012939:7234007. GTID: false
2018-05-04 17:09:59 INFO auditType:match-below instance:server-1002:3306 cluster:server-2053:3306 message:matched server-1002:3306 below server-1026:3306
2018-05-04 17:09:59 INFO ChangeMasterTo: Changed master on server-1067:3306 to: server-1026:3306, binlog.012939:7243090. GTID: false
2018-05-04 17:09:59 INFO auditType:end-maintenance instance:server-1002:3306 cluster:server-2053:3306 message:maintenanceToken: 20549
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1040:3306 to: server-1026:3306, binlog.012939:7236954. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1018:3306 to: server-1026:3306, binlog.012939:7234007. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1107:3306 to: server-1026:3306, binlog.012939:7234007. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1200:3306 to: server-1026:3306, binlog.012939:7236954. GTID: false
2018-05-04 17:10:00 INFO Started slave on server-1160:3306
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1151:3306 to: server-1026:3306, binlog.012939:7241634. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1061:3306 to: server-1026:3306, binlog.012939:7234007. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1086:3306 to: server-1026:3306, binlog.012939:7235495. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1022:3306 to: server-1026:3306, binlog.012939:7245710. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1108:3306 to: server-1026:3306, binlog.012939:7243090. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1103:3306 to: server-1026:3306, binlog.012939:7234007. GTID: false
2018-05-04 17:10:00 INFO Started slave on server-1067:3306
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1145:3306 to: server-1026:3306, binlog.012939:7241634. GTID: false
2018-05-04 17:10:00 INFO Started slave on server-1040:3306
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1138:3306 to: server-1026:3306, binlog.012939:7236954. GTID: false
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1160:3306 cluster:server-2053:3306 message:matched server-1160:3306 below server-1026:3306
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1101:3306 to: server-1026:3306, binlog.012939:7245710. GTID: false
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1160:3306 cluster:server-2053:3306 message:maintenanceToken: 20560
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1154:3306 to: server-1026:3306, binlog.012939:7234007. GTID: false
2018-05-04 17:10:00 INFO Started slave on server-1107:3306
2018-05-04 17:10:00 INFO Started slave on server-1018:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1067:3306 cluster:server-2053:3306 message:matched server-1067:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1067:3306 cluster:server-2053:3306 message:maintenanceToken: 20568
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1040:3306 cluster:server-2053:3306 message:matched server-1040:3306 below server-1026:3306
2018-05-04 17:10:00 INFO Started slave on server-1200:3306
2018-05-04 17:10:00 INFO Started slave on server-1151:3306
2018-05-04 17:10:00 INFO Started slave on server-1108:3306
2018-05-04 17:10:00 INFO Started slave on server-1022:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1040:3306 cluster:server-2053:3306 message:maintenanceToken: 20557
2018-05-04 17:10:00 INFO Started slave on server-1061:3306
2018-05-04 17:10:00 INFO Started slave on server-1103:3306
2018-05-04 17:10:00 INFO Started slave on server-1086:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1107:3306 cluster:server-2053:3306 message:matched server-1107:3306 below server-1026:3306
2018-05-04 17:10:00 INFO Started slave on server-1145:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1018:3306 cluster:server-2053:3306 message:matched server-1018:3306 below server-1026:3306
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1025:3306 to: server-1026:3306, binlog.012939:7243090. GTID: false
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1107:3306 cluster:server-2053:3306 message:maintenanceToken: 20552
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1063:3306 to: server-1026:3306, binlog.012939:7243090. GTID: false
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1018:3306 cluster:server-2053:3306 message:maintenanceToken: 20558
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1110:3306 to: server-1026:3306, binlog.012939:7245710. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1165:3306 to: server-1026:3306, binlog.012939:7245710. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1162:3306 to: server-1026:3306, binlog.012939:7245710. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1019:3306 to: server-1026:3306, binlog.012939:7241023. GTID: false
2018-05-04 17:10:00 INFO Started slave on server-1154:3306
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1166:3306 to: server-1026:3306, binlog.012939:7243899. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1028:3306 to: server-1026:3306, binlog.012939:7243899. GTID: false
2018-05-04 17:10:00 INFO Started slave on server-1138:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1151:3306 cluster:server-2053:3306 message:matched server-1151:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1151:3306 cluster:server-2053:3306 message:maintenanceToken: 20572
2018-05-04 17:10:00 INFO Started slave on server-1101:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1200:3306 cluster:server-2053:3306 message:matched server-1200:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1061:3306 cluster:server-2053:3306 message:matched server-1061:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1086:3306 cluster:server-2053:3306 message:matched server-1086:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1022:3306 cluster:server-2053:3306 message:matched server-1022:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1103:3306 cluster:server-2053:3306 message:matched server-1103:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1200:3306 cluster:server-2053:3306 message:maintenanceToken: 20553
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1108:3306 cluster:server-2053:3306 message:matched server-1108:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1086:3306 cluster:server-2053:3306 message:maintenanceToken: 20556
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1022:3306 cluster:server-2053:3306 message:maintenanceToken: 20581
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1061:3306 cluster:server-2053:3306 message:maintenanceToken: 20550
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1108:3306 cluster:server-2053:3306 message:maintenanceToken: 20567
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1103:3306 cluster:server-2053:3306 message:maintenanceToken: 20559
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1145:3306 cluster:server-2053:3306 message:matched server-1145:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1145:3306 cluster:server-2053:3306 message:maintenanceToken: 20565
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1154:3306 cluster:server-2053:3306 message:matched server-1154:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1154:3306 cluster:server-2053:3306 message:maintenanceToken: 20555
2018-05-04 17:10:00 INFO Started slave on server-1019:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1101:3306 cluster:server-2053:3306 message:matched server-1101:3306 below server-1026:3306
2018-05-04 17:10:00 INFO Started slave on server-1166:3306
2018-05-04 17:10:00 INFO Started slave on server-1025:3306
2018-05-04 17:10:00 INFO Started slave on server-1165:3306
2018-05-04 17:10:00 INFO Started slave on server-1162:3306
2018-05-04 17:10:00 INFO Started slave on server-1110:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1138:3306 cluster:server-2053:3306 message:matched server-1138:3306 below server-1026:3306
2018-05-04 17:10:00 INFO Started slave on server-1028:3306
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1035:3306 to: server-1026:3306, binlog.012939:7241634. GTID: false
2018-05-04 17:10:00 INFO Started slave on server-1063:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1101:3306 cluster:server-2053:3306 message:maintenanceToken: 20577
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1138:3306 cluster:server-2053:3306 message:maintenanceToken: 20554
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1213:3306 to: server-1026:3306, binlog.012939:7236954. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1091:3306 to: server-1026:3306, binlog.012939:7243090. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1149:3306 to: server-1026:3306, binlog.012939:7243090. GTID: false
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1172:3306 to: server-1026:3306, binlog.012939:7243090. GTID: false
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1019:3306 cluster:server-2053:3306 message:matched server-1019:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1019:3306 cluster:server-2053:3306 message:maintenanceToken: 20562
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1162:3306 cluster:server-2053:3306 message:matched server-1162:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1166:3306 cluster:server-2053:3306 message:matched server-1166:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1166:3306 cluster:server-2053:3306 message:maintenanceToken: 20573
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1162:3306 cluster:server-2053:3306 message:maintenanceToken: 20580
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1110:3306 cluster:server-2053:3306 message:matched server-1110:3306 below server-1026:3306
2018-05-04 17:10:00 INFO ChangeMasterTo: Changed master on server-1204:3306 to: server-1026:3306, binlog.012939:7245710. GTID: false
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1110:3306 cluster:server-2053:3306 message:maintenanceToken: 20575
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1165:3306 cluster:server-2053:3306 message:matched server-1165:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1165:3306 cluster:server-2053:3306 message:maintenanceToken: 20576
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1028:3306 cluster:server-2053:3306 message:matched server-1028:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1028:3306 cluster:server-2053:3306 message:maintenanceToken: 20574
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1025:3306 cluster:server-2053:3306 message:matched server-1025:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1025:3306 cluster:server-2053:3306 message:maintenanceToken: 20563
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1063:3306 cluster:server-2053:3306 message:matched server-1063:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1063:3306 cluster:server-2053:3306 message:maintenanceToken: 20570
2018-05-04 17:10:00 INFO Started slave on server-1035:3306
2018-05-04 17:10:00 INFO Started slave on server-1149:3306
2018-05-04 17:10:00 INFO Started slave on server-1172:3306
2018-05-04 17:10:00 INFO Started slave on server-1091:3306
2018-05-04 17:10:00 INFO Started slave on server-1213:3306
2018-05-04 17:10:00 INFO Started slave on server-1204:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1035:3306 cluster:server-2053:3306 message:matched server-1035:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1035:3306 cluster:server-2053:3306 message:maintenanceToken: 20564
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1149:3306 cluster:server-2053:3306 message:matched server-1149:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1149:3306 cluster:server-2053:3306 message:maintenanceToken: 20566
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1172:3306 cluster:server-2053:3306 message:matched server-1172:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1172:3306 cluster:server-2053:3306 message:maintenanceToken: 20569
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1204:3306 cluster:server-2053:3306 message:matched server-1204:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1091:3306 cluster:server-2053:3306 message:matched server-1091:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1204:3306 cluster:server-2053:3306 message:maintenanceToken: 20578
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1091:3306 cluster:server-2053:3306 message:maintenanceToken: 20561
2018-05-04 17:10:00 INFO auditType:match-below instance:server-1213:3306 cluster:server-2053:3306 message:matched server-1213:3306 below server-1026:3306
2018-05-04 17:10:00 INFO auditType:end-maintenance instance:server-1213:3306 cluster:server-2053:3306 message:maintenanceToken: 20551
2018-05-04 17:10:01 INFO ChangeMasterTo: Changed master on server-1111:3306 to: server-1026:3306, binlog.012939:7249808. GTID: false
2018-05-04 17:10:01 INFO ChangeMasterTo: Changed master on server-1071:3306 to: server-1026:3306, binlog.012939:7243482. GTID: false
2018-05-04 17:10:01 INFO Started slave on server-1111:3306
2018-05-04 17:10:01 INFO auditType:match-below instance:server-1111:3306 cluster:server-2053:3306 message:matched server-1111:3306 below server-1026:3306
2018-05-04 17:10:01 INFO auditType:end-maintenance instance:server-1111:3306 cluster:server-2053:3306 message:maintenanceToken: 20579
2018-05-04 17:10:01 INFO Started slave on server-1071:3306
2018-05-04 17:10:01 INFO auditType:match-below instance:server-1071:3306 cluster:server-2053:3306 message:matched server-1071:3306 below server-1026:3306
2018-05-04 17:10:01 INFO auditType:end-maintenance instance:server-1071:3306 cluster:server-2053:3306 message:maintenanceToken: 20571
2018-05-04 17:10:01 INFO ChangeMasterTo: Changed master on server-1147:3306 to: server-1026:3306, binlog.012939:7269802. GTID: false
2018-05-04 17:10:01 INFO Started slave on server-1147:3306
2018-05-04 17:10:01 INFO auditType:match-below instance:server-1147:3306 cluster:server-2053:3306 message:matched server-1147:3306 below server-1026:3306
2018-05-04 17:10:01 INFO auditType:end-maintenance instance:server-1147:3306 cluster:server-2053:3306 message:maintenanceToken: 20582
2018-05-04 17:10:01 INFO auditType:multi-match-below-independent instance:server-1026:3306 cluster:server-2053:3306 message:matched 34/34 replicas below server-1026:3306 via Pseudo-GTID
2018-05-04 17:10:01 INFO auditType:relocate-replicas instance:server-1015:3306 cluster:server-2053:3306 message:relocated 34 replicas of server-1015:3306 below server-1026:3306
server-1002:3306
server-1160:3306
server-1067:3306
server-1040:3306
server-1107:3306
server-1018:3306
server-1151:3306
server-1200:3306
server-1086:3306
server-1022:3306
server-1061:3306
server-1108:3306
server-1103:3306
server-1145:3306
server-1154:3306
server-1101:3306
server-1138:3306
server-1019:3306
server-1166:3306
server-1162:3306
server-1110:3306
server-1165:3306
server-1028:3306
server-1025:3306
server-1063:3306
server-1035:3306
server-1149:3306
server-1172:3306
server-1204:3306
server-1091:3306
server-1213:3306
server-1111:3306
server-1071:3306
server-1147:3306
$

So I see on line 6 a switch from using GTID to moving to pseudo-GTID.

So time to relocate 34 slaves is about 7 seconds.

sjmudd commented 6 years ago

From memory just relocating a delayed slave (24h) took some minutes and has often failed. However, as usual trying it now it succeeds (moving a single server) much more quickly:

$ time sudo orchestrator --verbose -c relocate -i delay-2002  -d server-2050
2018-05-04 17:53:18 INFO starting orchestrator, version: 3.0.1.8, git commit: 208e53de7aa0f0c0a4f6192942f3c3c0e73e6cec
2018-05-04 17:53:18 INFO Read config: /etc/orchestrator.conf.json
2018-05-04 17:53:18 INFO Connecting to backend: maxConnections: 1000, maxIdleConns: 250
2018-05-04 17:53:18 INFO Will match delay-2002:3306 below server-2050:3306
2018-05-04 17:53:18 INFO auditType:begin-maintenance instance:delay-2002:3306 cluster:server-2053:3306 message:maintenanceToken: 20621, owner: root, reason: match below server-2050:3306
2018-05-04 17:53:18 INFO Stopped slave on delay-2002:3306, Self::0, Exec:binlog.011186:80995296
2018-05-04 17:53:41 INFO ChangeMasterTo: Changed master on delay-2002:3306 to: server-2050:3306, binlog.011299:58782124. GTID: false
2018-05-04 17:53:41 INFO Started slave on delay-2002:3306
2018-05-04 17:53:41 INFO auditType:match-below instance:delay-2002:3306 cluster:server-2053:3306 message:matched delay-2002:3306 below server-2050:3306
2018-05-04 17:53:41 INFO auditType:end-maintenance instance:delay-2002:3306 cluster:server-2053:3306 message:maintenanceToken: 20621
2018-05-04 17:53:41 INFO auditType:relocate-below instance:delay-2002:3306 cluster:server-2053:3306 message:relocated delay-2002:3306 below server-2050:3306
delay-2002:3306<server-2050:3306

real    0m22.980s
user    0m13.818s
sys     0m1.428s

$ time sudo orchestrator --verbose -c relocate -i delay-2002  -d server-2057
2018-05-04 17:56:44 INFO starting orchestrator, version: 3.0.1.8, git commit: 208e53de7aa0f0c0a4f6192942f3c3c0e73e6cec
2018-05-04 17:56:44 INFO Read config: /etc/orchestrator.conf.json
2018-05-04 17:56:44 INFO Connecting to backend: maxConnections: 1000, maxIdleConns: 250
2018-05-04 17:56:44 INFO Will match delay-2002:3306 below server-2057:3306
2018-05-04 17:56:44 INFO auditType:begin-maintenance instance:delay-2002:3306 cluster:server-2053:3306 message:maintenanceToken: 20622, owner: root, reason: match below server-2057:3306
2018-05-04 17:56:44 INFO Stopped slave on delay-2002:3306, Self::0, Exec:binlog.011300:63790012
2018-05-04 17:57:08 INFO ChangeMasterTo: Changed master on delay-2002:3306 to: server-2057:3306, binlog.011187:85842087. GTID: false
2018-05-04 17:57:08 INFO Started slave on delay-2002:3306
2018-05-04 17:57:08 INFO auditType:match-below instance:delay-2002:3306 cluster:server-2053:3306 message:matched delay-2002:3306 below server-2057:3306
2018-05-04 17:57:08 INFO auditType:end-maintenance instance:delay-2002:3306 cluster:server-2053:3306 message:maintenanceToken: 20622
2018-05-04 17:57:08 INFO auditType:relocate-below instance:delay-2002:3306 cluster:server-2053:3306 message:relocated delay-2002:3306 below server-2057:3306
delay-2002:3306<server-2057:3306

real    0m24.296s
user    0m13.899s
sys     0m1.547s

Taking 23 and 24 seconds respectively.

sjmudd commented 6 years ago

And of course testing this with nothing broken shows it works fine ....

$ sudo orchestrator --verbose -c relocate-slaves -i server-1070 -d server-1053
2018-05-04 18:06:53 INFO starting orchestrator, version: 3.0.1.8, git commit: 208e53de7aa0f0c0a4f6192942f3c3c0e73e6cec
2018-05-04 18:06:53 INFO Read config: /etc/orchestrator.conf.json
2018-05-04 18:06:53 INFO Connecting to backend: maxConnections: 1000, maxIdleConns: 250
2018-05-04 18:06:53 INFO Will move 27 replicas below server-1053:3306 via GTID
2018-05-04 18:06:53 INFO Will match 27 replicas below server-1053:3306 via Pseudo-GTID, independently
2018-05-04 18:06:54 INFO Will match server-1217:3306 below server-1053:3306
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:server-1217:3306 cluster:server-2053:3306 message:maintenanceToken: 20623, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:54 INFO Will match server-1169:3306 below server-1053:3306
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:server-1169:3306 cluster:server-2053:3306 message:maintenanceToken: 20624, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:54 INFO Will match server-1187:3306 below server-1053:3306
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:server-1187:3306 cluster:server-2053:3306 message:maintenanceToken: 20625, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:54 INFO Will match server-1207:3306 below server-1053:3306
2018-05-04 18:06:54 INFO Will match server-1041:3306 below server-1053:3306
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:server-1207:3306 cluster:server-2053:3306 message:maintenanceToken: 20626, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:server-1041:3306 cluster:server-2053:3306 message:maintenanceToken: 20627, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:54 INFO Will match server-1117:3306 below server-1053:3306
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:server-1117:3306 cluster:server-2053:3306 message:maintenanceToken: 20628, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:54 INFO Will match server-1198:3306 below server-1053:3306
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:server-1198:3306 cluster:server-2053:3306 message:maintenanceToken: 20629, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:54 INFO Stopped slave on server-1217:3306, Self::0, Exec:binlog.012579:22476581
2018-05-04 18:06:54 INFO Stopped slave on server-1169:3306, Self::0, Exec:binlog.012579:22492743
2018-05-04 18:06:54 INFO Stopped slave on server-1187:3306, Self::0, Exec:binlog.012579:22498180
2018-05-04 18:06:54 INFO Will match server-1153:3306 below server-1053:3306
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:server-1153:3306 cluster:server-2053:3306 message:maintenanceToken: 20630, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:54 INFO Will match delay-1002:3306 below server-1053:3306
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:delay-1002:3306 cluster:server-2053:3306 message:maintenanceToken: 20631, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:54 INFO Stopped slave on server-1207:3306, Self::0, Exec:binlog.012579:22516872
2018-05-04 18:06:54 INFO Will match server-1065:3306 below server-1053:3306
2018-05-04 18:06:54 INFO Stopped slave on server-1117:3306, Self::0, Exec:binlog.012579:22529134
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:server-1065:3306 cluster:server-2053:3306 message:maintenanceToken: 20632, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:54 INFO Will match server-1210:3306 below server-1053:3306
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:server-1210:3306 cluster:server-2053:3306 message:maintenanceToken: 20633, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:54 INFO Stopped slave on server-1041:3306, Self::0, Exec:binlog.012579:22529134
2018-05-04 18:06:54 INFO Will match server-1066:3306 below server-1053:3306
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:server-1066:3306 cluster:server-2053:3306 message:maintenanceToken: 20634, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:54 INFO Will match server-1050:3306 below server-1053:3306
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:server-1050:3306 cluster:server-2053:3306 message:maintenanceToken: 20635, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:54 INFO Stopped slave on server-1198:3306, Self::0, Exec:binlog.012579:22542113
2018-05-04 18:06:54 INFO Will match server-1021:3306 below server-1053:3306
2018-05-04 18:06:54 INFO auditType:begin-maintenance instance:server-1021:3306 cluster:server-2053:3306 message:maintenanceToken: 20636, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:55 INFO Will match server-1148:3306 below server-1053:3306
2018-05-04 18:06:55 INFO auditType:begin-maintenance instance:server-1148:3306 cluster:server-2053:3306 message:maintenanceToken: 20637, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:55 INFO Stopped slave on server-1153:3306, Self::0, Exec:binlog.012579:22560807
2018-05-04 18:06:55 INFO Will match server-1017:3306 below server-1053:3306
2018-05-04 18:06:55 INFO Will match server-1102:3306 below server-1053:3306
2018-05-04 18:06:55 INFO auditType:begin-maintenance instance:server-1017:3306 cluster:server-2053:3306 message:maintenanceToken: 20638, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:55 INFO auditType:begin-maintenance instance:server-1102:3306 cluster:server-2053:3306 message:maintenanceToken: 20639, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:55 INFO Stopped slave on delay-1002:3306, Self::0, Exec:binlog.012259:63192257
2018-05-04 18:06:55 INFO Stopped slave on server-1065:3306, Self::0, Exec:binlog.012579:22573344
2018-05-04 18:06:55 INFO Stopped slave on server-1066:3306, Self::0, Exec:binlog.012579:22580426
2018-05-04 18:06:55 INFO Will match server-1020:3306 below server-1053:3306
2018-05-04 18:06:55 INFO Will match server-1164:3306 below server-1053:3306
2018-05-04 18:06:55 INFO auditType:begin-maintenance instance:server-1020:3306 cluster:server-2053:3306 message:maintenanceToken: 20640, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:55 INFO auditType:begin-maintenance instance:server-1164:3306 cluster:server-2053:3306 message:maintenanceToken: 20641, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:55 INFO Will match server-1211:3306 below server-1053:3306
2018-05-04 18:06:55 INFO auditType:begin-maintenance instance:server-1211:3306 cluster:server-2053:3306 message:maintenanceToken: 20642, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:55 INFO Will match server-1181:3306 below server-1053:3306
2018-05-04 18:06:55 INFO Stopped slave on server-1210:3306, Self::0, Exec:binlog.012579:22590570
2018-05-04 18:06:55 INFO auditType:begin-maintenance instance:server-1181:3306 cluster:server-2053:3306 message:maintenanceToken: 20643, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:55 INFO Will match server-1134:3306 below server-1053:3306
2018-05-04 18:06:55 INFO Will match server-1133:3306 below server-1053:3306
2018-05-04 18:06:55 INFO auditType:begin-maintenance instance:server-1134:3306 cluster:server-2053:3306 message:maintenanceToken: 20644, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:55 INFO auditType:begin-maintenance instance:server-1133:3306 cluster:server-2053:3306 message:maintenanceToken: 20645, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:55 INFO Stopped slave on server-1050:3306, Self::0, Exec:binlog.012579:22599282
2018-05-04 18:06:55 INFO Will match server-1079:3306 below server-1053:3306
2018-05-04 18:06:55 INFO auditType:begin-maintenance instance:server-1079:3306 cluster:server-2053:3306 message:maintenanceToken: 20646, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:55 INFO Stopped slave on server-1021:3306, Self::0, Exec:binlog.012579:22625656
2018-05-04 18:06:55 INFO Stopped slave on server-1102:3306, Self::0, Exec:binlog.012579:22633478
2018-05-04 18:06:55 INFO Will match server-1121:3306 below server-1053:3306
2018-05-04 18:06:55 INFO auditType:begin-maintenance instance:server-1121:3306 cluster:server-2053:3306 message:maintenanceToken: 20647, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:55 INFO Stopped slave on server-1148:3306, Self::0, Exec:binlog.012579:22633478
2018-05-04 18:06:55 INFO Stopped slave on server-1017:3306, Self::0, Exec:binlog.012579:22642053
2018-05-04 18:06:55 INFO Stopped slave on server-1164:3306, Self::0, Exec:binlog.012579:22646216
2018-05-04 18:06:55 INFO Stopped slave on server-1181:3306, Self::0, Exec:binlog.012579:22679198
2018-05-04 18:06:55 INFO Will match server-1155:3306 below server-1053:3306
2018-05-04 18:06:55 INFO Stopped slave on server-1020:3306, Self::0, Exec:binlog.012579:22646216
2018-05-04 18:06:55 INFO auditType:begin-maintenance instance:server-1155:3306 cluster:server-2053:3306 message:maintenanceToken: 20648, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:55 INFO Stopped slave on server-1211:3306, Self::0, Exec:binlog.012579:22646216
2018-05-04 18:06:55 INFO Stopped slave on server-1134:3306, Self::0, Exec:binlog.012579:22711095
2018-05-04 18:06:55 INFO Stopped slave on server-1079:3306, Self::0, Exec:binlog.012579:22716263
2018-05-04 18:06:55 INFO Stopped slave on server-1133:3306, Self::0, Exec:binlog.012579:22714366
2018-05-04 18:06:55 INFO Stopped slave on server-1121:3306, Self::0, Exec:binlog.012579:22719073
2018-05-04 18:06:55 INFO Stopped slave on server-1155:3306, Self::0, Exec:binlog.012579:22817863
2018-05-04 18:06:55 INFO Will match server-1208:3306 below server-1053:3306
2018-05-04 18:06:55 INFO auditType:begin-maintenance instance:server-1208:3306 cluster:server-2053:3306 message:maintenanceToken: 20649, owner: root, reason: match below server-1053:3306
2018-05-04 18:06:56 INFO Stopped slave on server-1208:3306, Self::0, Exec:binlog.012579:23307973
2018-05-04 18:06:56 INFO ChangeMasterTo: Changed master on server-1169:3306 to: server-1053:3306, binlog.013040:58059824. GTID: false
2018-05-04 18:06:56 INFO ChangeMasterTo: Changed master on server-1117:3306 to: server-1053:3306, binlog.013040:58096215. GTID: false
2018-05-04 18:06:56 INFO Started slave on server-1169:3306
2018-05-04 18:06:56 INFO auditType:match-below instance:server-1169:3306 cluster:server-2053:3306 message:matched server-1169:3306 below server-1053:3306
2018-05-04 18:06:56 INFO auditType:end-maintenance instance:server-1169:3306 cluster:server-2053:3306 message:maintenanceToken: 20624
2018-05-04 18:06:56 INFO Started slave on server-1117:3306
2018-05-04 18:06:56 INFO ChangeMasterTo: Changed master on server-1155:3306 to: server-1053:3306, binlog.013040:58384944. GTID: false
2018-05-04 18:06:56 INFO auditType:match-below instance:server-1117:3306 cluster:server-2053:3306 message:matched server-1117:3306 below server-1053:3306
2018-05-04 18:06:56 INFO auditType:end-maintenance instance:server-1117:3306 cluster:server-2053:3306 message:maintenanceToken: 20628
2018-05-04 18:06:56 INFO Started slave on server-1155:3306
2018-05-04 18:06:56 INFO auditType:match-below instance:server-1155:3306 cluster:server-2053:3306 message:matched server-1155:3306 below server-1053:3306
2018-05-04 18:06:56 INFO auditType:end-maintenance instance:server-1155:3306 cluster:server-2053:3306 message:maintenanceToken: 20648
2018-05-04 18:06:57 INFO ChangeMasterTo: Changed master on server-1187:3306 to: server-1053:3306, binlog.013040:58065261. GTID: false
2018-05-04 18:06:57 INFO ChangeMasterTo: Changed master on server-1208:3306 to: server-1053:3306, binlog.013040:58875054. GTID: false
2018-05-04 18:06:57 INFO Started slave on server-1187:3306
2018-05-04 18:06:57 INFO Started slave on server-1208:3306
2018-05-04 18:06:57 INFO auditType:match-below instance:server-1187:3306 cluster:server-2053:3306 message:matched server-1187:3306 below server-1053:3306
2018-05-04 18:06:57 INFO auditType:end-maintenance instance:server-1187:3306 cluster:server-2053:3306 message:maintenanceToken: 20625
2018-05-04 18:06:57 INFO auditType:match-below instance:server-1208:3306 cluster:server-2053:3306 message:matched server-1208:3306 below server-1053:3306
2018-05-04 18:06:57 INFO auditType:end-maintenance instance:server-1208:3306 cluster:server-2053:3306 message:maintenanceToken: 20649
2018-05-04 18:06:58 INFO ChangeMasterTo: Changed master on server-1041:3306 to: server-1053:3306, binlog.013040:58096215. GTID: false
2018-05-04 18:06:58 INFO Started slave on server-1041:3306
2018-05-04 18:06:58 INFO ChangeMasterTo: Changed master on server-1217:3306 to: server-1053:3306, binlog.013040:58043662. GTID: false
2018-05-04 18:06:58 INFO auditType:match-below instance:server-1041:3306 cluster:server-2053:3306 message:matched server-1041:3306 below server-1053:3306
2018-05-04 18:06:58 INFO auditType:end-maintenance instance:server-1041:3306 cluster:server-2053:3306 message:maintenanceToken: 20627
2018-05-04 18:06:58 INFO Started slave on server-1217:3306
2018-05-04 18:06:58 INFO ChangeMasterTo: Changed master on server-1207:3306 to: server-1053:3306, binlog.013040:58083953. GTID: false
2018-05-04 18:06:58 INFO auditType:match-below instance:server-1217:3306 cluster:server-2053:3306 message:matched server-1217:3306 below server-1053:3306
2018-05-04 18:06:58 INFO auditType:end-maintenance instance:server-1217:3306 cluster:server-2053:3306 message:maintenanceToken: 20623
2018-05-04 18:06:58 INFO ChangeMasterTo: Changed master on server-1065:3306 to: server-1053:3306, binlog.013040:58140425. GTID: false
2018-05-04 18:06:58 INFO Started slave on server-1207:3306
2018-05-04 18:06:59 INFO Started slave on server-1065:3306
2018-05-04 18:06:59 INFO auditType:match-below instance:server-1207:3306 cluster:server-2053:3306 message:matched server-1207:3306 below server-1053:3306
2018-05-04 18:06:59 INFO auditType:end-maintenance instance:server-1207:3306 cluster:server-2053:3306 message:maintenanceToken: 20626
2018-05-04 18:06:59 INFO auditType:match-below instance:server-1065:3306 cluster:server-2053:3306 message:matched server-1065:3306 below server-1053:3306
2018-05-04 18:06:59 INFO auditType:end-maintenance instance:server-1065:3306 cluster:server-2053:3306 message:maintenanceToken: 20632
2018-05-04 18:06:59 INFO ChangeMasterTo: Changed master on server-1134:3306 to: server-1053:3306, binlog.013040:58278176. GTID: false
2018-05-04 18:06:59 INFO ChangeMasterTo: Changed master on server-1050:3306 to: server-1053:3306, binlog.013040:58166363. GTID: false
2018-05-04 18:06:59 INFO ChangeMasterTo: Changed master on server-1211:3306 to: server-1053:3306, binlog.013040:58213297. GTID: false
2018-05-04 18:06:59 INFO ChangeMasterTo: Changed master on server-1102:3306 to: server-1053:3306, binlog.013040:58200559. GTID: false
2018-05-04 18:06:59 INFO ChangeMasterTo: Changed master on server-1198:3306 to: server-1053:3306, binlog.013040:58109194. GTID: false
2018-05-04 18:06:59 INFO ChangeMasterTo: Changed master on server-1133:3306 to: server-1053:3306, binlog.013040:58281447. GTID: false
2018-05-04 18:06:59 INFO Started slave on server-1134:3306
2018-05-04 18:06:59 INFO Started slave on server-1050:3306
2018-05-04 18:06:59 INFO Started slave on server-1211:3306
2018-05-04 18:06:59 INFO auditType:match-below instance:server-1050:3306 cluster:server-2053:3306 message:matched server-1050:3306 below server-1053:3306
2018-05-04 18:06:59 INFO Started slave on server-1102:3306
2018-05-04 18:06:59 INFO auditType:end-maintenance instance:server-1050:3306 cluster:server-2053:3306 message:maintenanceToken: 20635
2018-05-04 18:06:59 INFO auditType:match-below instance:server-1134:3306 cluster:server-2053:3306 message:matched server-1134:3306 below server-1053:3306
2018-05-04 18:06:59 INFO Started slave on server-1133:3306
2018-05-04 18:06:59 INFO auditType:end-maintenance instance:server-1134:3306 cluster:server-2053:3306 message:maintenanceToken: 20644
2018-05-04 18:06:59 INFO Started slave on server-1198:3306
2018-05-04 18:06:59 INFO auditType:match-below instance:server-1211:3306 cluster:server-2053:3306 message:matched server-1211:3306 below server-1053:3306
2018-05-04 18:06:59 INFO auditType:match-below instance:server-1102:3306 cluster:server-2053:3306 message:matched server-1102:3306 below server-1053:3306
2018-05-04 18:06:59 INFO auditType:end-maintenance instance:server-1211:3306 cluster:server-2053:3306 message:maintenanceToken: 20642
2018-05-04 18:06:59 INFO auditType:end-maintenance instance:server-1102:3306 cluster:server-2053:3306 message:maintenanceToken: 20639
2018-05-04 18:06:59 INFO auditType:match-below instance:server-1133:3306 cluster:server-2053:3306 message:matched server-1133:3306 below server-1053:3306
2018-05-04 18:06:59 INFO auditType:end-maintenance instance:server-1133:3306 cluster:server-2053:3306 message:maintenanceToken: 20645
2018-05-04 18:06:59 INFO auditType:match-below instance:server-1198:3306 cluster:server-2053:3306 message:matched server-1198:3306 below server-1053:3306
2018-05-04 18:06:59 INFO auditType:end-maintenance instance:server-1198:3306 cluster:server-2053:3306 message:maintenanceToken: 20629
2018-05-04 18:07:00 INFO ChangeMasterTo: Changed master on server-1121:3306 to: server-1053:3306, binlog.013040:58286154. GTID: false
2018-05-04 18:07:00 INFO ChangeMasterTo: Changed master on server-1079:3306 to: server-1053:3306, binlog.013040:58283344. GTID: false
2018-05-04 18:07:00 INFO ChangeMasterTo: Changed master on server-1148:3306 to: server-1053:3306, binlog.013040:58200559. GTID: false
2018-05-04 18:07:00 INFO Started slave on server-1121:3306
2018-05-04 18:07:00 INFO Started slave on server-1079:3306
2018-05-04 18:07:00 INFO auditType:match-below instance:server-1121:3306 cluster:server-2053:3306 message:matched server-1121:3306 below server-1053:3306
2018-05-04 18:07:00 INFO auditType:end-maintenance instance:server-1121:3306 cluster:server-2053:3306 message:maintenanceToken: 20647
2018-05-04 18:07:00 INFO Started slave on server-1148:3306
2018-05-04 18:07:00 INFO ChangeMasterTo: Changed master on server-1153:3306 to: server-1053:3306, binlog.013040:58127888. GTID: false
2018-05-04 18:07:00 INFO ChangeMasterTo: Changed master on server-1210:3306 to: server-1053:3306, binlog.013040:58157651. GTID: false
2018-05-04 18:07:00 INFO auditType:match-below instance:server-1079:3306 cluster:server-2053:3306 message:matched server-1079:3306 below server-1053:3306
2018-05-04 18:07:00 INFO auditType:end-maintenance instance:server-1079:3306 cluster:server-2053:3306 message:maintenanceToken: 20646
2018-05-04 18:07:00 INFO ChangeMasterTo: Changed master on server-1017:3306 to: server-1053:3306, binlog.013040:58209134. GTID: false
2018-05-04 18:07:00 INFO ChangeMasterTo: Changed master on server-1020:3306 to: server-1053:3306, binlog.013040:58213297. GTID: false
2018-05-04 18:07:00 INFO auditType:match-below instance:server-1148:3306 cluster:server-2053:3306 message:matched server-1148:3306 below server-1053:3306
2018-05-04 18:07:00 INFO auditType:end-maintenance instance:server-1148:3306 cluster:server-2053:3306 message:maintenanceToken: 20637
2018-05-04 18:07:00 INFO Started slave on server-1210:3306
2018-05-04 18:07:00 INFO Started slave on server-1153:3306
2018-05-04 18:07:00 INFO ChangeMasterTo: Changed master on server-1164:3306 to: server-1053:3306, binlog.013040:58213297. GTID: false
2018-05-04 18:07:00 INFO ChangeMasterTo: Changed master on server-1021:3306 to: server-1053:3306, binlog.013040:58192737. GTID: false
2018-05-04 18:07:00 INFO Started slave on server-1020:3306
2018-05-04 18:07:00 INFO Started slave on server-1017:3306
2018-05-04 18:07:00 INFO ChangeMasterTo: Changed master on server-1066:3306 to: server-1053:3306, binlog.013040:58147507. GTID: false
2018-05-04 18:07:00 INFO auditType:match-below instance:server-1153:3306 cluster:server-2053:3306 message:matched server-1153:3306 below server-1053:3306
2018-05-04 18:07:00 INFO auditType:match-below instance:server-1017:3306 cluster:server-2053:3306 message:matched server-1017:3306 below server-1053:3306
2018-05-04 18:07:00 INFO auditType:end-maintenance instance:server-1153:3306 cluster:server-2053:3306 message:maintenanceToken: 20630
2018-05-04 18:07:00 INFO auditType:end-maintenance instance:server-1017:3306 cluster:server-2053:3306 message:maintenanceToken: 20638
2018-05-04 18:07:00 INFO auditType:match-below instance:server-1210:3306 cluster:server-2053:3306 message:matched server-1210:3306 below server-1053:3306
2018-05-04 18:07:00 INFO auditType:end-maintenance instance:server-1210:3306 cluster:server-2053:3306 message:maintenanceToken: 20633
2018-05-04 18:07:00 INFO Started slave on server-1164:3306
2018-05-04 18:07:00 INFO auditType:match-below instance:server-1020:3306 cluster:server-2053:3306 message:matched server-1020:3306 below server-1053:3306
2018-05-04 18:07:00 INFO auditType:end-maintenance instance:server-1020:3306 cluster:server-2053:3306 message:maintenanceToken: 20640
2018-05-04 18:07:00 INFO Started slave on server-1021:3306
2018-05-04 18:07:00 INFO Started slave on server-1066:3306
2018-05-04 18:07:01 INFO auditType:match-below instance:server-1164:3306 cluster:server-2053:3306 message:matched server-1164:3306 below server-1053:3306
2018-05-04 18:07:01 INFO auditType:end-maintenance instance:server-1164:3306 cluster:server-2053:3306 message:maintenanceToken: 20641
2018-05-04 18:07:01 INFO auditType:match-below instance:server-1021:3306 cluster:server-2053:3306 message:matched server-1021:3306 below server-1053:3306
2018-05-04 18:07:01 INFO auditType:end-maintenance instance:server-1021:3306 cluster:server-2053:3306 message:maintenanceToken: 20636
2018-05-04 18:07:01 INFO auditType:match-below instance:server-1066:3306 cluster:server-2053:3306 message:matched server-1066:3306 below server-1053:3306
2018-05-04 18:07:01 INFO auditType:end-maintenance instance:server-1066:3306 cluster:server-2053:3306 message:maintenanceToken: 20634
2018-05-04 18:07:01 INFO ChangeMasterTo: Changed master on server-1181:3306 to: server-1053:3306, binlog.013040:58246279. GTID: false
2018-05-04 18:07:01 INFO Started slave on server-1181:3306
2018-05-04 18:07:01 INFO auditType:match-below instance:server-1181:3306 cluster:server-2053:3306 message:matched server-1181:3306 below server-1053:3306
2018-05-04 18:07:01 INFO auditType:end-maintenance instance:server-1181:3306 cluster:server-2053:3306 message:maintenanceToken: 20643
2018-05-04 18:07:41 INFO ChangeMasterTo: Changed master on delay-1002:3306 to: server-1053:3306, binlog.012719:75090003. GTID: false
2018-05-04 18:07:41 INFO Started slave on delay-1002:3306
2018-05-04 18:07:41 INFO auditType:match-below instance:delay-1002:3306 cluster:server-2053:3306 message:matched delay-1002:3306 below server-1053:3306
2018-05-04 18:07:41 INFO auditType:end-maintenance instance:delay-1002:3306 cluster:server-2053:3306 message:maintenanceToken: 20631
2018-05-04 18:07:41 INFO auditType:multi-match-below-independent instance:server-1053:3306 cluster:server-2053:3306 message:matched 27/27 replicas below server-1053:3306 via Pseudo-GTID
2018-05-04 18:07:41 INFO auditType:relocate-replicas instance:server-1070:3306 cluster:server-2053:3306 message:relocated 27 replicas of server-1070:3306 below server-1053:3306
server-1169:3306
server-1117:3306
server-1155:3306
server-1187:3306
server-1208:3306
server-1041:3306
server-1217:3306
server-1207:3306
server-1065:3306
server-1050:3306
server-1134:3306
server-1211:3306
server-1102:3306
server-1133:3306
server-1198:3306
server-1121:3306
server-1079:3306
server-1148:3306
server-1153:3306
server-1017:3306
server-1210:3306
server-1020:3306
server-1164:3306
server-1021:3306
server-1066:3306
server-1181:3306
delay-1002:3306
$

So 48 seconds.

shlomi-noach commented 6 years ago

Thank you for testing!

bbuchalter commented 6 years ago

@sjmudd thank you for posting this issue. I'd like to resume the conversation regarding your proposed feature requests with a contribution I've got a green light to implement: configurable log output formatting.

We are currently having the same pain you've described in the original message, namely:

Since we use Splunk to search logs, we'd prefer to have the data formatted as JSON, but can't right now. Do you or @shlomi-noach have any objections to a PR for this feature? Any guidance on implementation details I should be mindful of? Thanks!

shlomi-noach commented 6 years ago

Please be advised that v3.0.12, https://github.com/github/orchestrator/releases/tag/v3.0.12, released just yesterday, includes much purging of logs, as per #555 .

bbuchalter commented 6 years ago

@shlomi-noach thanks for the quick response and releasing a new version with that PR. After looking more closely at those changes, they do not include the things we're filtering for. For example, when we want to isolate for actions taken during RecoverDeadCoMaster here are the filters we apply:

log != "*WARNING discoverInstance*"
log != "*ERROR ReadTopologyInstance*"
log != "*WARNING: NamedStopwatch.Stop*"
log != "*ERROR dial tcp*" log != "*martini*"
log != "*http: proxy error: context canceled*"
log != "*INFO auditType:register-candidate*" 
log != "*ERROR FlushNontrivialResolveCacheToDatabase()*"
log != "*waiting on a discoveryQueue*"
log != "*instance.SlaveLagSeconds < 0*"
log != "*INFO auditType*"
log != "*Discovered host*"
log != "*outdated keys*"
log != "*invalid connection*"
log != "*unexpected EOF*"
log != "*i/o timeout*"
log != "*resolve*"

To be clear, these filters may not all be needed for this particular use case, but these are "go to" filters when trying to eliminate noise. Our thinking is that if we were to use JSON logging output, we could then add metadata about the error that would make filtering more reasonable.

Thoughts?

shlomi-noach commented 6 years ago

Noting that most actions taken during recovery are especially audited, not just in the standard log, but in recovery log. This is found in /api/topology-recovery-steps/... (or though the web interface). These have been refined and still are, and my feeling is that they provide an overall good picture of the recovery process. These logs do not include lower-level actions such as change master to etc.

I'm hesitant to switch logging into JSON format, unless you propose a global variable that converts all logs to JSON, which is a reasonable approach.

bbuchalter commented 6 years ago

These logs do not include lower-level actions such as change master to etc.

We feel at this point in our Orchestrator integration, we need these lower-level logs. That's what's motivating us.

you propose a global variable that converts all logs to JSON, which is a reasonable approach

My intention would be to make log formatting configurable, yes. Nothing would need to change for existing users, or the defaults.

Does that sound OK in principal? Any thoughts on implementation details that come to mind that you care about?

bbuchalter commented 6 years ago

Any thoughts on implementation details that come to mind that you care about?

As I progressed I found that https://github.com/github/orchestrator/blob/master/vendor/github.com/openark/golib/log/log.go is not 100% compatible with the standard library logger (e.g. Errore), thus making any kind of drop-in replacement not immediately possible.

How would you feel about a PR that eliminates the use of these non-standard logging functions?

bbuchalter commented 6 years ago

I've made a naive first attempt at how something like this might be accomplished: https://github.com/github/orchestrator/compare/4f6ea856ce43abcd3951c22a8f2459cc86c5db63...bbuchalter:extract-non-standard-log-functions

See commit message for details and discussion. Thanks for your engagement!

shlomi-noach commented 6 years ago

Could you please elaborate on the log part? How does the logger you suggest in your above commit differ from the openark logger, and what is the significance of the change?

bbuchalter commented 6 years ago

@shlomi-noach forgive me for not being more explicit about the benefit of the change. I expect there are a lot of demands your time. Thanks for the quick responses!

The key feature of that library that's relevant to our discussion so far is mentioned here: https://github.com/sirupsen/logrus#formatters. In short, it provides many more options (out-of-the-box and custom) for formatting logs including JSON, fluentd, and logstash.

There are other benefits, as described in the README:

Does this make clear the differences and my motivation? What other questions do you have?

I'll try to preemptively answer one of my own questions which I raised in my commit message: how can we keep the Criticale (1 use), Errore (39 uses), and Fatale (6 uses) functions provided by https://github.com/openark/golib/blob/master/log/log.go while also switching to logrus?

Here's my reading of the implementation's features and how we would map them to logrus:

openark-logger feature logrus support work required
log levels https://github.com/sirupsen/logrus#level-logging Logrus doesn't support NOTICE or CRITICAL. Remap NOTICE messages to INFO. Remap CRITICAL messages to ERROR.
Include a stack trace if a flag is set Not present Pass the stacktrace in as a field associated with the log message.
Log formatting https://github.com/sirupsen/logrus#formatters Would need to decide if the text formatter provided by logrus would be too big of a change. Logrus's text formatter has a few nice features like color coding when printing to TTY, and logfmt compatibility otherwise. If too big of a change, then we could write a custom formatter for backwards compatibility.
syslog integration Available as an optional hook Not sure how, but it seems like the pieces are in place to continue support.

Thanks again for your time. Hope this RFC produces a path towards implementation that we're both excited about!

bbuchalter commented 6 years ago

@shlomi-noach wanted to ping this issue because I want to avoid wasted effort if you don't agree with my thinking above. Thanks for your time!

shlomi-noach commented 6 years ago

@bbuchalter your input is much appreciated! I'm sorry for the long delay. Could we, as first step, rewrite the openark/log.go code (inplace is fine, I'll extract that later) to use logerror or whichever implementation you see fit, such that there are no code changes under the go/ library?

bbuchalter commented 6 years ago

Thanks for the reply @shlomi-noach. Let me see if I understand correctly what you're proposing.

Instead of making any changes to the go/ library, we'll continue to use openark/log.go to act as the interface, but change it's underlying implementation such that it can be powered (and configured) with the more flexible logrus library I discussed above.

Did I understand you correctly?

shlomi-noach commented 6 years ago

@bbuchalter that is correct, if possible, please.