outbrain / orchestrator

MySQL replication topology manager/visualizer

graceful-master-takeover command throwing error #242

AshokJivani closed this issue 7 years ago

AshokJivani commented 8 years ago

I am testing the graceful-master-takeover functionality.

My topology is: myserver1:3301 -> myserver2:3301 -> myserver3:3301. I am trying to gracefully promote myserver2 as the new master, but I am getting the following error:

[root@orchadmin ~]$ orchestrator -c graceful-master-takeover -alias myserver1:3301
2016-08-12 11:05:23 FATAL Sanity check failure. It seems like the desginated instance myserver2:3301 does nto replicate from the master myserver1:3301. This error is strange. Panicking

Note: There are a few typos in the output message: "desginated" should be "designated" and "nto" should be "not".

The error says that myserver2:3301 is not replicating from myserver1:3301, which is not true:

[root@orchadmin ~]$ orchestrator -c topology -i myserver1:3301
myserver1:3301 [0s,ok,5.6.28-76.1-log,rw,ROW,>>]
+ myserver2:3301 [0s,ok,5.6.28-76.1-log,ro,ROW,>>,GTID]
++ myserver3:3301 [0s,ok,5.6.28-76.1-log,ro,ROW,>>,GTID]

[root@orchadmin ~]$ orchestrator -c topology -i myserver2:3301
myserver1:3301 [0s,ok,5.6.28-76.1-log,rw,ROW,>>]
+ myserver2:3301 [0s,ok,5.6.28-76.1-log,ro,ROW,>>,GTID]
++ myserver3:3301 [0s,ok,5.6.28-76.1-log,ro,ROW,>>,GTID]

[root@orchadmin ~]$ orchestrator -c which-master -i myserver1:3301
[root@orchadmin ~]$ orchestrator -c which-slaves -i myserver1:3301
myserver2:3301

[root@orchadmin ~]$ orchestrator -c which-master -i myserver2:3301
myserver1:3301
[root@orchadmin ~]$ orchestrator -c which-slaves -i myserver2:3301
myserver3:3301

[root@orchadmin ~]$ orchestrator -c which-master -i myserver3:3301
myserver2:3301
[root@orchadmin ~]$ orchestrator -c which-slaves -i myserver3:3301

Please advise.

AshokJivani commented 8 years ago

Pinging @shlomi-noach in case you didn't get the issue notification. Thanks.

shlomi-noach commented 8 years ago

Thank you, will look into it.

shlomi-noach commented 8 years ago

Pro tip: for multi-line code, prepend and append three backticks; see your edited comment.

AshokJivani commented 8 years ago

Got it. Thanks.

shlomi-noach commented 8 years ago

What are your HostnameResolveMethod and MySQLHostnameResolveMethod settings?

The error you got arises from: https://github.com/outbrain/orchestrator/blob/dcb6e1cc9560bb61742ef67279e8ad5972feb353/go/logic/topology_recovery.go#L1169

So a comparison of two InstanceKey values failed. I will submit a patch to print both of them, for better visibility. Meanwhile, I'm wondering: do your servers talk to each other via IP or via hostname, and if via hostname, short or FQDN?
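To see how such a check can trip, here is a minimal sketch of a key comparison. The InstanceKey type is a simplified stand-in modeled on the host/port pair that orchestrator uses to identify instances. It is not orchestrator's actual code. If a replica's master is recorded as an IP address while the master itself is known by hostname, a plain field-by-field comparison reports a mismatch, even though both keys point at the same server.

```go
package main

import "fmt"

// InstanceKey is a simplified stand-in for an instance identifier:
// a host (name or IP) plus a port.
type InstanceKey struct {
	Hostname string
	Port     int
}

// Equals compares two keys field by field, with no name resolution.
func (k InstanceKey) Equals(other InstanceKey) bool {
	return k.Hostname == other.Hostname && k.Port == other.Port
}

// StringCode renders the key as host:port.
func (k InstanceKey) StringCode() string {
	return fmt.Sprintf("%s:%d", k.Hostname, k.Port)
}

func main() {
	// The master as the tool knows it (by hostname).
	master := InstanceKey{Hostname: "myserver1", Port: 3301}
	// The designated replica's master as recorded in replication (by IP).
	designatedMaster := InstanceKey{Hostname: "192.168.1.1", Port: 3301}

	if !designatedMaster.Equals(master) {
		fmt.Printf("sanity check failed: %s != %s\n",
			designatedMaster.StringCode(), master.StringCode())
	}
}
```

Both keys refer to the same MySQL server, but the comparison only sees two different strings.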

AshokJivani commented 8 years ago

From the config file:

HostnameResolveMethod: "none"
MySQLHostnameResolveMethod: "@@hostname"

The servers talk to each other (for SSH and so on) by hostname, resolved via /etc/hosts or DNS lookup. However, we have enabled skip-name-resolve on the MySQL side.

AshokJivani commented 8 years ago

If I change HostnameResolveMethod from none to cname, graceful-master-takeover works as expected. However, with the value set to cname I can't discover new instances.

shlomi-noach commented 8 years ago

Will you please try release 1.5.6 (https://github.com/outbrain/orchestrator/releases/tag/v1.5.6) and paste the new error message?

AshokJivani commented 8 years ago

Replication topology: myserver1:3301 -> myserver2:3301 -> myserver3:3301
Alias name: myserver1:3301

IP addresses: myserver1 (192.168.1.1), myserver2 (192.168.1.2), myserver3 (192.168.1.3)

graceful-master-takeover output:

[root@orchadmin tmp]$ orchestrator -c graceful-master-takeover -alias myserver1:3301
2016-08-17 11:05:15 FATAL Sanity check failure. It seems like the designated instance myserver2:3301 does not replicate from the master myserver1:3301 (designated instance's master key is 192.168.1.1:3301). This error is strange. Panicking

shlomi-noach commented 8 years ago

What does SHOW SLAVE STATUS show on myserver2?

AshokJivani commented 8 years ago

SHOW SLAVE STATUS on myserver2 shows:

Master_Host: 192.168.1.1

I had a typo in my previous comment: the error was showing 192.168.1.1 as the designated instance's master key.

shlomi-noach commented 8 years ago

That makes more sense. OK, so this is a hostname resolution issue, which narrows down the problem.
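Because the mismatch comes from name resolution, one way to picture a tolerant check is to map both keys to a canonical form before comparing them. The sketch below uses a hypothetical in-memory table (ipToHostname) built from the addresses reported in this thread. It only illustrates the idea and is not how orchestrator itself resolves names.

```go
package main

import "fmt"

// InstanceKey is a simplified stand-in: host (name or IP) plus port.
type InstanceKey struct {
	Hostname string
	Port     int
}

// ipToHostname is a hypothetical lookup table for this example only,
// populated with the addresses reported in this thread.
var ipToHostname = map[string]string{
	"192.168.1.1": "myserver1",
	"192.168.1.2": "myserver2",
	"192.168.1.3": "myserver3",
}

// canonical replaces a known IP with its hostname; unknown hosts pass through.
func canonical(k InstanceKey) InstanceKey {
	if name, ok := ipToHostname[k.Hostname]; ok {
		return InstanceKey{Hostname: name, Port: k.Port}
	}
	return k
}

// sameInstance compares two keys after canonicalizing both sides.
func sameInstance(a, b InstanceKey) bool {
	return canonical(a) == canonical(b)
}

func main() {
	master := InstanceKey{Hostname: "myserver1", Port: 3301}
	recorded := InstanceKey{Hostname: "192.168.1.1", Port: 3301} // as in Master_Host
	fmt.Println(master == recorded)             // false: raw comparison
	fmt.Println(sameInstance(master, recorded)) // true: after canonicalization
}
```

Canonicalizing both sides, not just one, keeps the check symmetric: it gives the same answer whichever key happens to hold the IP.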

AshokJivani commented 7 years ago

It looks like the issue happens because we use the IP address for the Master_Host parameter when configuring replication. Is there any way to configure Orchestrator to use IP addresses rather than hostnames when moving topologies, while the UI still shows hostnames?

shlomi-noach commented 7 years ago

@AshokJivani I apologize for this delay.

> Is there any way to configure Orchestrator to use IP address and not hostname while moving topologies but UI still shows hostnames?

There is a way to do it, called hostname-unresolve. It was created for managing floating VIPs, but it can be used in your case as well.

You will have to issue, for each host, the following:

orchestrator -c register-hostname-unresolve -i mysql.host.name --hostname=<ip address>

Note that you'll need to do this continuously: the ExpiryHostnameResolvesMinutes flag sets the time after which such a registration is invalidated. So you'll want to issue this from cron, e.g. every 10 or 30 minutes.
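The registration can be scripted so that cron only has to call a single program. The sketch below shells out to the orchestrator CLI once per host, using exactly the flags shown above. The unresolves map is hypothetical and is filled with the addresses from this thread. Passing host:port to -i (rather than a bare hostname, as in the command above) is also an assumption to verify against your setup. Schedule it from cron at an interval shorter than ExpiryHostnameResolvesMinutes.

```go
package main

import (
	"log"
	"os/exec"
	"sort"
)

// unresolves maps each instance (as passed to -i) to the IP address that
// replication uses for it. Hypothetical values taken from this thread.
var unresolves = map[string]string{
	"myserver1:3301": "192.168.1.1",
	"myserver2:3301": "192.168.1.2",
	"myserver3:3301": "192.168.1.3",
}

// unresolveArgs builds the CLI arguments for one registration, mirroring:
//   orchestrator -c register-hostname-unresolve -i <instance> --hostname=<ip>
func unresolveArgs(instance, ip string) []string {
	return []string{"-c", "register-hostname-unresolve", "-i", instance, "--hostname=" + ip}
}

func main() {
	// Iterate in a stable order so the log output is predictable.
	instances := make([]string, 0, len(unresolves))
	for instance := range unresolves {
		instances = append(instances, instance)
	}
	sort.Strings(instances)

	for _, instance := range instances {
		args := unresolveArgs(instance, unresolves[instance])
		out, err := exec.Command("orchestrator", args...).CombinedOutput()
		if err != nil {
			// Keep going: one failed registration should not block the others.
			log.Printf("registering %s failed: %v: %s", instance, err, out)
			continue
		}
		log.Printf("registered %s -> %s", instance, unresolves[instance])
	}
}
```

Built as, say, /usr/local/bin/register-unresolves (a hypothetical path), it could be run from a crontab entry such as `*/10 * * * * /usr/local/bin/register-unresolves`. Logging a failure and continuing lets a single unreachable host fail without stopping the remaining registrations.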

There is a plan to add a "HostnameResolveMethod": "ip" config; however, I'm not sure whether the web UI would present a hostname in that case.

Independently, I can solve your case in code and remove a redundant extra check.

shlomi-noach commented 7 years ago

I have (hopefully) fixed this downstream and will push it upstream shortly.

shlomi-noach commented 7 years ago

Fixing code: https://github.com/outbrain/orchestrator/blob/858debf31c57a1a6835e43c6154c1a3db483e082/go/logic/topology_recovery.go#L1169-L1175

Fixing release: https://github.com/outbrain/orchestrator/releases/tag/v1.5.7

shlomi-noach commented 7 years ago

@AshokJivani are you able to test this?

AshokJivani commented 7 years ago

@shlomi-noach It works as expected. I tried using both a hostname and an IP address in CHANGE MASTER for the candidate master, and in both cases graceful-master-takeover worked.

Thank you.

shlomi-noach commented 7 years ago

Thank you. Sorry for the time it took.