openark / orchestrator

MySQL replication topology management and HA
Apache License 2.0

Orchestrator shows master unreachable, however I see the master and all slaves #1026

Open dipeshacharya opened 4 years ago

dipeshacharya commented 4 years ago

Hi all, we have 4 MariaDB databases (2 in each data center) whose replication is managed by orchestrator. We run 3 orchestrator nodes (one in each data center and one on AWS). We recently had a major failure: the data center hosting our primary MariaDB database was rebooted. Orchestrator did an automatic failover and moved the primary to the other data center. However, when I check replication health with the commands below, I get the following output, and I don't understand why it reports an unreachable master. I can telnet to the master and it connects fine.

Please let me know if I need to worry about anything. How do I fix the unreachable master status? I don't see any network issue. Should I go ahead with a graceful failover back to the node where the primary used to be?

[root@ ~]# orchestrator-client -c replication-analysis
12.427.6.122:3306 (cluster 12.427.6.122:3306): UnreachableMaster

[root@d1vcmpmgrpl01 ~]# oc-prod -c topology -i 12.427.6.122l:3306
12.427.6.122:3306 [0s,ok,10.2.22-MariaDB-log,rw,ROW,>>]

2019-11-19 23:29:22 INFO auditType:emergently-read-topology-instance instance:10.227.6.122:3306 cluster:12.447.6.122:3306 message:UnreachableMaster
2019-11-19 23:29:22 DEBUG raft leader is 12.427.6.124:10008 (this host); state: Leader
2019-11-19 23:29:24 INFO auditType:emergently-read-topology-instance 
2019-11-19 23:29:27 DEBUG raft leader is 12.427.6.124:10008 (this host); state: Leader
2019-11-19 23:29:28 DEBUG orchestrator/raft: applying command 4126137: request-health-report
[martini] Started GET /api/raft-follower-health-report/6122a715/12.427.6.124/12.427.6.124 for 12.427.6.124:53623
[martini] Completed 200 OK in 1.21532ms
[martini] Started GET /api/raft-follower-health-report/6122a715/12.527.6.136/12.627.6.136 for 12.627.6.136:42494
[martini] Completed 200 OK in 908.149µs
[martini] Started GET /api/raft-follower-health-report/6122a715/12.430.193.213/13.430.193.213 for 12.430.193.213:46278
[martini] Completed 200 OK in 1.666629ms

shlomi-noach commented 4 years ago

The log you pasted seems to be very short; is there no error message from orchestrator about failing to connect to the master? Is there a PRIVILEGES issue? A credentials error? A firewall issue?

You being able to connect via telnet does not mean orchestrator is able to connect to the master.

What I would try: SSH into the orchestrator leader node, from there run mysql -h <master> -u... -p... using orchestrator's credentials, and see what happens.
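
For illustration, that check could be wrapped in a small shell function along these lines. This is only a sketch, not something taken from the orchestrator docs: the function name check_master and the ORC_USER / ORC_PASSWORD variables are made up for this example, and you would set them by hand to whatever topology user and password your orchestrator configuration uses. A telnet test only proves the TCP port is open, whereas this goes through authentication and runs a query, so a credentials or privileges problem should show up as a mysql error.

```shell
# Sketch: connect to the master the way a MySQL client with orchestrator's
# credentials would. check_master, ORC_USER and ORC_PASSWORD are hypothetical
# names chosen for this example; set the two variables yourself beforehand.
check_master() {
  local host="$1"          # master host name or IP
  local port="${2:-3306}"  # defaults to 3306 when no port is given

  # MYSQL_PWD keeps the password off the command line.
  # SHOW GRANTS lists the privileges of the user we connected as.
  MYSQL_PWD="${ORC_PASSWORD:?set ORC_PASSWORD first}" mysql \
    --host="$host" \
    --port="$port" \
    --user="${ORC_USER:?set ORC_USER first}" \
    --connect-timeout=5 \
    --execute="SELECT @@hostname, @@read_only; SHOW GRANTS;"
}

# Usage, run on the orchestrator leader node:
#   ORC_USER=... ORC_PASSWORD=... check_master <master> 3306
```

If this fails with an access denied, unknown host or timeout error while telnet succeeds, the problem is likely in the credentials, grants or host-based restrictions rather than in the network path.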