openark / orchestrator

MySQL replication topology management and HA
Apache License 2.0

No replicas found for #720

Open miczard opened 5 years ago

miczard commented 5 years ago

Hello all,

I want to share an issue I ran into while promoting a slave after the master failed.

The topology is as follows:

db4a:1111     [0s,ok,10.3.8-MariaDB-log,rw,ROW,>>,downtimed]
+ db6a:1111   [0s,ok,10.3.8-MariaDB-log,rw,ROW,>>,GTID]
  + db5a:1111 [0s,ok,10.3.8-MariaDB-log,rw,ROW,>>,GTID]
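The listing above is orchestrator's indented topology view: each `+` line replicates from the nearest less-indented line above it, so db5a replicates from db6a, which replicates from db4a. A minimal Python sketch that rebuilds that tree from the text, assuming two spaces of indentation per level and the bracketed status format shown; `parse_topology` and `Node` are hypothetical helpers, not part of orchestrator.

```python
import re
from dataclasses import dataclass, field

# One topology line: optional indentation, optional "+ " marker, host:port, then [status fields].
LINE_RE = re.compile(
    r"^(?P<indent>\s*)(?P<plus>\+\s+)?(?P<host>[\w.-]+):(?P<port>\d+)\s+\[(?P<fields>[^\]]*)\]"
)

@dataclass
class Node:
    host: str
    port: int
    depth: int
    fields: list
    children: list = field(default_factory=list)

def parse_topology(text):
    """Rebuild the replication tree from indented topology text; returns root nodes."""
    roots, path = [], []  # path holds the chain of ancestors of the current line
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip blank or unrecognized lines
        # Assumed layout: two spaces of indentation per level, plus one level for "+ ".
        depth = len(m.group("indent")) // 2 + (1 if m.group("plus") else 0)
        node = Node(
            host=m.group("host"),
            port=int(m.group("port")),
            depth=depth,
            fields=[f.strip() for f in m.group("fields").split(",")],
        )
        # Pop back to this node's parent (the nearest shallower line above it).
        while path and path[-1].depth >= depth:
            path.pop()
        (path[-1].children if path else roots).append(node)
        path.append(node)
    return roots
```

Walking the result gives db4a as the root, db6a as its only child, and db5a as db6a's child, which is the chain that a failover of db4a has to repair.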

After I killed db4a, I tried to manually promote db6a, but unfortunately I got this error message:

No replicas found for db4a:1111

My JSON config is as follows:

  "RecoverMasterClusterFilters": [
    "*"
  ],
  "RecoverIntermediateMasterClusterFilters": [
    "*"
  ],

  "CoMasterRecoveryMustPromoteOtherCoMaster": true,
  "DetachLostSlavesAfterMasterFailover": true,
  "ApplyMySQLPromotionAfterMasterFailover": true,
  "MasterFailoverDetachSlaveMasterHost": true,
  "MasterFailoverLostInstancesDowntimeMinutes": 0,
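With `"*"` in both `RecoverMasterClusterFilters` and `RecoverIntermediateMasterClusterFilters`, recovery should be enabled for every cluster, so the filters themselves are unlikely to be what blocked the failover. A simplified Python sketch of how such a filter list can be interpreted, assuming `"*"` matches everything and other entries are tried as an exact cluster name or a regular expression; `filters_match_cluster` is hypothetical, and orchestrator's real matching rules (including alias-based filters) are described in its configuration documentation.

```python
import re

def filters_match_cluster(filters, cluster_name):
    """Return True if any filter enables recovery for cluster_name (simplified, hypothetical)."""
    for f in filters:
        if not f:
            continue  # ignore empty entries
        if f == "*" or f == cluster_name:
            return True  # wildcard or exact cluster name
        try:
            if re.search(f, cluster_name):
                return True  # treat remaining entries as regular expressions
        except re.error:
            pass  # not a valid regex; it was already compared as an exact name
    return False
```

For example, `filters_match_cluster(["*"], "db4a:1111")` is true, while an empty list enables recovery for no cluster at all.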

Is there something missing here?

Thanks

shlomi-noach commented 5 years ago

Could you please be more specific about how you tried to manually promote the replica and where you saw the error, and, if possible, attach the log from around the time of the failover? Can you please clarify whether you were using GTID? Edit: I see you were.

miczard commented 5 years ago

Hello Shlomi,

What I did was click Recover in the web GUI and choose "Recover, try to promote db5a", and the error message appeared in the web GUI as well. I attached the log as a file, since it would be quite messy to paste it here.

Logfile.txt

Many thanks

shlomi-noach commented 5 years ago

What I see in the log is mention of another recovery. Are you sure you have not initiated your manual recovery even while an automated recovery was taking place? I see the master is downtimed. Did you downtime it? I suspect not, and that the automated recovery did.

miczard commented 5 years ago

Hello Shlomi,

Unfortunately, automated recovery did not happen, so I had to click the recovery button in the web GUI.

Anyway, on top of everything, as you can see, I am using MariaDB 10.3.8. Do you have any information on whether that version might be incompatible?

I am asking because I also tried with MySQL 8.0.13, and everything (e.g. automated recovery/failover) works as expected.

Many thanks

miczard commented 5 years ago

Hello Shlomi,

I think I found the root issue: I am running semi-sync replication, and that may be why neither the automated recovery nor the promotion of a new master happened.

After switching to async replication, automated recovery/failover works as expected.

Please let me know if there are any known limitations with semi-sync replication in MariaDB 10.3.x. Thanks
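When comparing semi-sync and async runs like this, it helps to confirm which nodes actually have semi-sync enabled on each side. On MariaDB these settings appear in `SHOW GLOBAL VARIABLES LIKE 'rpl_semi_sync%'` as `rpl_semi_sync_master_enabled` and `rpl_semi_sync_slave_enabled`. The Python sketch below assumes those rows were already fetched into one dict per host (no database driver is used); `semi_sync_roles` is a hypothetical helper, and it only reports configuration, not whether semi-sync caused the failed recovery.

```python
def semi_sync_roles(variables_by_host):
    """Map each host to the semi-sync roles it has enabled, e.g. {'db6a:1111': ['master', 'slave']}."""
    roles = {}
    for host, variables in variables_by_host.items():
        enabled = []
        # SHOW GLOBAL VARIABLES reports these booleans as 'ON' / 'OFF'; a missing row counts as OFF.
        if variables.get("rpl_semi_sync_master_enabled", "OFF").upper() == "ON":
            enabled.append("master")
        if variables.get("rpl_semi_sync_slave_enabled", "OFF").upper() == "ON":
            enabled.append("slave")
        roles[host] = enabled
    return roles
```

In a chained topology, an intermediate node that is itself a replica and also has replicas may have both roles enabled, which is worth checking before and after a failover.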

shlomi-noach commented 5 years ago

@miczard I'm unaware of any limitation with MariaDB & semi-sync, though I'm not running such setup myself.

Anecdotally, 8.0.13 and 10.3.8 are anagrams. Had to mention this.

cloudufull commented 5 years ago

I had the same problem: running “orchestrator -c graceful-master-takeover -i c150:3303” failed, but “orchestrator-client -c graceful-master-takeover -i c150:3303” succeeded.
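One difference between the two commands is that `orchestrator-client` does not act on the backend directly; it sends the request to a running orchestrator service over HTTP, using the API base in the `ORCHESTRATOR_API` environment variable. A minimal Python sketch of the request URL it would target, assuming the route has the form `graceful-master-takeover/<host>/<port>` and a default base of `http://localhost:3000/api`; both are assumptions to check against the orchestrator API docs, and `takeover_url` is a hypothetical helper.

```python
import os

def takeover_url(instance, api_base=None):
    """Build the (assumed) HTTP API URL for a graceful master takeover on 'host:port'."""
    # ORCHESTRATOR_API is the variable orchestrator-client reads; the fallback is illustrative only.
    base = (api_base or os.environ.get("ORCHESTRATOR_API", "http://localhost:3000/api")).rstrip("/")
    host, _, port = instance.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {instance!r}")
    # Assumed route shape: /graceful-master-takeover/<host>/<port>
    return f"{base}/graceful-master-takeover/{host}/{port}"
```

For the instance in the comment, `takeover_url("c150:3303", "http://orch:3000/api")` yields `http://orch:3000/api/graceful-master-takeover/c150/3303`.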

shlomi-noach commented 5 years ago

@cloudufull unsure if it's the same problem. Are you running a raft setup? What's your config? What's the topology like?

cloudufull commented 5 years ago

Maybe I made a mistake. I just redeployed orchestrator, and now everything seems normal again! 😅