tangweichun opened this issue 6 years ago
Yes,
Just have a corosync cluster fail over the consul agent + replication-manager together. Corosync is a multi-resource failover…
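For illustration, a minimal pacemaker/corosync sketch of that idea, assuming both daemons run as systemd services (the unit names `consul` and `replication-manager` are assumptions, adapt to your packaging):

```sh
# sketch: run the consul agent and replication-manager as one pacemaker
# resource group so corosync fails them over together; unit names assumed
pcs resource create consul-agent systemd:consul op monitor interval=10s
pcs resource create repman systemd:replication-manager op monitor interval=10s
# a group starts in order on the same node: consul first, then repman
pcs resource group add repman-stack consul-agent repman
```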
/svar
On 17 Sep 2018, at 04:51, tangweichun notifications@github.com wrote:
Hi~ This is a simple topology:
https://user-images.githubusercontent.com/29778335/45603053-9e45d600-ba5a-11e8-96cd-1d0bb6ebb235.png
If replication-manager is down, it's not a big deal. But if the consul client crashes or is stopped, no application will work, because it can't get the service name from the consul server.
As far as I know, replication-manager still does not support a remote consul cluster, and when the local consul client is down, replication-manager will automatically deregister the services.
So even if consul and replication-manager are HA, the consul client working in conjunction with replication-manager is still a single point of failure. If I use replication-manager to manage tens or hundreds of MySQL replication setups, and one day the consul client goes down: boom!
Is there anything helpful for this issue?
Thanks! It's too complex!
Re,
"until now replication-manager still not support remote consul cluster » Local agent VS Consul API will not change anything to the issue. The local agent is a member of the entire consul cluster so it’ exactly like talking to the full cluster with an api. If the agent can leave the cluster equivalent consul API will do as well.
In this case, yes, a divergence is possible between the DNS content and the DB topology. That can be addressed by comparing the topology against the DNS content to see if they really match. Is that what you are worried about?
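A quick manual sketch of that comparison (the server address and credentials are illustrative; 8600 is consul's default DNS port, adjust if you expose it on 53):

```sh
# sketch: does consul DNS still point the write service at the real master?
WRITE_IP=$(dig +short @172.17.11.242 -p 8600 write_mysql57.service.consul | head -n1)
# on the actual master, read_only must be OFF (0)
mysql -h "$WRITE_IP" -u check -p -N -e 'SELECT @@read_only'
```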
In 2.1 we do have an active/passive solution called the arbitrator, which can decide which replication-manager is active and which is passive, but it stays less advanced compared to long-existing heartbeat solutions like corosync or opensvc, where the heartbeat can be set up with STONITH scripts or spread across multiple channels, from HTTP to TCP to UDP to shared disk (opensvc only).
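As a sketch, the arbitrator side of a two-node active/passive setup lives in the replication-manager config; the flag names below are as I recall them from the 2.1 docs, so verify them against your version's reference:

```toml
# /etc/replication-manager/config.toml (excerpt) -- flag names recalled
# from the 2.1 docs, verify against your version; hosts are illustrative
arbitration-external = true
arbitration-external-hosts = "arbitrator.example.com:80"  # arbitrator service
arbitration-external-unique-id = 0                        # use 1 on the peer
arbitration-peer-hosts = "repman-peer.example.com:10001"  # the other repman
```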
/svar
On 17 Sep 2018, at 08:32, tangweichun notifications@github.com wrote:
Thanks! It's too complex!
Yes, I worry about the single point of failure: if the local consul agent is down, everything is down. That's terrible in production.
I'm learning about corosync...
By single-point failure I mean: replication-manager is alive but only the local consul agent is down.
On 17 Sep 2018, at 09:26, tangweichun notifications@github.com wrote:
Yes, I worry about the single point of failure: if the local consul agent is down, everything is down. That's terrible in production.
Huh, why would everything be down? All the other servers can still talk to the consul cluster and will keep the same server resolution as before, which is itself an issue, since the DB topology may have been changed by replication-manager in the meantime. So what would be more relevant is to not fail over when consul is down, correct?
Normal status:

```
[root@orabackup /root]$ date && nslookup write_mysql57.service.consul
Mon Sep 17 15:44:42 CST 2018
Server:   172.17.11.242
Address:  172.17.11.242#53

Name:     write_mysql57.service.consul
Address:  172.17.5.101

[root@orabackup /root]$ date && nslookup read_mysql57.service.consul
Mon Sep 17 15:44:56 CST 2018
Server:   172.17.11.242
Address:  172.17.11.242#53

Name:     read_mysql57.service.consul
Address:  172.17.5.201
Name:     read_mysql57.service.consul
Address:  172.17.11.242
```

After I stop the local consul agent:

```
[root@orabackup /root]$ date && nslookup write_mysql57.service.consul
Mon Sep 17 15:49:19 CST 2018
Server:   172.17.11.242
Address:  172.17.11.242#53

** server can't find write_mysql57.service.consul: NXDOMAIN

[root@orabackup /root]$ date && nslookup read_mysql57.service.consul
Mon Sep 17 15:49:24 CST 2018
Server:   172.17.11.242
Address:  172.17.11.242#53

** server can't find read_mysql57.service.consul: NXDOMAIN
```
I can't get the services from consul DNS.
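One way to tell whether this is only the local DNS path failing or a real cluster-wide deregistration is to query the catalog on a surviving consul server over HTTP (`/v1/catalog/service/<name>` is a standard Consul endpoint; the address is from the output above):

```sh
# returns [] if the service was really deregistered cluster-wide, rather
# than just being unreachable through the stopped local agent
curl -s http://172.17.11.242:8500/v1/catalog/service/write_mysql57
```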
As far as I know, replication-manager still does not support a remote consul cluster, and when the local consul client is down, replication-manager will automatically deregister the services.
It seems like the services get deregistered when the local consul agent is stopped.
On 17 Sep 2018, at 09:57, tangweichun notifications@github.com wrote:
As far as I know, replication-manager still does not support a remote consul cluster, and when the local consul client is down, replication-manager will automatically deregister the services.
It seems like the services get deregistered when the local consul agent is stopped.
That's the local DNS resolution that is blocked, but it should not be blocked from the other nodes where DNS points at your consul cluster.
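If the worry is clients that resolve only through the one local agent, a common mitigation is to forward the `.consul` domain to several servers, for example with dnsmasq (addresses illustrative):

```
# /etc/dnsmasq.d/10-consul (illustrative): forward *.consul queries to
# more than one consul server's DNS port so a single dead agent does not
# break resolution for clients
server=/consul/172.17.11.242#8600
server=/consul/172.17.5.201#8600
```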