signal18 / replication-manager

Signal 18 repman - Replication Manager for MySQL / MariaDB / Percona Server
https://signal18.io/products/srm
GNU General Public License v3.0

When the VIP is lost, what can I do to resolve this issue? #756

Open duguwo opened 4 months ago

duguwo commented 4 months ago

replication-manager version: v2.2.16

We use "ip addr add x.x.x.x dev eth0" to add a VIP. However, when the Linux network service is restarted, the VIP is lost. replication-manager does not trigger a failover in this case: MySQL replication keeps working normally, but the VIP is missing, which disrupts the business. In this scenario, can replication-manager call an additional script to add the VIP back automatically and restore service?

caffeinated92 commented 4 months ago

Hi, replication-manager does not monitor the VIP, since we don't use one for the database; you can handle that with keepalived or another VIP tool. Can you explain your node architecture in more detail?

duguwo commented 4 months ago

We use a MySQL master-slave architecture with two nodes, one acting as the master and the other as the slave.

duguwo commented 4 months ago

When the VIP is lost, both the MySQL master and slave nodes keep functioning normally, so no failover is triggered. When a failover is initiated, replication-manager can call additional scripts to perform the switch.

caffeinated92 commented 4 months ago

The VIP is not monitored by replication-manager, even when the VIP is for the database and you use it to connect. As I remember, if you are using keepalived, it can run a script when the VIP is lost. You can write your own script and trigger a database failover using replication-manager-cli.
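The keepalived route suggested here can be sketched as a small notify handler. keepalived calls a notify script with the positional arguments TYPE (GROUP or INSTANCE), NAME, STATE (for example MASTER, BACKUP or FAULT) and PRIORITY. This Go sketch parses them and, when an instance enters FAULT, prints the replication-manager-cli command it would run. It is a dry run only; the failover subcommand and the --cluster flag are assumptions to verify with replication-manager-cli --help, and the cluster name wptest is taken from later in this thread.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// notifyEvent holds the positional arguments keepalived passes to a
// "notify" script: type (GROUP or INSTANCE), name, target state, priority.
type notifyEvent struct {
	Kind, Name, State, Priority string
}

func parseNotify(args []string) (notifyEvent, error) {
	if len(args) < 3 {
		return notifyEvent{}, errors.New("expected at least TYPE NAME STATE")
	}
	ev := notifyEvent{Kind: args[0], Name: args[1], State: strings.ToUpper(args[2])}
	if len(args) > 3 {
		ev.Priority = args[3]
	}
	return ev, nil
}

// failoverArgs returns the replication-manager-cli arguments for an event,
// or nil when nothing should be done. ASSUMPTION: the "failover" subcommand
// and "--cluster=" flag; check them against your replication-manager-cli.
func failoverArgs(ev notifyEvent, cluster string) []string {
	if ev.Kind == "INSTANCE" && ev.State == "FAULT" {
		return []string{"failover", "--cluster=" + cluster}
	}
	return nil
}

func main() {
	ev, err := parseNotify(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	// "wptest" is the cluster name used in this thread; adjust as needed.
	if args := failoverArgs(ev, "wptest"); args != nil {
		// Dry run: print the command instead of executing it.
		fmt.Println("would run: replication-manager-cli", strings.Join(args, " "))
	}
}
```

Keeping the decision in failoverArgs separate from execution makes it easy to test the trigger logic before letting the handler actually call replication-manager-cli.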

svaroqui commented 4 months ago

It's a fair feature request. We could simply consider the VIP as a proxy and use the proxy plugin to monitor it by connecting with the MySQL protocol. If the master is up but the VIP is down, we can call a script. We already have something for this, but it is not tested:

```
--extproxy                                             External proxy can be used to specify a route manage with external scripts
--extproxy-address string                              Network address when route is manage via external script,  host:[port] format
```
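The idea of treating the VIP as a proxy and checking it over the MySQL protocol can be sketched as below. This is not replication-manager's implementation: it is a minimal Go probe, assuming the VIP address 172.20.2.250:30006 used later in this thread, that dials the VIP and validates the server's initial HandshakeV10 packet (a 4-byte header with a 3-byte little-endian payload length and a sequence id, then protocol version 10 and a NUL-terminated server version).

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"net"
	"os"
	"time"
)

// parseHandshake checks that pkt (header + payload) is a MySQL HandshakeV10
// packet and returns the server version string it advertises.
func parseHandshake(pkt []byte) (string, error) {
	if len(pkt) < 5 {
		return "", errors.New("packet too short")
	}
	length := int(pkt[0]) | int(pkt[1])<<8 | int(pkt[2])<<16
	if length < 1 || len(pkt)-4 < length {
		return "", errors.New("bad or truncated payload length")
	}
	payload := pkt[4 : 4+length]
	if payload[0] != 0x0a { // 0x0a = protocol version 10; 0xff is an error packet
		return "", fmt.Errorf("not a HandshakeV10 packet (first byte 0x%02x)", payload[0])
	}
	end := bytes.IndexByte(payload[1:], 0)
	if end < 0 {
		return "", errors.New("unterminated server version")
	}
	return string(payload[1 : 1+end]), nil
}

// probe dials addr and reads the first packet the server sends.
func probe(addr string, timeout time.Duration) (string, error) {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	_ = conn.SetReadDeadline(time.Now().Add(timeout))
	header := make([]byte, 4)
	if _, err := io.ReadFull(conn, header); err != nil {
		return "", err
	}
	length := int(header[0]) | int(header[1])<<8 | int(header[2])<<16
	if length > 1<<16 { // a greeting is small; refuse absurd sizes
		return "", errors.New("greeting too large")
	}
	body := make([]byte, length)
	if _, err := io.ReadFull(conn, body); err != nil {
		return "", err
	}
	return parseHandshake(append(header, body...))
}

func main() {
	// Assumed target: the VIP and port from this thread.
	addr := "172.20.2.250:30006"
	version, err := probe(addr, 3*time.Second)
	if err != nil {
		fmt.Fprintf(os.Stderr, "%s does not answer as MySQL: %v\n", addr, err)
		os.Exit(1)
	}
	fmt.Printf("%s answers as MySQL %s\n", addr, version)
}
```

Reading only the server greeting avoids needing credentials: an open TCP port that is not a MySQL server, or a server that immediately sends an error packet, both fail the check.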
duguwo commented 4 months ago

If I configure extproxy = true and extproxy-address = "VIP:3306", how can I call a script to add the VIP?

svaroqui commented 4 months ago

I have coded the feature:

./replication-manager-pro --config=etc/opensvc/cluster-api/cluster-demo/stephane.toml monitor --proxy-servers-state-change-script="./share/scripts/proxychangestate.sh" monitor

Please let me know how it works for you in conjunction with the external proxy in the next release.

duguwo commented 4 months ago

In /etc/replication-manager/cluster.d/wptest.toml, I added the following configuration:

extproxy = true
extproxy-address = "172.20.2.250:30006"

172.20.2.250 is the MySQL VIP.

Now I can't log in to the web interface; the prompt says "Invalid username or password." If I remove the two settings above, I can log in and use it normally. The log shows:

```
time="2024-07-24 14:06:01" level=info msg="New proxy monitored: extproxy 172.20.2.250:30006" cluster=wptest
time="2024-07-24 14:06:02" level=info msg="Replication-Manager started in daemon mode" version=v2.2.16
time="2024-07-24 14:06:02" level=info msg="No existing password encryption scheme" error="Key file does not exist"
time="2024-07-24 14:06:02" level=info msg="Failover in automatic mode" cluster=wptest
time="2024-07-24 14:06:02" level=info msg="No existing password encryption scheme in LoadAPIUsers" cluster=wptest
time="2024-07-24 14:06:02" level=info msg="No SSL certificate provided using insecured from /usr/share/replication-manager/server.crt"
```

svaroqui commented 4 months ago

Please use version 2.3.42 to get this new feature; we will not backport it unless you are under support.

duguwo commented 4 months ago

Thank you for your response. I have upgraded replication-manager-osc to version 2.3.42. Could you provide information on how to use the script corresponding to the --proxy-servers-state-change-script parameter? I couldn't find relevant information in the documentation.

duguwo commented 4 months ago

Additionally, I ran the following command:

replication-manager-osc --config=/etc/replication-manager/cluster.d/wptest.toml monitor --proxy-servers-state-change-script="/usr/local/shell/vip_failover_post_wptest.sh" monitor

and received the following error:

```
WARN[2024-07-30T17:29:10+08:00] Empty credential do not decrypt key: vault-token cluster=none module=config type=log
WARN[2024-07-30T17:29:10+08:00] Replication manager started with version: v2.3.42 channel=StdOut cluster=wptest module=general type=alert
INFO[2024-07-30T17:29:10+08:00] Server 172.20.2.193:30006 previous state set to: Suspect cluster=wptest module=general type=log
INFO[2024-07-30T17:29:10+08:00] Server 172.20.2.194:30006 previous state set to: Suspect cluster=wptest module=general type=log
INFO[2024-07-30T17:29:10+08:00] Server 172.20.2.250:30006 previous state set to: Suspect cluster=wptest module=general type=log
INFO[2024-07-30T17:29:10+08:00] New proxy monitored extproxy: 172.20.2.250:30006 cluster=wptest module=general type=log
WARN[2024-07-30T17:29:10+08:00] Default users still use default password. Please change the credentials for users: (admin) cluster=wptest module=general type=log
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x78 pc=0xceab5d]

goroutine 1 [running]:
github.com/spf13/viper.(*Viper).SetConfigType(...)
	/var/jenkins_home/workspace/go/pkg/mod/github.com/spf13/viper@v1.4.0/viper.go:1790
github.com/signal18/replication-manager/config.(*Config).ReadCloud18Config(0xc0005b88a0, 0xc000142b00)
	/var/jenkins_home/workspace/go/src/github.com/signal18/replication-manager/config/config.go:1949 +0x7d
github.com/signal18/replication-manager/server.(*ReplicationManager).Run(0xc0005b8508)
	/var/jenkins_home/workspace/go/src/github.com/signal18/replication-manager/server/server.go:1604 +0x2273
github.com/signal18/replication-manager/server.init.func6(0x4b475a0, {0x29c222a?, 0x4?, 0x29c1fd6?})
	/var/jenkins_home/workspace/go/src/github.com/signal18/replication-manager/server/server_monitor.go:85 +0x136
github.com/spf13/cobra.(*Command).execute(0x4b475a0, {0xc000500fc0, 0x3, 0x3})
	/var/jenkins_home/workspace/go/pkg/mod/github.com/spf13/cobra@v0.0.6/command.go:844 +0x684
github.com/spf13/cobra.(*Command).ExecuteC(0x4b46dc0)
	/var/jenkins_home/workspace/go/pkg/mod/github.com/spf13/cobra@v0.0.6/command.go:945 +0x369
github.com/spf13/cobra.(*Command).Execute(...)
	/var/jenkins_home/workspace/go/pkg/mod/github.com/spf13/cobra@v0.0.6/command.go:885
github.com/signal18/replication-manager/server.Execute(...)
	/var/jenkins_home/workspace/go/src/github.com/signal18/replication-manager/server/server_cmd.go:103
main.main()
	/var/jenkins_home/workspace/go/src/github.com/signal18/replication-manager/main_server.go:24 +0x1a
```

caffeinated92 commented 4 months ago

Please use monitor only once.

duguwo commented 4 months ago

I ran the following command:

replication-manager-osc --config=/etc/replication-manager/cluster.d/wptest.toml --proxy-servers-state-change-script="/usr/local/shell/vip_failover_post_wptest.sh" monitor

and received the same error.

caffeinated92 commented 4 months ago

replication-manager needs one main config with a [default] section; I think it should be etc/replication-manager/config.toml.

duguwo commented 3 months ago

My replication-manager is now running; I followed svaroqui's advice. I configured extproxy = true and extproxy-address = "VIP:3306", but how can I call a script to add the VIP?

svaroqui commented 2 months ago

Would a one-shot install of the VIP from the GUI or API help, for example using provisioning on the extproxy?

When extproxy is used, the VIP is monitored, and replication-manager calls the proxy-servers-state-change-script when the VIP does not point to a valid MySQL server:
```go
// Positional script args: cluster name, host, port, new state, old state, master state
out, err := exec.Command(cluster.Conf.PRXServersChangeStateScript, cluster.Name, srv.GetHost(), srv.GetPort(), newState, oldState, master.State).CombinedOutput()
```