signal18 / replication-manager

Signal 18 repman - Replication Manager for MySQL / MariaDB / Percona Server
https://signal18.io/products/srm
GNU General Public License v3.0

Version 1.0.2: ERR00012 Could not autodetect a master #157

Open Srijstha opened 7 years ago

Srijstha commented 7 years ago

Hello,

I have a problem with failover: it fails to elect a master. I tried the following script in the MaxScale configuration:

/usr/bin/replication-manager failover --user root:rootpass --rpluser repluser:replpass --hosts $INITIATOR,$NODELIST

Here is a screenshot of the list of servers and the MaxScale service status after the master went down: [screenshot]

The issue is similar to #9, but I am experiencing it in the new version as well!

Any hints?

Thanks!

tanji commented 7 years ago

Hi Srijstha,

This mode of operation is deprecated. If you want MaxScale and Replication Manager to play nicely together, you should run Replication Manager in daemon mode and let it drive MaxScale through a failover.

You can follow the instructions here:

https://github.com/tanji/replication-manager#using-maxscale

If you have any questions please let me know.

Srijstha commented 7 years ago

Hi tanji,

Many thanks for your response. Actually, I first tried Replication Manager from the command line, without the script in MaxScale, using the command: /usr/bin/replication-manager failover --user root:rootpass --rpluser repluser:replpass --hosts dbm,dbs-1,dbs-2 -verbose

This has a similar problem. Here is the output of the command:

2017/04/04 13:00:43 INFO : No existing password encryption scheme: Key file does not exist
2017/04/04 13:00:43 WARN : Could not create state file
2017/04/04 13:00:43 WARN : Could not read values from state file: invalid argument
2017/04/04 13:00:43 ERROR: ERR00012 Could not autodetect a master
2017/04/04 13:00:43 INFO : Starting master switch
2017/04/04 13:00:43 INFO : Electing a new master
2017/04/04 13:00:43 ERROR: No candidates found

So it doesn't look like a problem with the MaxScale and MRM integration. I will look into the different modes of integration. In the meantime, I would appreciate it if you could provide some hints on making MRM work from the command line.

Srijstha commented 7 years ago

I tried daemon mode with the command: replication-manager --config=/etc/config.toml --config-group=Test_Maxscale --verbose

The config file contains a [Test_Maxscale] group with these parameters:

[Test_Maxscale]
title = "TestMaxscale"
hosts = "dbm,dbs-1,dbs-2"
prefmaster = "dbm"
user = "root:rootpass"
rpluser = "rpluser:rplpass"
interactive = true
maxscale = true
maxscale-monitor = false
maxscale-maxinfo-port = 3307
maxscale-get-info-method = "maxadmin"
maxscale-host = "127.0.0.1"
maxscale-port = 3307
maxscale-user = "maxscaleuser"
maxscale-pass = "maxscalepass"
maxscale-write-port = 4007
maxscale-read-port = 4008
maxscale-read-write-port = 4006
maxscale-binlog = false
maxscale-binlog-port = 3305
test = true

However, it gives an error:

2017/04/05 11:18:31 INFO : Using configuration group Test_Maxscale
2017/04/05 11:18:31 ERROR: Could not parse configuration group Test_Maxscale

It gives the same error whichever configuration group I use. Did I miss anything?

tanji commented 7 years ago

Could you please use --log-level=3 in place of --verbose? There are many reasons why the slaves could be ineligible for failover. Then please include the logs. Thanks!

Srijstha commented 7 years ago

The command doesn't seem to create any log file. I also tried the --logfile parameter, but without luck!

tanji commented 7 years ago

Srijstha,

Can you paste the exact commands you have been running and their output?

Thanks!

Srijstha commented 7 years ago

Here is the command and output:

$ replication-manager monitor --config=/etc/config.toml --config-group=Test_Maxscale --daemon --log-level=3
2017/04/05 12:40:37 INFO : Using configuration group Test_Maxscale
2017/04/05 12:40:37 ERROR: Could not parse configuration group Test_Maxscale

tanji commented 7 years ago

It works fine for me, so something must be wrong on your side?

2017/04/05 18:53:36 INFO : Using config file: /etc/config.toml
2017/04/05 18:53:36 INFO : Using configuration group Test_Maxscale
2017/04/05 18:53:36 INFO : No existing password encryption scheme: Key file does not exist
2017/04/05 18:53:36 INFO : replication-manager version 1.0.2 started in daemon mode
2017/04/05 18:53:36 INFO : Monitor started in manual mode
2017/04/05 18:53:36 ERROR: Could not open connection to server dbm : ERROR: DNS resolution error for host dbm

Could you give me the output of replication-manager version? As you can see above, I'm using version 1.0.2 with some success.

Srijstha commented 7 years ago

That's strange! I am also using version 1.0.2:

$ replication-manager version
2017/04/05 12:56:10 INFO : Using config file: /etc/replication-manager/config.toml
Replication Manager 1.0.2 for MariaDB 10.x Series
Full Version:  1.0.2-1-g8faf64d
Build Time:  2017-02-02T13:23:14+0100

Could the config file be the problem? I am using the sample config file from https://github.com/tanji/replication-manager/blob/2f13bfe3b1f24d51c16d3b76cd61bd11ab7faea6/etc/config.toml.sample.bestpractice and only modified the config group!

tanji commented 7 years ago

I have no idea. I just copied the text you pasted above... If your file differs from the sample config file, I have no way of knowing :)

Feel free to send me the full config file at guillaume@signal18.io if you do not feel like publishing it online.

tanji commented 7 years ago

By the way, the sample config file is for Replication Manager 1.1, so it might not work with 1.0.

Srijstha commented 7 years ago

OK, I figured it out: the problem was a wrong path to the config file! Many thanks for your help and prompt answers.
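
For anyone else who sees "Could not parse configuration group": in this thread it came from a wrong config path, so it may be worth checking that the file passed to --config exists and contains the expected group header. Below is a minimal sketch of such a check. It is not part of replication-manager; check_config_group is a hypothetical helper, and it only looks for a literal [group] line instead of parsing the TOML.

```python
import os
import re


def check_config_group(path: str, group: str) -> str:
    """Report whether `path` exists and has a `[group]` header line.

    Hypothetical helper for illustration only.
    Returns "missing-file", "missing-group" or "ok".
    """
    if not os.path.isfile(path):
        return "missing-file"

    # Match a line that consists only of the group header, e.g. [Test_Maxscale]
    header = re.compile(r"^\s*\[" + re.escape(group) + r"\]\s*$")
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if header.match(line):
                return "ok"
    return "missing-group"
```

For example, check_config_group("/etc/config.toml", "Test_Maxscale") uses the path and group name from the commands earlier in this thread.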

I am still trying to make it work for master failover and rejoin. I will ask again if I have other questions. By the way, it would be great if you could provide me with a sample config file for 1.0.

tanji commented 7 years ago

The sample file should be in the 1.0.2 tarball that you can get from the releases page, or, if you installed an RPM or DEB package, it's in /etc/replication-manager/

Srijstha commented 7 years ago

Great, thanks. I will first try with 1.1:)

Srijstha commented 7 years ago

Hi tanji,

Some progress now:) I am getting the following error now:

INFO[2017-04-05T15:33:58+02:00] [Test_Maxscale] DEBUG: Entering topology detection
INFO[2017-04-05T15:33:58+02:00] [Test_Maxscale] ERROR: Could not connect to MaxScale:Incorrect maxscale protocol negotiation

What could be the problem here?

svaroqui commented 7 years ago

You probably need to be able to connect to the maxadmin port with the user and password defined in MaxScale. Can you check with telnet? Thanks.
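
If telnet is not installed, a plain TCP probe answers the same first question: is anything listening on that port? The following is a minimal sketch and not part of replication-manager or MaxScale. port_is_reachable is a hypothetical helper, and it only checks that the port accepts connections, not that the maxadmin user and password are valid.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration only. It does not speak the
    maxadmin protocol, so a True result only means the port is open.
    """
    try:
        # create_connection resolves the host and connects over TCP
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution errors
        return False
```

For example, port_is_reachable("127.0.0.1", 3307) probes the maxscale-host and maxscale-port values from the config pasted earlier in this thread.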

tanji commented 7 years ago

@Srijstha, you need to connect to the MaxScale administration port. Sorry if that is unclear in the docs... I assume that 3307 is the MySQL router port. So you need to add this to your MaxScale config (if not already present):

[CLI Listener]
type=listener
service=CLI
protocol=maxscaled
port=6603
socket=default

Srijstha commented 7 years ago

@tanji, I did have the listener and also tried port 6603, but without success. It gives: ERROR: Could not connect to MaxScale:Connection failed to address maxscale:6603

Since config.toml.sample refers to many ports, it is not clear which of them corresponds to which port in the MaxScale config. I hope you have looked at the config files I sent you by email. If debugging from those files would be time-consuming, it would be great if you could share working example config files (config.toml and maxscale.cnf with minimal settings) for a simple master-slave database setup with one master and two slaves. Many thanks!