pengmingde / mysql-master-ha

Automatically exported from code.google.com/p/mysql-master-ha

possible bug in masterha_secondary_check #39

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

This is a simple MySQL master-slave replication setup.
For the test:
172.16.50.11 - master
172.16.50.14 - slave

I am testing the secondary_check_script functionality.
In the MHA conf I added:
secondary_check_script = masterha_secondary_check -s 172.16.50.14

On 172.16.50.11 I run masterha_master_monitor and then watch how the 
failover proceeds and how masterha_secondary_check behaves:

masterha_master_monitor --conf=/etc/mha_manager/app1.cnf

After starting the manager, I shut down the master 172.16.50.11 from another terminal,

but unfortunately I got the following error messages:

#############
#############
Fri Nov 16 16:30:10 2012 - [info]
172.16.50.11 (current master)
 +--172.16.50.14

Fri Nov 16 16:30:10 2012 - [warning] master_ip_failover_script is not defined.
Fri Nov 16 16:30:10 2012 - [warning] shutdown_script is not defined.
Fri Nov 16 16:30:10 2012 - [info] Set master ping interval 3 seconds.
Fri Nov 16 16:30:10 2012 - [info] Set secondary check script: 
masterha_secondary_check -s 172.16.50.14
Fri Nov 16 16:30:10 2012 - [info] Starting ping health check on 
172.16.50.11(172.16.50.11:3306)..
Fri Nov 16 16:30:10 2012 - [info] Ping(SELECT) succeeded, waiting until MySQL 
doesn't respond..
Fri Nov 16 16:30:22 2012 - [warning] Got error on MySQL select ping: 2006 
(MySQL server has gone away)
Fri Nov 16 16:30:22 2012 - [info] Executing SSH check script: save_binary_logs 
--command=test --start_pos=4 --binlog_dir=/home/mysqldata/ 
--output_file=/home/mha_manager_data/app1/save_binary_logs_test 
--manager_version=0.54 --binlog_prefix=mysql-bin
Fri Nov 16 16:30:22 2012 - [info] Executing seconary network check script: 
masterha_secondary_check -s 172.16.50.14  --user=mha4mysql  
--master_host=172.16.50.11  --master_ip=172.16.50.11  --master_port=3306

command-line line 0: invalid time value.
Monitoring server 172.16.50.14 is NOT reachable!
Fri Nov 16 16:30:22 2012 - [warning] At least one of monitoring servers is not 
reachable from this script. This is likely network problem. Failover should not 
happen.
  Creating /home/mha_manager_data/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /home/mysqldata/, up to mysql-bin.000016
Fri Nov 16 16:30:23 2012 - [info] HealthCheck: SSH to 172.16.50.11 is reachable.
#############
#############

So I started debugging it.
In masterha_secondary_check, at line 78, I found the place where $command is constructed.

I added "print $command" and got the constructed command:

ssh -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes 
-o ConnectTimeout=VAR_CONNECT_TIMEOUT -p 22 mha4mysql@172.16.50.14 "perl -e 
\"use IO::Socket::INET; my \\\$sock = IO::Socket::INET->new(PeerAddr => 
\\\"172.16.50.11\\\", PeerPort=> 3306, Proto =>'tcp', Timeout => 4); 
if(\\\$sock) { close(\\\$sock); exit 3; } exit 0;\" "

For some reason the string VAR_CONNECT_TIMEOUT appears here instead of a value.
If I comment out (or erase) the part with VAR_CONNECT_TIMEOUT, then it works: the 
script can connect to 172.16.50.14 and mha_manager can use this check correctly.
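
The "invalid time value" message in the log is consistent with ssh receiving the text VAR_CONNECT_TIMEOUT where ConnectTimeout expects a number of seconds. Below is a minimal shell sketch of that idea, assuming the option string is a template whose placeholder has to be replaced before use. The helper names fill_timeout and has_placeholder and the 5-second value are hypothetical and not taken from MHA.

```shell
#!/bin/sh
# Option string as it appeared in the printed command (placeholder still present).
SSH_OPT_TEMPLATE='-o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=VAR_CONNECT_TIMEOUT'

# Hypothetical helper: replace the placeholder with a timeout in seconds.
fill_timeout() {
  printf '%s\n' "$1" | sed "s/VAR_CONNECT_TIMEOUT/$2/"
}

# Hypothetical helper: return 0 if a VAR_* placeholder is still present.
has_placeholder() {
  case "$1" in
    *VAR_[A-Z]*) return 0 ;;
    *) return 1 ;;
  esac
}

# 5 seconds is an arbitrary value chosen for illustration.
ssh_opt=$(fill_timeout "$SSH_OPT_TEMPLATE" 5)

# Refuse to go on if anything was left unsubstituted.
if has_placeholder "$ssh_opt"; then
  echo "unsubstituted placeholder in: $ssh_opt" >&2
  exit 1
fi

printf '%s\n' "$ssh_opt"
```

With the substitution applied, the option string ends in ConnectTimeout=5 and the guard passes. Running the guard on the raw template instead would report the leftover placeholder before ssh is ever called.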

Is this a bug, or did I forget to configure something in the cfg?

Original issue reported on code.google.com by obric...@balakam.com on 16 Nov 2012 at 1:09

GoogleCodeExporter commented 9 years ago
Thanks for the report. This is a regression caused by recent changes in the 
GitHub dev tree, and it does not reproduce in stable releases (~0.53). I'll write 
a patch soon.

Original comment by Yoshinor...@gmail.com on 16 Nov 2012 at 2:53

GoogleCodeExporter commented 9 years ago
The fix is committed to the latest GitHub tree.

Original comment by Yoshinor...@gmail.com on 16 Nov 2012 at 3:33