matthewbogner / mysql-master-ha

Automatically exported from code.google.com/p/mysql-master-ha
1 stars 0 forks source link

Failed to deactivate master IP with return code 1:0 #76

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1.set MHA (3node)
2.Power off Master node

What is the expected output? What do you see instead?

I want see Failover Succeed.

What version of the product are you using? On what operating system?

Manager 0.55, Node 0.54

Please provide any additional information below.

here is my log.
...
...
Sun Feb 16 19:21:44 2014 - [info] Executing master IP deactivatation script:
Sun Feb 16 19:21:44 2014 - [info]   /usr/local/bin/master_ip_failover 
--orig_master_host=192.168.1.71 --orig_master_ip=192.168.1.71 
--orig_master_port=3306 --command=stop 
Got Error: Use of uninitialized value $ssh_user in concatenation (.) or string 
at /usr/local/bin/master_ip_failover line 136.

IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 
10.10.10.11/24===

Disabling the VIP on old master: 192.168.1.71 
Sun Feb 16 19:21:44 2014 - 
[error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln340] Failed to 
deactivate master IP with return code 1:0
Sun Feb 16 19:21:44 2014 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, 
ln178] Got ERROR:  at /usr/local/bin/masterha_manager line 65
Sun Feb 16 19:21:44 2014 - [info] 

----- Failover Report -----

ksh: MySQL Master failover 192.168.1.71

Master 192.168.1.71 is down!

Check MHA Manager logs at NEWm:/var/log/masterha/ksh/ksh.log for details.

Started automated(non-interactive) failover.
Failed to deactivate master IP with return code 1:0
Got Error so couldn't continue failover from here.

----------------------------------------------------
What is wrong??? 

Original issue reported on code.google.com by au...@naver.com on 17 Feb 2014 at 8:46

GoogleCodeExporter commented 8 years ago
This is my all log
------------------------
Sun Feb 16 19:21:34 2014 - [warning] Got timeout on MySQL Ping child process 
and killed it! at /usr/local/share/perl5/MHA/HealthCheck.pm line 384.
Sun Feb 16 19:21:34 2014 - [info] Executing SSH check script: save_binary_logs 
--command=test --start_pos=4 --binlog_dir=/var/lib/mysql/ 
--output_file=/var/log/masterha/ksh/save_binary_logs_test 
--manager_version=0.55 --binlog_prefix=mysql-bin
Sun Feb 16 19:21:37 2014 - [warning] Got error on MySQL connect: 2003 (Can't 
connect to MySQL server on '192.168.1.71' (4))
Sun Feb 16 19:21:37 2014 - [warning] Connection failed 1 time(s)..
Sun Feb 16 19:21:39 2014 - [warning] HealthCheck: Got timeout on checking SSH 
connection to 192.168.1.71! at /usr/local/share/perl5/MHA/HealthCheck.pm line 
298.
Sun Feb 16 19:21:40 2014 - [warning] Got error on MySQL connect: 2003 (Can't 
connect to MySQL server on '192.168.1.71' (4))
Sun Feb 16 19:21:40 2014 - [warning] Connection failed 2 time(s)..
Sun Feb 16 19:21:43 2014 - [warning] Got error on MySQL connect: 2003 (Can't 
connect to MySQL server on '192.168.1.71' (4))
Sun Feb 16 19:21:43 2014 - [warning] Connection failed 3 time(s)..
Sun Feb 16 19:21:43 2014 - [warning] Master is not reachable from health 
checker!
Sun Feb 16 19:21:43 2014 - [warning] Master 192.168.1.71(192.168.1.71:3306) is 
not reachable!
Sun Feb 16 19:21:43 2014 - [warning] SSH is NOT reachable.
Sun Feb 16 19:21:43 2014 - [info] Connecting to a master server failed. Reading 
configuration file /etc/masterha_default.cnf and /etc/ksh.cnf again, and trying 
to connect to all servers to check server status..
Sun Feb 16 19:21:43 2014 - [info] Reading default configuratoins from 
/etc/masterha_default.cnf..
Sun Feb 16 19:21:43 2014 - [info] Reading application default configurations 
from /etc/ksh.cnf..
Sun Feb 16 19:21:43 2014 - [info] Reading server configurations from 
/etc/ksh.cnf..
Sun Feb 16 19:21:43 2014 - [info] Dead Servers:
Sun Feb 16 19:21:43 2014 - [info]   192.168.1.71(192.168.1.71:3306)
Sun Feb 16 19:21:43 2014 - [info] Alive Servers:
Sun Feb 16 19:21:43 2014 - [info]   192.168.1.72(192.168.1.72:3306)
Sun Feb 16 19:21:43 2014 - [info]   192.168.1.73(192.168.1.73:3306)
Sun Feb 16 19:21:43 2014 - [info] Alive Slaves:
Sun Feb 16 19:21:43 2014 - [info]   192.168.1.72(192.168.1.72:3306)  
Version=5.5.35-enterprise-commercial-advanced-log (oldest major version between 
slaves) log-bin:enabled
Sun Feb 16 19:21:43 2014 - [info]     Replicating from 
192.168.1.71(192.168.1.71:3306)
Sun Feb 16 19:21:43 2014 - [info]   192.168.1.73(192.168.1.73:3306)  
Version=5.5.35-enterprise-commercial-advanced-log (oldest major version between 
slaves) log-bin:enabled
Sun Feb 16 19:21:43 2014 - [info]     Replicating from 
192.168.1.71(192.168.1.71:3306)
Sun Feb 16 19:21:43 2014 - [info] Checking slave configurations..
Sun Feb 16 19:21:43 2014 - [info]  read_only=1 is not set on slave 
192.168.1.72(192.168.1.72:3306).
Sun Feb 16 19:21:43 2014 - [warning]  relay_log_purge=0 is not set on slave 
192.168.1.72(192.168.1.72:3306).
Sun Feb 16 19:21:43 2014 - [info]  read_only=1 is not set on slave 
192.168.1.73(192.168.1.73:3306).
Sun Feb 16 19:21:43 2014 - [warning]  relay_log_purge=0 is not set on slave 
192.168.1.73(192.168.1.73:3306).
Sun Feb 16 19:21:43 2014 - [info] Checking replication filtering settings..
Sun Feb 16 19:21:43 2014 - [info]  Replication filtering check ok.
Sun Feb 16 19:21:43 2014 - [info] Master is down!
Sun Feb 16 19:21:43 2014 - [info] Terminating monitoring script.
Sun Feb 16 19:21:43 2014 - [info] Got exit code 20 (Master dead).
Sun Feb 16 19:21:43 2014 - [info] MHA::MasterFailover version 0.55.
Sun Feb 16 19:21:43 2014 - [info] Starting master failover.
Sun Feb 16 19:21:43 2014 - [info] 
Sun Feb 16 19:21:43 2014 - [info] * Phase 1: Configuration Check Phase..
Sun Feb 16 19:21:43 2014 - [info] 
Sun Feb 16 19:21:43 2014 - [info] Dead Servers:
Sun Feb 16 19:21:43 2014 - [info]   192.168.1.71(192.168.1.71:3306)
Sun Feb 16 19:21:43 2014 - [info] Checking master reachability via mysql(double 
check)..
Sun Feb 16 19:21:44 2014 - [info]  ok.
Sun Feb 16 19:21:44 2014 - [info] Alive Servers:
Sun Feb 16 19:21:44 2014 - [info]   192.168.1.72(192.168.1.72:3306)
Sun Feb 16 19:21:44 2014 - [info]   192.168.1.73(192.168.1.73:3306)
Sun Feb 16 19:21:44 2014 - [info] Alive Slaves:
Sun Feb 16 19:21:44 2014 - [info]   192.168.1.72(192.168.1.72:3306)  
Version=5.5.35-enterprise-commercial-advanced-log (oldest major version between 
slaves) log-bin:enabled
Sun Feb 16 19:21:44 2014 - [info]     Replicating from 
192.168.1.71(192.168.1.71:3306)
Sun Feb 16 19:21:44 2014 - [info]   192.168.1.73(192.168.1.73:3306)  
Version=5.5.35-enterprise-commercial-Sun Feb 16 18:37:26 2014 - [info] 
MHA::MasterMonitor version 0.55.
Sun Feb 16 18:37:26 2014 - [info] Dead Servers:
Sun Feb 16 18:37:26 2014 - [info] Alive Servers:
Sun Feb 16 18:37:26 2014 - [info]   192.168.1.71(192.168.1.71:3306)
Sun Feb 16 18:37:26 2014 - [info]   192.168.1.72(192.168.1.72:3306)
Sun Feb 16 18:37:26 2014 - [info]   192.168.1.73(192.168.1.73:3306)
Sun Feb 16 18:37:26 2014 - [info] Alive Slaves:
Sun Feb 16 18:37:26 2014 - [info]   192.168.1.72(192.168.1.72:3306)  
Version=5.5.35-enterprise-commercial-advanced-log (oldest major version between 
slaves) log-bin:enabled
Sun Feb 16 18:37:26 2014 - [info]     Replicating from 
192.168.1.71(192.168.1.71:3306)
Sun Feb 16 18:37:26 2014 - [info]   192.168.1.73(192.168.1.73:3306)  
Version=5.5.35-enterprise-commercial-advanced-log (oldest major version between 
slaves) log-bin:enabled
Sun Feb 16 18:37:26 2014 - [info]     Replicating from 
192.168.1.71(192.168.1.71:3306)
Sun Feb 16 18:37:26 2014 - [info] Current Alive Master: 
192.168.1.71(192.168.1.71:3306)
Sun Feb 16 18:37:26 2014 - [info] Checking slave configurations..
Sun Feb 16 18:37:26 2014 - [info]  read_only=1 is not set on slave 
192.168.1.72(192.168.1.72:3306).
Sun Feb 16 18:37:26 2014 - [warning]  relay_log_purge=0 is not set on slave 
192.168.1.72(192.168.1.72:3306).
Sun Feb 16 18:37:26 2014 - [info]  read_only=1 is not set on slave 
192.168.1.73(192.168.1.73:3306).
Sun Feb 16 18:37:26 2014 - [warning]  relay_log_purge=0 is not set on slave 
192.168.1.73(192.168.1.73:3306).
Sun Feb 16 18:37:26 2014 - [info] Checking replication filtering settings..
Sun Feb 16 18:37:26 2014 - [info]  binlog_do_db= , binlog_ignore_db=
Sun Feb 16 18:37:26 2014 - [info]  Replication filtering check ok.
Sun Feb 16 18:37:26 2014 - [info] Starting SSH connection tests..
Sun Feb 16 18:39:14 2014 - [info] MHA::MasterMonitor version 0.55.
Sun Feb 16 18:39:15 2014 - [warning] 
/var/log/masterha/ksh/ksh.master_status.health already exists. You might have 
killed manager with SIGKILL(-9), may run two or more monitoring process for the 
same application, or use the same working directory. Check for details, and 
consider setting --workdir separately.
Sun Feb 16 18:39:15 2014 - [info] Dead Servers:
Sun Feb 16 18:39:15 2014 - [info] Alive Servers:
Sun Feb 16 18:39:15 2014 - [info]   192.168.1.71(192.168.1.71:3306)
Sun Feb 16 18:39:15 2014 - [info]   192.168.1.72(192.168.1.72:3306)
Sun Feb 16 18:39:15 2014 - [info]   192.168.1.73(192.168.1.73:3306)
Sun Feb 16 18:39:15 2014 - [info] Alive Slaves:
Sun Feb 16 18:39:15 2014 - [info]   192.168.1.72(192.168.1.72:3306)  
Version=5.5.35-enterprise-commercial-advanced-log (oldest major version between 
slaves) log-bin:enabled
Sun Feb 16 18:39:15 2014 - [info]     Replicating from 
192.168.1.71(192.168.1.71:3306)
Sun Feb 16 18:39:15 2014 - [info]   192.168.1.73(192.168.1.73:3306)  
Version=5.5.35-enterprise-commercial-advanced-log (oldest major version between 
slaves) log-bin:enabled
Sun Feb 16 18:39:15 2014 - [info]     Replicating from 
192.168.1.71(192.168.1.71:3306)
Sun Feb 16 18:39:15 2014 - [info] Current Alive Master: 
192.168.1.71(192.168.1.71:3306)
Sun Feb 16 18:39:15 2014 - [info] Checking slave configurations..
Sun Feb 16 18:39:15 2014 - [info]  read_only=1 is not set on slave 
192.168.1.72(192.168.1.72:3306).
Sun Feb 16 18:39:15 2014 - [warning]  relay_log_purge=0 is not set on slave 
192.168.1.72(192.168.1.72:3306).
Sun Feb 16 18:39:15 2014 - [info]  read_only=1 is not set on slave 
192.168.1.73(192.168.1.73:3306).
Sun Feb 16 18:39:15 2014 - [warning]  relay_log_purge=0 is not set on slave 
192.168.1.73(192.168.1.73:3306).
Sun Feb 16 18:39:15 2014 - [info] Checking replication filtering settings..
Sun Feb 16 18:39:15 2014 - [info]  binlog_do_db= , binlog_ignore_db=
Sun Feb 16 18:39:15 2014 - [info]  Replication filtering check ok.
"/var/log/masterha/ksh/ksh.log" 862L, 70660C
Sun Feb 16 19:11:35 2014 - [info] Waiting until all relay logs are applied.
Sun Feb 16 19:11:35 2014 - [info]  done.
Sun Feb 16 19:11:35 2014 - [info] Getting slave status..
Sun Feb 16 19:11:35 2014 - [info] This slave(192.168.1.73)'s 
Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000003:565). No 
need to recover from Exec_Master_Log_Pos.
Sun Feb 16 19:11:35 2014 - [info] Connecting to the target slave host 
192.168.1.73, running recover script..
Sun Feb 16 19:11:35 2014 - [info] Executing command: apply_diff_relay_logs 
--command=apply --slave_user='root' --slave_host=192.168.1.73 
--slave_ip=192.168.1.73  --slave_port=3306 
--apply_files=/var/log/masterha/ksh/saved_master_binlog_from_192.168.1.72_3306_2
0140216191105.binlog --workdir=/var/log/masterha/ksh 
--target_version=5.5.35-enterprise-commercial-advanced-log 
--timestamp=20140216191105 --handle_raw_binlog=1 --disable_log_bin=0 
--manager_version=0.55 --slave_pass=xxx
Sun Feb 16 19:11:36 2014 - [info]
Applying differential binary/relay log files 
/var/log/masterha/ksh/saved_master_binlog_from_192.168.1.72_3306_20140216191105.
binlog on 192.168.1.73:3306. This may take long time...
Applying log files succeeded.
Sun Feb 16 19:11:36 2014 - [info]  All relay logs were successfully applied.
Sun Feb 16 19:11:36 2014 - [info]  Resetting slave 
192.168.1.73(192.168.1.73:3306) and starting replication from the new master 
192.168.1.71(192.168.1.71:3306)..
Sun Feb 16 19:11:36 2014 - [info]  Executed CHANGE MASTER.
Sun Feb 16 19:11:36 2014 - [info]  Slave started.
Sun Feb 16 19:11:36 2014 - [info] End of log messages from 192.168.1.73.
Sun Feb 16 19:11:36 2014 - [info] -- Slave recovery on host 
192.168.1.73(192.168.1.73:3306) succeeded.
Sun Feb 16 19:11:36 2014 - [info] All new slave servers recovered successfully.
Sun Feb 16 19:11:36 2014 - [info]
Sun Feb 16 19:11:36 2014 - [info] * Phase 5: New master cleanup phase..
Sun Feb 16 19:11:36 2014 - [info]
Sun Feb 16 19:11:36 2014 - [info] Resetting slave info on the new master..
Sun Feb 16 19:11:37 2014 - [info]  192.168.1.71: Resetting slave info succeeded.
Sun Feb 16 19:11:37 2014 - [info] Master failover to 
192.168.1.71(192.168.1.71:3306) completed successfully.
Sun Feb 16 19:11:37 2014 - [info]

----- Failover Report -----

ksh: MySQL Master failover 192.168.1.72 to 192.168.1.71 succeeded

Master 192.168.1.72 is down!

Check MHA Manager logs at NEWm:/var/log/masterha/ksh/ksh.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.1.72.
The latest slave 192.168.1.71(192.168.1.71:3306) has all relay logs for 
recovery.
Selected 192.168.1.71 as a new master.
192.168.1.71: OK: Applying all logs succeeded.
Failed to activate master IP address for 192.168.1.71 with return code 10:0
192.168.1.73: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.1.73: OK: Applying all logs succeeded. Slave started, replicating from 
192.168.1.71.
192.168.1.71: Resetting slave info succeeded.
Master failover to 192.168.1.71(192.168.1.71:3306) completed successfully.
Sun Feb 16 19:17:41 2014 - [info] MHA::MasterMonitor version 0.55.
Sun Feb 16 19:21:43 2014 - [info] Starting master failover.
Sun Feb 16 19:21:43 2014 - [info]
Sun Feb 16 19:21:43 2014 - [info] * Phase 1: Configuration Check Phase..
Sun Feb 16 19:21:43 2014 - [info]
Sun Feb 16 19:21:43 2014 - [info] Dead Servers:
Sun Feb 16 19:21:43 2014 - [info]   192.168.1.71(192.168.1.71:3306)
Sun Feb 16 19:21:43 2014 - [info] Checking master reachability via mysql(double 
check)..
Sun Feb 16 19:21:44 2014 - [info]  ok.
Sun Feb 16 19:21:44 2014 - [info] Alive Servers:
Sun Feb 16 19:21:44 2014 - [info]   192.168.1.72(192.168.1.72:3306)
Sun Feb 16 19:21:44 2014 - [info]   192.168.1.73(192.168.1.73:3306)
Sun Feb 16 19:21:44 2014 - [info] Alive Slaves:
Sun Feb 16 19:21:44 2014 - [info]   192.168.1.72(192.168.1.72:3306)  
Version=5.5.35-enterprise-commercial-advanced-log (oldest major version between 
slaves) log-bin:enabled
Sun Feb 16 19:21:44 2014 - [info]     Replicating from 
192.168.1.71(192.168.1.71:3306)
Sun Feb 16 19:21:44 2014 - [info]   192.168.1.73(192.168.1.73:3306)  
Version=5.5.35-enterprise-commercial-advanced-log (oldest major version between 
slaves) log-bin:enabled
Sun Feb 16 19:21:44 2014 - [info]     Replicating from 
192.168.1.71(192.168.1.71:3306)
Sun Feb 16 19:21:44 2014 - [info] ** Phase 1: Configuration Check Phase 
completed.
Sun Feb 16 19:21:44 2014 - [info]
Sun Feb 16 19:21:44 2014 - [info] * Phase 2: Dead Master Shutdown Phase..
Sun Feb 16 19:21:44 2014 - [info]
Sun Feb 16 19:21:44 2014 - [info] Forcing shutdown so that applications never 
connect to the current master..
Sun Feb 16 19:21:44 2014 - [info] Executing master IP deactivatation script:
Sun Feb 16 19:21:44 2014 - [info]   /usr/local/bin/master_ip_failover 
--orig_master_host=192.168.1.71 --orig_master_ip=192.168.1.71 
--orig_master_port=3306 --command=stop
Got Error: Use of uninitialized value $ssh_user in concatenation (.) or string 
at /usr/local/bin/master_ip_failover line 136.

IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 
10.10.10.11/24===

Disabling the VIP on old master: 192.168.1.71
Sun Feb 16 19:21:44 2014 - 
[error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln340] Failed to 
deactivate master IP with return code 1:0
Sun Feb 16 19:21:44 2014 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, 
ln178] Got ERROR:  at /usr/local/bin/masterha_manager line 65
Sun Feb 16 19:21:44 2014 - [info]

----- Failover Report -----

ksh: MySQL Master failover 192.168.1.71

Master 192.168.1.71 is down!

Check MHA Manager logs at NEWm:/var/log/masterha/ksh/ksh.log for details.

Started automated(non-interactive) failover.
Failed to deactivate master IP with return code 1:0
Got Error so couldn't continue failover from here.

------------------------------------------------------
Plz help me..

Original comment by au...@naver.com on 17 Feb 2014 at 9:02