stefalee / mysql-master-ha

Automatically exported from code.google.com/p/mysql-master-ha

masterha_manager quits after the master server fails #12

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
1. I set up masterha_manager and masterha_node, but when I kill the master's mysqld process, masterha_manager quits and the failover does not happen.

2. Below is the architecture I am testing:

        master                    candidate_master
     10.1.200.216 -----------> 10.1.200.215            10.1.200.27
     masterha_node             masterha_node           masterha_manager & masterha_node
           \
            \
             \
            slave
         10.1.200.217

--------------------------------------------
Expected result:
 after killall -9 mysqld on 10.1.200.216, the topology should become:

           master
        10.1.200.215             10.1.200.27
        masterha_node            masterha_manager
              \
               \
                \
              slave
           10.1.200.217
           masterha_node

But:
  after killall -9 mysqld on 10.1.200.216, masterha_manager quits and nothing changes.

Some more info:

1. Install the MySQL packages on 10.1.200.215, 10.1.200.216, 10.1.200.217, and 10.1.200.27:
     rpm -ivh    MySQL-server-5.5.16-1.linux2.6.x86_64.rpm 
     rpm -ivh    MySQL-devel-5.5.16-1.linux2.6.x86_64.rpm
     rpm -ivh    MySQL-client-5.5.16-1.linux2.6.x86_64.rpm

2. Install mha4mysql-node-0.52 on all MySQL nodes and on 10.1.200.27:
   cd mha4mysql-node-0.52
   perl Makefile.PL && make install
   (unrelated steps omitted)

3. Install the manager (mha4mysql-manager-0.52) on 10.1.200.27:
    cd mha4mysql-manager-0.52
    perl Makefile.PL
    (unrelated steps omitted)
    make install 

4. The configuration on 10.1.200.27:

cat /etc/app1.cnf
[server default]
  user=root
  password=
  manager_workdir=/var/log/masterha/app1
  manager_log=/var/log/masterha/app1/app1.log
  remote_workdir=/var/log/masterha/app1

[server1]
  hostname=10.1.200.215
  candidate_master=1
  master_binlog_dir=/var/lib/mysql

[server2]
  hostname=10.1.200.216
  master_binlog_dir=/var/lib/mysql

[server3]
  hostname=10.1.200.217
  master_binlog_dir=/var/lib/mysql

cat /etc/masterha_default.cnf
[server default]
  user=root
  password=
  ssh_user=root
  repl_user=slave
  repl_password= mysqlsalve
  master_binlog_dir= /var/lib/mysql
  remote_workdir=/data/log/masterha
  manager_log=/data/log/masterha/manager.log
  secondary_check_script= masterha_secondary_check -s 10.1.200.217 -s 10.1.200.215  --user=root --master_host=10.1.200.216
  ping_interval=3
  master_ip_failover_script= /usr/local/bin/master_ip_failover
  master_ip_online_change_script=/usr/local/bin/master_ip_online_change
  report_script=/usr/local/bin/send_report
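The two files above both set manager_log and remote_workdir, with different values. MHA's log reads /etc/masterha_default.cnf first and /etc/app1.cnf second, so the sketch below assumes that a key in the application file's [server default] block overrides the same key in the global file. ini_get and effective_value are hypothetical helpers, not MHA tools, and whether a given parameter is honored in the global file at all depends on its scope in the MHA parameter documentation.

```shell
# Hypothetical helpers (not part of MHA) for reading INI-style config files.
# Assumption: a key in the application file's [server default] block
# overrides the same key in the global defaults file.

# ini_get FILE SECTION KEY
# Prints the trimmed value of KEY in [SECTION]; returns 1 if KEY is absent.
# Lines whose first non-blank character is '#' are skipped as comments.
ini_get() {
    awk -v section="$2" -v key="$3" '
        /^[ \t]*\[/ {
            s = $0
            gsub(/^[ \t]*\[|\][ \t]*$/, "", s)   # "  [server1]" -> "server1"
            in_sec = (s == section)
            next
        }
        /^[ \t]*#/ { next }
        in_sec && index($0, "=") > 0 {
            eq = index($0, "=")                  # split on the first "=" only
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { val = v; found = 1 }  # last occurrence wins
        }
        END {
            if (found) { print val; exit 0 }
            exit 1
        }
    ' "$1"
}

# effective_value GLOBAL_CNF APP_CNF KEY
# Prints the application value if KEY is set there, otherwise the global one.
effective_value() {
    ini_get "$2" "server default" "$3" || ini_get "$1" "server default" "$3"
}
```

Under that assumption, effective_value /etc/masterha_default.cnf /etc/app1.cnf manager_log would print /var/log/masterha/app1/app1.log for the files above. According to MHA's parameter documentation, when manager_log is set the manager writes its log to that file rather than to STDOUT/STDERR.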

5. The output of masterha_check_ssh:

masterha_check_ssh --conf=/etc/app1.cnf
Tue Dec 27 22:14:06 2011 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Tue Dec 27 22:14:06 2011 - [info] Reading application default configurations from /etc/app1.cnf..
Tue Dec 27 22:14:06 2011 - [info] Reading server configurations from /etc/app1.cnf..
Tue Dec 27 22:14:06 2011 - [info] Starting SSH connection tests..
Tue Dec 27 22:14:07 2011 - [debug] 
Tue Dec 27 22:14:06 2011 - [debug]  Connecting via SSH from root@10.1.200.215(10.1.200.215) to root@10.1.200.216(10.1.200.216)..
Tue Dec 27 22:14:06 2011 - [debug]   ok.
Tue Dec 27 22:14:06 2011 - [debug]  Connecting via SSH from root@10.1.200.215(10.1.200.215) to root@10.1.200.217(10.1.200.217)..
Tue Dec 27 22:14:07 2011 - [debug]   ok.
Tue Dec 27 22:14:07 2011 - [debug] 
Tue Dec 27 22:14:06 2011 - [debug]  Connecting via SSH from root@10.1.200.216(10.1.200.216) to root@10.1.200.215(10.1.200.215)..
Tue Dec 27 22:14:07 2011 - [debug]   ok.
Tue Dec 27 22:14:07 2011 - [debug]  Connecting via SSH from root@10.1.200.216(10.1.200.216) to root@10.1.200.217(10.1.200.217)..
Tue Dec 27 22:14:07 2011 - [debug]   ok.
Tue Dec 27 22:14:08 2011 - [debug] 
Tue Dec 27 22:14:07 2011 - [debug]  Connecting via SSH from root@10.1.200.217(10.1.200.217) to root@10.1.200.215(10.1.200.215)..
Tue Dec 27 22:14:07 2011 - [debug]   ok.
Tue Dec 27 22:14:07 2011 - [debug]  Connecting via SSH from root@10.1.200.217(10.1.200.217) to root@10.1.200.216(10.1.200.216)..
Tue Dec 27 22:14:08 2011 - [debug]   ok.
Tue Dec 27 22:14:08 2011 - [info] All SSH connection tests passed successfully. 

6. The output of masterha_check_repl --conf=/etc/app1.cnf:

Tue Dec 27 22:16:10 2011 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Tue Dec 27 22:16:10 2011 - [info] Reading application default configurations from /etc/app1.cnf..
Tue Dec 27 22:16:10 2011 - [info] Reading server configurations from /etc/app1.cnf..
Tue Dec 27 22:16:10 2011 - [info] MHA::MasterMonitor version 0.52.
Tue Dec 27 22:16:10 2011 - [info] Dead Servers:
Tue Dec 27 22:16:10 2011 - [info] Alive Servers:
Tue Dec 27 22:16:10 2011 - [info]   10.1.200.215(10.1.200.215:3306)
Tue Dec 27 22:16:10 2011 - [info]   10.1.200.216(10.1.200.216:3306)
Tue Dec 27 22:16:10 2011 - [info]   10.1.200.217(10.1.200.217:3306)
Tue Dec 27 22:16:10 2011 - [info] Alive Slaves:
Tue Dec 27 22:16:10 2011 - [info]   10.1.200.215(10.1.200.215:3306)  Version=5.5.16-log (oldest major version between slaves) log-bin:enabled
Tue Dec 27 22:16:10 2011 - [info]     Replicating from 10.1.200.216(10.1.200.216:3306)
Tue Dec 27 22:16:10 2011 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Dec 27 22:16:10 2011 - [info]   10.1.200.217(10.1.200.217:3306)  Version=5.5.16-log (oldest major version between slaves) log-bin:enabled
Tue Dec 27 22:16:10 2011 - [info]     Replicating from 10.1.200.216(10.1.200.216:3306)
Tue Dec 27 22:16:10 2011 - [info] Current Alive Master: 10.1.200.216(10.1.200.216:3306)
Tue Dec 27 22:16:10 2011 - [info] Checking slave configurations..
Tue Dec 27 22:16:10 2011 - [warning]  read_only=1 is not set on slave 10.1.200.215(10.1.200.215:3306).
Tue Dec 27 22:16:10 2011 - [warning]  relay_log_purge=0 is not set on slave 10.1.200.215(10.1.200.215:3306).
Tue Dec 27 22:16:10 2011 - [warning]  read_only=1 is not set on slave 10.1.200.217(10.1.200.217:3306).
Tue Dec 27 22:16:10 2011 - [warning]  relay_log_purge=0 is not set on slave 10.1.200.217(10.1.200.217:3306).
Tue Dec 27 22:16:10 2011 - [info] Checking replication filtering settings..
Tue Dec 27 22:16:10 2011 - [info]  binlog_do_db= , binlog_ignore_db= 
Tue Dec 27 22:16:10 2011 - [info]  Replication filtering check ok.
Tue Dec 27 22:16:10 2011 - [info] Starting SSH connection tests..
Tue Dec 27 22:16:12 2011 - [info] All SSH connection tests passed successfully.
Tue Dec 27 22:16:12 2011 - [info] Checking MHA Node version..
Tue Dec 27 22:16:13 2011 - [info]  Version check ok.
Tue Dec 27 22:16:13 2011 - [info] Checking SSH publickey authentication and checking recovery script configurations on the current master..
Tue Dec 27 22:16:13 2011 - [info]   Executing command: save_binary_logs --command=test --start_file=mysql-bin.000007 --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.52 
Tue Dec 27 22:16:13 2011 - [info]   Connecting to root@10.1.200.216(10.1.200.216).. 
  Creating /var/log/masterha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql, up to mysql-bin.000007
Tue Dec 27 22:16:14 2011 - [info] Master setting check done.
Tue Dec 27 22:16:14 2011 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Tue Dec 27 22:16:14 2011 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=10.1.200.215 --slave_ip=10.1.200.215 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.16-log --manager_version=0.52 --relay_log_info=/var/lib/mysql/relay-log.info  --slave_pass=xxx
Tue Dec 27 22:16:14 2011 - [info]   Connecting to root@10.1.200.215(10.1.200.215).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to yl-hyper-15-relay-bin.000014
    Temporary relay log file is /var/lib/mysql/yl-hyper-15-relay-bin.000014
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Tue Dec 27 22:16:14 2011 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=10.1.200.217 --slave_ip=10.1.200.217 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.16-log --manager_version=0.52 --relay_log_info=/var/lib/mysql/relay-log.info  --slave_pass=xxx
Tue Dec 27 22:16:14 2011 - [info]   Connecting to root@10.1.200.217(10.1.200.217).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to yl-hyper-17-relay-bin.000013
    Temporary relay log file is /var/lib/mysql/yl-hyper-17-relay-bin.000013
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Tue Dec 27 22:16:14 2011 - [info] Slaves settings check done.
Tue Dec 27 22:16:14 2011 - [info] 
10.1.200.216 (current master)
 +--10.1.200.215
 +--10.1.200.217

Tue Dec 27 22:16:14 2011 - [info] Checking replication health on 10.1.200.215..
Tue Dec 27 22:16:14 2011 - [info]  ok.
Tue Dec 27 22:16:14 2011 - [info] Checking replication health on 10.1.200.217..
Tue Dec 27 22:16:14 2011 - [info]  ok.
Tue Dec 27 22:16:14 2011 - [info] Checking master_ip_failvoer_script status:
Tue Dec 27 22:16:14 2011 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.1.200.216 --orig_master_ip=10.1.200.216 --orig_master_port=3306
Tue Dec 27 22:16:14 2011 - [info]  OK.
Tue Dec 27 22:16:14 2011 - [warning] shutdown_script is not defined.
Tue Dec 27 22:16:14 2011 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

I ran the following commands on 10.1.200.27 to start masterha_manager:
  mkdir -p /data/log/masterha
  nohup masterha_manager --conf=/etc/app1.cnf < /dev/null > /data/log/masterha/manager.log 2>&1 &

tail -f /data/log/masterha/manager.log
Tue Dec 27 21:39:59 2011 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Tue Dec 27 21:39:59 2011 - [info] Reading application default configurations from /etc/app1.cnf..
Tue Dec 27 21:39:59 2011 - [info] Reading server configurations from /etc/app1.cnf..
Tue Dec 27 21:40:33 2011 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Tue Dec 27 21:40:33 2011 - [info] Reading application default configurations from /etc/app1.cnf..
Tue Dec 27 21:40:33 2011 - [info] Reading server configurations from /etc/app1.cnf..
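The log above shows only the configuration-reading lines of two starts (21:39:59 and 21:40:33), so it does not by itself say whether masterha_manager is still running or has exited. One generic way to check is to test the background PID a few seconds after launch. The sketch below does that with nohup, $! and kill -0; start_and_check is a hypothetical helper, not part of MHA.

```shell
# Hypothetical helper (not part of MHA): start a command in the background
# with nohup, wait DELAY seconds, then report whether it is still alive.
# start_and_check LOGFILE DELAY CMD [ARGS...]
# Returns 0 if the process is still running, 1 if it has already exited.
start_and_check() {
    log=$1
    delay=$2
    shift 2
    # Same redirections as the command above: no stdin, stdout+stderr to LOGFILE.
    nohup "$@" < /dev/null > "$log" 2>&1 &
    pid=$!
    sleep "$delay"
    # kill -0 sends no signal; it only tests whether the PID exists.
    if kill -0 "$pid" 2>/dev/null; then
        echo "running (pid $pid)"
        return 0
    fi
    # The process is gone; wait collects its exit status.
    wait "$pid"
    echo "exited with status $?"
    return 1
}
```

For example: start_and_check /data/log/masterha/manager.log 10 masterha_manager --conf=/etc/app1.cnf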

Original issue reported on code.google.com by r...@mkrss.com on 27 Dec 2011 at 2:43

GoogleCodeExporter commented 9 years ago
>  nohup masterha_manager --conf=/etc/app1.cnf < /dev/null > /data/log/masterha/manager.log 2>&1 &
> 
> tail -f /data/log/masterha/manager.log
> Tue Dec 27 21:39:59 2011 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
> Tue Dec 27 21:39:59 2011 - [info] Reading application default configurations from /etc/app1.cnf..
> Tue Dec 27 21:39:59 2011 - [info] Reading server configurations from /etc/app1.cnf..
> Tue Dec 27 21:40:33 2011 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
> Tue Dec 27 21:40:33 2011 - [info] Reading application default configurations from /etc/app1.cnf..
> Tue Dec 27 21:40:33 2011 - [info] Reading server configurations from /etc/app1.cnf..

This is strange. When MHA is monitoring the master successfully, masterha_manager should print a "waiting until MySQL doesn't respond.." message in the logs. What if you set "log_lavel=debug" in the configuration file and simply run masterha_manager in the foreground (without nohup and not in the background)?

Original comment by Yoshinor...@gmail.com on 27 Dec 2011 at 2:58

GoogleCodeExporter commented 9 years ago
Sorry, log_level=debug, not log_lavel.

Original comment by Yoshinor...@gmail.com on 28 Dec 2011 at 2:22

GoogleCodeExporter commented 9 years ago
Thanks for the reply. I changed the section below in my app1.cnf, and now it works:
  [server default]
  user=root
  password=
  manager_workdir=/var/log/masterha/app1
  #manager_log=/var/log/masterha/app1/app1.log
  #remote_workdir=/var/log/masterha/app1
  log_level=debug

But there is one thing I am not sure about: after masterha_manager completes the failover, does the process exit?

Original comment by r...@mkrss.com on 28 Dec 2011 at 2:29

GoogleCodeExporter commented 9 years ago
The masterha_manager process exits after it completes a failover. For now, I recommend using daemontools to run it as a daemon.
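daemontools' supervise restarts a service whenever it exits, which covers the case above where masterha_manager ends after a failover. The loop below only illustrates that restart idea in plain shell; it is not daemontools, and supervise_loop is a hypothetical helper. It also does nothing about the configuration, which still lists the failed master after a failover.

```shell
# Hypothetical stand-in for a supervisor such as daemontools' supervise:
# run a command and, whenever it exits, wait DELAY seconds and start it again.
# supervise_loop MAX_RUNS DELAY CMD [ARGS...]
#   MAX_RUNS=0 means restart forever.
supervise_loop() {
    max=$1
    delay=$2
    shift 2
    runs=0
    while [ "$max" -eq 0 ] || [ "$runs" -lt "$max" ]; do
        "$@"
        status=$?
        runs=$((runs + 1))
        echo "run $runs exited with status $status" >&2
        # Pause only if another run will follow.
        if [ "$max" -eq 0 ] || [ "$runs" -lt "$max" ]; then
            sleep "$delay"
        fi
    done
}
```

For example: supervise_loop 0 60 masterha_manager --conf=/etc/app1.cnf would restart the manager 60 seconds after each exit, indefinitely.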

http://code.google.com/p/mysql-master-ha/wiki/Runnning_Background

Original comment by Yoshinor...@gmail.com on 28 Dec 2011 at 2:31

GoogleCodeExporter commented 9 years ago

Original comment by Yoshinor...@gmail.com on 28 Dec 2011 at 2:32