zekky / mysql-master-ha

Automatically exported from code.google.com/p/mysql-master-ha

After automatic failover, (online) master switch fails and shows "mysql is not alive" #69

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
OS version: CentOS 6.3, 64-bit
MySQL version: 5.6.14
MHA Manager and Node versions: 0.55 and 0.54

What steps will reproduce the problem?
1. Run killall -9 mysqld mysqld_safe on node mysql1.
2. The automatic master switch to mysql2 succeeds.
3. The manual online master switch fails.

What is the expected output? What do you see instead?
----- Failover Report -----

app1: MySQL Master failover mysql1 to mysql2 succeeded

Master mysql1 is down!

Check MHA Manager logs at mysql3:/etc/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
The latest slave mysql2(192.168.230.104:3306) has all relay logs for recovery.
Selected mysql2 as a new master.
mysql2: OK: Applying all logs succeeded.
mysql4: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
mysql4: OK: Applying all logs succeeded. Slave started, replicating from mysql2.
mysql2: Resetting slave info succeeded.
Master failover to mysql2(192.168.230.104:3306) completed successfully.

[root@mysql3 app1]# masterha_master_switch --conf=/etc/masterha/app1/app1.cnf --master_state=alive --new_master_host=mysq1
Sat Oct 12 16:10:39 2013 - [info] MHA::MasterRotate version 0.55.
Sat Oct 12 16:10:39 2013 - [info] Starting online master switch..
Sat Oct 12 16:10:39 2013 - [info] 
Sat Oct 12 16:10:39 2013 - [info] * Phase 1: Configuration Check Phase..
Sat Oct 12 16:10:39 2013 - [info] 
Sat Oct 12 16:10:39 2013 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Oct 12 16:10:39 2013 - [info] Reading application default configurations from /etc/masterha/app1/app1.cnf..
Sat Oct 12 16:10:39 2013 - [info] Reading server configurations from /etc/masterha/app1/app1.cnf..
Sat Oct 12 16:10:39 2013 - [info] Current Alive Master: mysql2(192.168.230.104:3306)
Sat Oct 12 16:10:39 2013 - [info] Alive Slaves:
Sat Oct 12 16:10:39 2013 - [info]   mysql1(192.168.230.103:3306)  Version=5.6.14-log (oldest major version between slaves) log-bin:enabled
Sat Oct 12 16:10:39 2013 - [info]     Replicating from mysql2(192.168.230.104:3306)
Sat Oct 12 16:10:39 2013 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 12 16:10:39 2013 - [info]   mysql4(192.168.230.106:3306)  Version=5.6.14 (oldest major version between slaves) log-bin:disabled
Sat Oct 12 16:10:39 2013 - [info]     Replicating from mysql2(192.168.230.104:3306)
Sat Oct 12 16:10:39 2013 - [info]     Not candidate for the new Master (no_master is set)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on mysql2(192.168.230.104:3306)? (YES/no): YES
Sat Oct 12 16:10:50 2013 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Sat Oct 12 16:10:50 2013 - [info]  ok.
Sat Oct 12 16:10:50 2013 - [info] Checking MHA is not monitoring or doing failover..
Sat Oct 12 16:10:50 2013 - [info] Checking replication health on mysql1..
Sat Oct 12 16:10:50 2013 - [info]  ok.
Sat Oct 12 16:10:50 2013 - [info] Checking replication health on mysql4..
Sat Oct 12 16:10:50 2013 - [info]  ok.
Sat Oct 12 16:10:50 2013 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln1145] mysq1 is not alive!
Sat Oct 12 16:10:50 2013 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln232] Failed to get new master!
Sat Oct 12 16:10:50 2013 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln178] Got ERROR:  at /usr/bin/masterha_master_switch line 53

Original issue reported on code.google.com by flutters...@sina.com on 12 Oct 2013 at 8:26

GoogleCodeExporter commented 9 years ago
192.168.230.103 mysql1  --master
192.168.230.104 mysql2  --master
192.168.230.105 mysql3  --monitor
192.168.230.106 mysql4  --slave

[root@mysql3 app1]# vi app1.cnf 
[server default]
manager_workdir=/etc/masterha/app1
manager_log=/etc/masterha/app1/manager.log
user=mha_mon
password=123456
ssh_user=root
repl_user=repl
repl_password=slave
ping_interval=1
remote_workdir=/etc/masterha/app1
master_binlog_dir=/mydata/data/binlog/
shutdown_script=""
master_ip_online_change_script=""
report_script=""
#master_ip_failover_script="/usr/local/bin/master_ip_failover"
[server1]
hostname=mysql1
candidate_master=1
[server2]
hostname=mysql2
candidate_master=1
#[server3]
#hostname=mysql3
[server4]
hostname=mysql4
no_master=1
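
One detail worth checking against this configuration: the switch command in the log above passes --new_master_host=mysq1, while the config only lists mysql1, mysql2 and mysql4, and the error message also names mysq1. The following is a minimal Python sketch, not part of MHA, that compares a requested host with the hostname entries and suggests the closest configured name. The function names and the embedded config excerpt are illustrative assumptions, and whether the typo is the actual cause of the failure is not confirmed in this thread.

```python
import configparser
import difflib

# Excerpt of the app1.cnf shown above (only the entries relevant to host lookup).
APP1_CNF = """
[server default]
manager_workdir=/etc/masterha/app1
[server1]
hostname=mysql1
candidate_master=1
[server2]
hostname=mysql2
candidate_master=1
[server4]
hostname=mysql4
no_master=1
"""


def configured_hosts(cnf_text):
    """Return the hostname= values from every section that defines one."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cnf_text)
    return [
        parser[section]["hostname"]
        for section in parser.sections()
        if parser.has_option(section, "hostname")
    ]


def check_new_master(requested, hosts):
    """Return (is_configured, closest_matches) for a requested new master host."""
    if requested in hosts:
        return True, []
    # Hypothetical helper step: suggest the most similar configured name.
    return False, difflib.get_close_matches(requested, hosts, n=1)


hosts = configured_hosts(APP1_CNF)
ok, suggestions = check_new_master("mysq1", hosts)
print("configured hosts:", hosts)
print("requested host is configured:", ok)
print("closest configured name:", suggestions)
```

Running the same check with the host name spelled exactly as in the [server1] section would report it as configured.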

Original comment by flutters...@sina.com on 12 Oct 2013 at 8:28

GoogleCodeExporter commented 9 years ago
Between steps 2 and 3, I started the MySQL server on mysql1.
Below is the status queried from mysql1.
mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000015 |      120 | shawn        |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: mysql2
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000013
          Read_Master_Log_Pos: 5706
               Relay_Log_File: slave-relay-bin.000020
                Relay_Log_Pos: 283
        Relay_Master_Log_File: mysql-bin.000013
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: shawn,shawn
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 5706
              Relay_Log_Space: 456
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 2
                  Master_UUID: d939fa7f-28e2-11e3-8d78-005056a5ffdd
             Master_Info_File: /mydata/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
1 row in set (0.00 sec)
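
The output above suggests replication on mysql1 is running normally (both threads report Yes and the lag is 0). As a rough illustration, the Python sketch below parses the vertical \G output into a dictionary and checks those fields. It is only an assumption about what a basic health check might look at, not the check MHA itself performs, and the function names are hypothetical.

```python
# A few lines copied from the SHOW SLAVE STATUS\G output above.
SLAVE_STATUS = """
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: mysql2
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                   Last_Errno: 0
        Seconds_Behind_Master: 0
1 row in set (0.00 sec)
"""


def parse_vertical_output(text):
    """Turn 'Key: value' lines of \\G output into a dict, skipping other lines."""
    fields = {}
    for line in text.splitlines():
        # Skip the row separator and lines without a key/value pair.
        if line.lstrip().startswith("*") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def replication_looks_healthy(fields):
    """Basic check: both threads running, no error, and a numeric lag value."""
    return (
        fields.get("Slave_IO_Running") == "Yes"
        and fields.get("Slave_SQL_Running") == "Yes"
        and fields.get("Last_Errno") == "0"
        and fields.get("Seconds_Behind_Master", "").isdigit()
    )


status = parse_vertical_output(SLAVE_STATUS)
print("replicating from:", status["Master_Host"])
print("looks healthy:", replication_looks_healthy(status))
```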

Original comment by flutters...@sina.com on 12 Oct 2013 at 8:30

GoogleCodeExporter commented 9 years ago
I tested the online master switch again and got the same error. Why does it show "mysql1 is not alive"?
====================================================================================
[root@mysql3 app1]#  masterha_master_switch --conf=/etc/masterha/app1/app1.cnf --master_state=alive --new_master_host=mysq1
Tue Oct 29 22:49:08 2013 - [info] MHA::MasterRotate version 0.55.
Tue Oct 29 22:49:08 2013 - [info] Starting online master switch..
Tue Oct 29 22:49:08 2013 - [info] 
Tue Oct 29 22:49:08 2013 - [info] * Phase 1: Configuration Check Phase..
Tue Oct 29 22:49:08 2013 - [info] 
Tue Oct 29 22:49:08 2013 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Oct 29 22:49:08 2013 - [info] Reading application default configurations from /etc/masterha/app1/app1.cnf..
Tue Oct 29 22:49:08 2013 - [info] Reading server configurations from /etc/masterha/app1/app1.cnf..
Tue Oct 29 22:49:09 2013 - [info] Multi-master configuration is detected. Current primary(writable) master is mysql2(192.168.230.104:3306)
Tue Oct 29 22:49:09 2013 - [info] Master configurations are as below: 
Master mysql1(192.168.230.103:3306), replicating from mysql2(192.168.230.104:3306), read-only
Master mysql2(192.168.230.104:3306), replicating from mysql1(192.168.230.103:3306)

Tue Oct 29 22:49:09 2013 - [info] Current Alive Master: mysql2(192.168.230.104:3306)
Tue Oct 29 22:49:09 2013 - [info] Alive Slaves:
Tue Oct 29 22:49:09 2013 - [info]   mysql1(192.168.230.103:3306)  Version=5.6.14-log (oldest major version between slaves) log-bin:enabled
Tue Oct 29 22:49:09 2013 - [info]     Replicating from mysql2(192.168.230.104:3306)
Tue Oct 29 22:49:09 2013 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Oct 29 22:49:09 2013 - [info]   mysql4(192.168.230.106:3306)  Version=5.6.14 (oldest major version between slaves) log-bin:disabled
Tue Oct 29 22:49:09 2013 - [info]     Replicating from mysql2(192.168.230.104:3306)
Tue Oct 29 22:49:09 2013 - [info]     Not candidate for the new Master (no_master is set)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on mysql2(192.168.230.104:3306)? (YES/no): YES
Tue Oct 29 22:49:12 2013 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Tue Oct 29 22:49:12 2013 - [info]  ok.
Tue Oct 29 22:49:12 2013 - [info] Checking MHA is not monitoring or doing failover..
Tue Oct 29 22:49:12 2013 - [info] Checking replication health on mysql1..
Tue Oct 29 22:49:12 2013 - [info]  ok.
Tue Oct 29 22:49:12 2013 - [info] Checking replication health on mysql4..
Tue Oct 29 22:49:12 2013 - [info]  ok.
Tue Oct 29 22:49:12 2013 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln1145] mysq1 is not alive!
Tue Oct 29 22:49:12 2013 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln232] Failed to get new master!
Tue Oct 29 22:49:12 2013 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln178] Got ERROR:  at /usr/bin/masterha_master_switch line 53

Original comment by flutters...@sina.com on 29 Oct 2013 at 2:53