tinv33043 / mysql-master-ha

Automatically exported from code.google.com/p/mysql-master-ha

mysql-master-ha fails to disable slave on a new master #34

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Hi.

While testing mysql-master-ha (one master and three slaves), I found that after
a failover the new master is still seen as a slave, and masterha_manager then
refuses to start.
It also fails to remove the failed master from the config when I run:
# masterha_manager --remove_dead_master_conf --conf=/etc/mha/app1.cnf

This is the part of the log showing that mysql-master-ha failed to reset the
slave info on the new master, which is therefore still configured as a slave:

Tue Sep 25 14:25:45 2012 - [info] * Phase 5: New master cleanup phease..
Tue Sep 25 14:25:45 2012 - [info]
Tue Sep 25 14:25:45 2012 - [info] Resetting slave info on the new master..
Tue Sep 25 14:25:45 2012 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln674]  SHOW SLAVE STATUS shows new master replicates from somewhere. Check for details!
Tue Sep 25 14:25:45 2012 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln688]  db02.db.cert.fronter.net: Resetting slave info failed.
Tue Sep 25 14:25:45 2012 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln1537] Master failover to db02.mynetwork.net(11.22.33.2:3306) done, but recovery on slave partially failed.
Tue Sep 25 14:25:45 2012 - [info]

This is the output of show slave status:

mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: db01.mynetwork.net
                  Master_User: replica
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: mysql-bin.000049
          Read_Master_Log_Pos: 107
               Relay_Log_File: mysqld-relay-bin.000004
                Relay_Log_Pos: 253
        Relay_Master_Log_File: mysql-bin.000049
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 107
              Relay_Log_Space: 839
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:   
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 2003
                Last_IO_Error: error reconnecting to master 'replica@db01.mynetwork.net:3306' - retry-time: 10  retries: 86400
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
1 row in set (0.00 sec)
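
To check this state from a script rather than by eye, here is a minimal sketch that parses the `\G` output above and reports whether a master is still configured and whether the replication threads are running. The function names are hypothetical, and treating a non-empty Master_Host as "still configured" is an assumption for illustration; it is not taken from the MHA source.

```python
def parse_vertical_status(text):
    """Parse vertical (\\G) SHOW SLAVE STATUS output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Row separators, the row-count footer and blank lines carry no field.
        if ":" not in line:
            continue
        # Split on the first colon only; values such as Last_IO_Error
        # contain further colons.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def replication_still_configured(fields):
    # Assumption: a non-empty Master_Host means slave info was not reset.
    return bool(fields.get("Master_Host"))


def replication_threads_running(fields):
    return "Yes" in (fields.get("Slave_IO_Running"),
                     fields.get("Slave_SQL_Running"))


# Excerpt of the output shown above.
sample = """\
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: db01.mynetwork.net
                  Master_User: replica
                  Master_Port: 3306
             Slave_IO_Running: No
            Slave_SQL_Running: No
                Last_IO_Errno: 2003
                Last_IO_Error: error reconnecting to master 'replica@db01.mynetwork.net:3306' - retry-time: 10  retries: 86400
1 row in set (0.00 sec)
"""

status = parse_vertical_status(sample)
print("master still configured:", replication_still_configured(status))
print("replication threads running:", replication_threads_running(status))
```

For the output above, this reports that a master (db01.mynetwork.net) is still configured even though both replication threads are stopped.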

And finally, this is the error I get when running:
# masterha_manager --remove_dead_master_conf --conf=/etc/mha/app1.cnf

Tue Sep 25 15:28:10 2012 - [warning] SQL Thread is stopped(no error) on db02.mynetwork.net(11.22.33.2:3306)
Tue Sep 25 15:28:10 2012 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln732] Multi-master configuration is detected, but two or more masters are either writable (read-only is not set) or dead! Check configurations for details. Master configurations are as below:
Master db01.mynetwork.net(11.22.33.1:3306), dead
Master db02.db.cert.fronter.net(11.22.33.2:3306), replicating from db01.mynetwork.net(11.22.33.1:3306)

Tue Sep 25 15:28:10 2012 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln383] Error happend on checking configurations.  at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 298
Tue Sep 25 15:28:10 2012 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln478] Error happened on monitoring servers.
Tue Sep 25 15:28:10 2012 - [info] Got exit code 1 (Not master dead).

Is this a known issue? Any idea why it fails?

Original issue reported on code.google.com by m.je...@gmail.com on 25 Sep 2012 at 1:31