Error with ---ignore_fail_on_start=1

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Get the latest code (as of 17.12.2012) via git from
https://github.com/yoshinorim/mha4mysql-manager.git

2. Build and install:

perl Makefile.PL  PREFIX=/usr
make
make install

3. My MHA conf file:
##
# init users and dirs
##
# list of servers
[server1]
hostname=us1
ip=172.16.50.11
candidate_master=1
ignore_fail=1

[server2]
hostname=us4
ip=172.16.50.14
candidate_master=1
ignore_fail=1

[server3]
ignore_fail=1
hostname=us3
ip=172.16.50.13
candidate_master=1

4. Check that mysql is stopped on server 172.16.50.13 and running on the others.

Run mha-manager:
masterha_manager --ignore_fail_on_start=1  --conf=/home/mha4mysql/etc/app1.cnf

What is the expected output?
MHA should start, notice that us3 is dead, and then continue working.

What do you see instead?
MHA exits with an error:

############
############
Mon Dec 17 10:58:08 2012 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Dec 17 10:58:08 2012 - [info] Reading application default configurations from /home/mha4mysql/etc/app1.cnf..
Mon Dec 17 10:58:08 2012 - [info] Reading server configurations from /home/mha4mysql/etc/app1.cnf..
Mon Dec 17 10:58:08 2012 - [info] MHA::MasterMonitor version 0.55.
Mon Dec 17 10:58:08 2012 - [info] Dead Servers:
Mon Dec 17 10:58:08 2012 - [info]   us3(172.16.50.13:3306)
Mon Dec 17 10:58:08 2012 - [info] Alive Servers:
Mon Dec 17 10:58:08 2012 - [info]   funky(172.16.50.11:3306)
Mon Dec 17 10:58:08 2012 - [info]   us4(172.16.50.14:3306)
Mon Dec 17 10:58:08 2012 - [info] Alive Slaves:
Mon Dec 17 10:58:08 2012 - [info]   funky(172.16.50.11:3306)  Version=5.5.28-MariaDB-log (oldest major version between slaves) log-bin:enabled
Mon Dec 17 10:58:08 2012 - [info]     Replicating from 172.16.50.14(172.16.50.14:3306)
Mon Dec 17 10:58:08 2012 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Dec 17 10:58:08 2012 - [info] Current Alive Master: us4(172.16.50.14:3306)
Mon Dec 17 10:58:08 2012 - [info] Checking slave configurations..
Mon Dec 17 10:58:08 2012 - [info]  read_only=1 is not set on slave funky(172.16.50.11:3306).
Mon Dec 17 10:58:08 2012 - [warning]  relay_log_purge=0 is not set on slave funky(172.16.50.11:3306).
Mon Dec 17 10:58:08 2012 - [info] Checking replication filtering settings..
Mon Dec 17 10:58:08 2012 - [info]  binlog_do_db= testdb, binlog_ignore_db=
Mon Dec 17 10:58:08 2012 - [info]  Replication filtering check ok.
Mon Dec 17 10:58:08 2012 - [info] Starting SSH connection tests..
Mon Dec 17 10:58:09 2012 - [info] All SSH connection tests passed successfully.
Mon Dec 17 10:58:09 2012 - [info] Checking MHA Node version..
Mon Dec 17 10:58:09 2012 - [info]  Version check ok.
Mon Dec 17 10:58:09 2012 - [error][/usr/share/perl/5.14/MHA/ServerManager.pm, ln443]  Server us3(172.16.50.13:3306) is dead, but must be alive! Check server settings.
Mon Dec 17 10:58:09 2012 - [error][/usr/share/perl/5.14/MHA/MasterMonitor.pm, ln386] Error happend on checking configurations.  at /usr/share/perl/5.14/MHA/MasterMonitor.pm line 363
Mon Dec 17 10:58:09 2012 - [error][/usr/share/perl/5.14/MHA/MasterMonitor.pm, ln482] Error happened on monitoring servers.
Mon Dec 17 10:58:09 2012 - [info] Got exit code 1 (Not master dead).
###########
###########

Please provide any additional information below.

I am no Perl expert, but I tried to debug the error.
I added a printf to MasterMonitor.pm after the "GetOptions(" section in "sub main".
It prints 0 every time, regardless of which value I pass when executing
masterha_manager --ignore_fail_on_start=1  --conf=/home/mha4mysql/etc/app1.cnf
or 
masterha_manager --ignore_fail_on_start=0  --conf=/home/mha4mysql/etc/app1.cnf

I found what needs to change:

diff --git a/lib/MHA/MasterMonitor.pm b/lib/MHA/MasterMonitor.pm
index 71945de..ff80c89 100644
--- a/lib/MHA/MasterMonitor.pm
+++ b/lib/MHA/MasterMonitor.pm
@@ -636,7 +636,7 @@ sub main {
     'manager_log=s'           => \$g_logfile,
     'skip_ssh_check'          => \$g_skip_ssh_check,          # for testing
     'skip_check_ssh'          => \$g_skip_ssh_check,
-    'ignore_fail_on_start'    => \$g_ignore_fail_on_start,
+    'ignore_fail_on_start=i'    => \$g_ignore_fail_on_start,
   );
   setpgrp( 0, $$ ) unless ($g_interactive);

After that change, mha-manager handles the argument correctly and works as expected.

Please check it.

Original issue reported on code.google.com by obric...@balakam.com on 17 Dec 2012 at 11:05

GoogleCodeExporter commented 9 years ago

Use --ignore_fail_on_start (without a value), then it should work.
The documentation was incorrect, so I fixed it.
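
For context, the difference between the two invocations comes down to how Perl's Getopt::Long reads an option spec. The following is a minimal sketch, not MHA's own code: `parse_flag` is a hypothetical helper, and only the option name is taken from this thread. A spec with no type suffix declares a plain on/off flag, so a value attached with `=` is rejected and the variable keeps its default, which would be consistent with the printf always showing 0. A spec ending in `=i` requires an integer value instead.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper (not part of MHA): parse @args against one option spec.
# Returns (ok, value): ok is 1 if Getopt::Long accepted the arguments,
# value is whatever ended up in the bound variable (default 0).
sub parse_flag {
  my ( $spec, @args ) = @_;
  my $value = 0;
  local $SIG{__WARN__} = sub { };    # silence Getopt::Long's diagnostics
  my $ok = GetOptionsFromArray( \@args, $spec => \$value );
  return ( $ok ? 1 : 0, $value );
}

for my $case (
  [ 'ignore_fail_on_start',   '--ignore_fail_on_start' ],      # flag spec, flag usage
  [ 'ignore_fail_on_start',   '--ignore_fail_on_start=1' ],    # flag spec, value given
  [ 'ignore_fail_on_start=i', '--ignore_fail_on_start=1' ],    # integer spec, value given
  )
{
  my ( $spec, $arg ) = @$case;
  my ( $ok, $value ) = parse_flag( $spec, $arg );
  printf "%-24s %-28s ok=%d value=%d\n", $spec, $arg, $ok, $value;
}
```

Only the second case fails to set the variable. Which spelling is right therefore depends on which spec the installed MasterMonitor.pm uses.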

Original comment by Yoshinor...@gmail.com on 17 Dec 2012 at 5:54

GoogleCodeExporter commented 9 years ago

thanks!

Original comment by obric...@balakam.com on 18 Dec 2012 at 6:51

GoogleCodeExporter commented 9 years ago

Same situation and same conf:
us1, us4 - alive; one of them is the master
us3 - dead (slave)

I tried to switch the master manually:

masterha_master_switch --master_state=alive --orig_master_is_new_slave --conf=/home/mha4mysql/etc/app1.cnf --new_master_host=us1

Tue Dec 18 14:41:07 2012 - [info] MHA::MasterRotate version 0.55.
Tue Dec 18 14:41:07 2012 - [info] Starting online master switch..
Tue Dec 18 14:41:07 2012 - [info]
Tue Dec 18 14:41:07 2012 - [info] * Phase 1: Configuration Check Phase..
Tue Dec 18 14:41:07 2012 - [info]
Tue Dec 18 14:41:07 2012 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Dec 18 14:41:07 2012 - [info] Reading application default configurations from /home/mha4mysql/etc/app1.cnf..
Tue Dec 18 14:41:07 2012 - [info] Reading server configurations from /home/mha4mysql/etc/app1.cnf..
Tue Dec 18 14:41:07 2012 - [error][/usr/share/perl/5.14/MHA/MasterRotate.pm, ln93] Switching master should not be started if one or more servers is down.
Tue Dec 18 14:41:07 2012 - [info] Dead Servers:
Tue Dec 18 14:41:07 2012 - [info]   us3(172.16.50.13:3306)
Tue Dec 18 14:41:07 2012 - [error][/usr/share/perl/5.14/MHA/ManagerUtil.pm, ln178] Got ERROR:  at /usr/bin/masterha_master_switch line 53

Could you add the same ignore_fail_on_start option to masterha_master_switch? I think
it would be useful.
Sometimes the master needs to be switched for some reason even if just one slave is
dead.
Thanks in advance!

Original comment by obric...@balakam.com on 18 Dec 2012 at 3:02

GoogleCodeExporter commented 9 years ago

--master_state=alive is intended to be invoked manually (for a scheduled master
switch), so if there is a dead slave I think you can easily remove it from the
configuration file (by editing it manually or by running the "masterha_conf_host
--command=delete" command). As there are many workarounds, I'm not going to
support this feature. (I'm open to accepting patches :).

Original comment by Yoshinor...@gmail.com on 19 Dec 2012 at 3:54

GoogleCodeExporter commented 9 years ago

This seemed to allow me to ignore broken or lagging replication with the 
ignore_fail flag; I simply added this to 
/usr/lib/perl5/vendor_perl/MHA/ServerManager.pm

in sub check_replication_health {
...
    if ( $target->has_replication_problem($allow_delay_seconds) ) {
      $log->error(" failed!");
      ## FAF
      if ( !$target->{ignore_fail} ) {
        croak;
      }
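
The idea behind the excerpt above can be tried in isolation. This is only a sketch under stated assumptions: `check_replication_health_sketch` and the `has_problem` key are hypothetical stand-ins for MHA's server objects and their `has_replication_problem` method, and plain hashes replace the real classes. It shows the control flow the change aims for: a server with broken replication aborts the check unless it is marked `ignore_fail`.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical stand-in for the patched check: each server is a plain hash
# with hostname, ignore_fail, and has_problem (replication broken or lagging).
# Returns the hostnames whose problems were ignored; croaks on the first
# problematic server that is not marked ignore_fail.
sub check_replication_health_sketch {
  my (@servers) = @_;
  my @ignored;
  for my $server (@servers) {
    next unless $server->{has_problem};
    if ( $server->{ignore_fail} ) {
      push @ignored, $server->{hostname};
      next;
    }
    croak "replication problem on $server->{hostname}";
  }
  return @ignored;
}

my @ignored = check_replication_health_sketch(
  { hostname => 'us1', ignore_fail => 1, has_problem => 0 },
  { hostname => 'us3', ignore_fail => 1, has_problem => 1 },
);
print "ignored: @ignored\n";
```

Servers without ignore_fail still stop the check, so the setting stays opt-in per server.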

Original comment by freda...@gmail.com on 27 Nov 2013 at 10:30

stefalee / mysql-master-ha

Error with ---ignore_fail_on_start=1 #46