matthewbogner / mysql-master-ha

Automatically exported from code.google.com/p/mysql-master-ha

Error while Testing master failover #7

GoogleCodeExporter closed this issue 9 years ago.

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
I set up two different versions of MySQL on the same machine, made one the master of the other, and followed all the steps given in

http://code.google.com/p/mysql-master-ha/wiki/Tutorial#Installing_MHA_Manager_on_host4%28manager_host%29

But when I test master failover by killing the master, I get the following output:

Tue Nov  8 21:24:28 2011 - [info] Ping succeeded, sleeping until it doesn't respond..
Tue Nov  8 21:24:49 2011 - [warning] Got error on MySQL ping: 2006 (MySQL server has gone away)
Tue Nov  8 21:24:49 2011 - [info] Executing seconary network check script: masterha_secondary_check -s remote_host1 -s remote_host2  --user=root  --master_host=127.0.0.1  --master_ip=127.0.0.1  --master_port=3308
ssh: Could not resolve hostname remote_host1: Name or service not known
Monitoring server remote_host1 is NOT reachable!
Tue Nov  8 21:24:49 2011 - [warning] At least one of monitoring servers is not reachable from this script. This is likely network problem. Failover should not happen.
Tue Nov  8 21:24:49 2011 - [info] HealthCheck: SSH to 127.0.0.1 is reachable.
Tue Nov  8 21:24:52 2011 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Nov  8 21:24:52 2011 - [warning] Connection failed 1 time(s)..
Tue Nov  8 21:24:55 2011 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Nov  8 21:24:55 2011 - [warning] Connection failed 2 time(s)..
Tue Nov  8 21:24:58 2011 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Nov  8 21:24:58 2011 - [warning] Connection failed 3 time(s)..
Tue Nov  8 21:24:58 2011 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Tue Nov  8 21:25:01 2011 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Nov  8 21:25:01 2011 - [warning] Connection failed 1 time(s)..
Tue Nov  8 21:25:01 2011 - [info] Executing seconary network check script: masterha_secondary_check -s remote_host1 -s remote_host2  --user=root  --master_host=127.0.0.1  --master_ip=127.0.0.1  --master_port=3308
ssh: Could not resolve hostname remote_host1: Name or service not known
Monitoring server remote_host1 is NOT reachable!
Tue Nov  8 21:25:01 2011 - [warning] At least one of monitoring servers is not reachable from this script. This is likely network problem. Failover should not happen.
Tue Nov  8 21:25:01 2011 - [info] HealthCheck: SSH to 127.0.0.1 is reachable.
Tue Nov  8 21:25:04 2011 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Nov  8 21:25:04 2011 - [warning] Connection failed 2 time(s)..
Tue Nov  8 21:25:07 2011 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Nov  8 21:25:07 2011 - [warning] Connection failed 3 time(s)..
Tue Nov  8 21:25:07 2011 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.

What version of the product are you using? On what operating system?
On Linux.

Original issue reported on code.google.com by ajays20...@gmail.com on 8 Nov 2011 at 4:00
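The log above follows a fixed pattern: the manager retries the connection every 3 seconds, and after the third failure it only starts failover if the secondary network check succeeded; otherwise it resets and checks the master again. The sketch below is a simplified model of that decision as it appears in this log, not MHA's actual code. The function name `monitor` and the assumption that the secondary check result stays the same for the whole run (true here, since `remote_host1` never resolves) are illustrative.

```python
def monitor(connect_results, secondary_check_ok, max_failures=3):
    """Simplified model of the failover decision seen in the MHA log.

    connect_results: one bool per connection attempt (True = master answered).
    secondary_check_ok: assumed constant result of the secondary network check.
    Returns the index of the attempt at which failover would start, or None.
    """
    failures = 0
    for i, reachable in enumerate(connect_results):
        if reachable:
            failures = 0  # master answered, so start counting from zero again
            continue
        failures += 1
        if failures < max_failures:
            continue
        if secondary_check_ok:
            return i  # enough failures and the network check agrees: fail over
        # The secondary check reported errors ("Failover should not start"),
        # so reset the counter and keep checking the master.
        failures = 0
    return None


# The master stays down, but remote_host1 cannot be resolved: no failover.
print(monitor([False] * 6, secondary_check_ok=False))
```

With `secondary_check_ok=True`, the same input returns `2`, i.e. in this model failover would begin on the third failed attempt.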

GoogleCodeExporter commented 9 years ago
Read the error messages carefully.
"ssh: Could not resolve hostname remote_host1: Name or service not known"

So "remote_host1" should be replaced with an appropriate remote hostname in your environment.

Original comment by Yoshinor...@gmail.com on 8 Nov 2011 at 4:14
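The `ssh: Could not resolve hostname` line comes from name resolution on the manager host, so a quick pre-check is to confirm that every host passed with `-s` resolves there. A minimal sketch, assuming Python 3 on the manager host; the helper name `unresolvable_hosts` is hypothetical. Resolution is only the first requirement: the secondary check also reaches these hosts over SSH, so the manager must be able to log in to them.

```python
import socket

def unresolvable_hosts(hosts, port=22):
    """Return the hosts the local resolver cannot map to an address.

    Port 22 is only a hint to getaddrinfo (the secondary check reaches the
    monitoring hosts over SSH); no connection is attempted.
    """
    bad = []
    for host in hosts:
        try:
            socket.getaddrinfo(host, port)
        except socket.gaierror:
            bad.append(host)
    return bad
```

On the reporter's setup, `unresolvable_hosts(["remote_host1", "remote_host2"])` would be expected to return both names, matching the error in the log.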

GoogleCodeExporter commented 9 years ago
Where do I change remote_host1? In which file?

Original comment by ajays20...@gmail.com on 8 Nov 2011 at 4:28
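The hostnames come from the command line of the secondary check which, assuming the layout used in the MHA tutorial, is set by the `secondary_check_script` parameter in the `[server default]` section of the configuration file passed to `masterha_manager --conf` (here `/etc/app1.cnf`). The sketch below reads such a file with Python's `configparser` and lists the `-s` hosts. The function name and the sample configuration are illustrative assumptions, and it assumes the file uses plain `key=value` lines.

```python
import configparser

def secondary_check_hosts(conf_text):
    """List the hosts passed with -s to the secondary check command."""
    # interpolation=None so values containing '%' (e.g. passwords) are kept as-is
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    command = parser.get("server default", "secondary_check_script", fallback="")
    tokens = command.split()
    return [tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == "-s"]


# Illustrative fragment only; replace the hostnames with real monitoring hosts.
SAMPLE_CONF = """
[server default]
secondary_check_script=masterha_secondary_check -s remote_host1 -s remote_host2
"""

print(secondary_check_hosts(SAMPLE_CONF))  # ['remote_host1', 'remote_host2']
```

For the real file, `secondary_check_hosts(open("/etc/app1.cnf").read())` would list the hosts that need to be changed.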

GoogleCodeExporter commented 9 years ago
Also, once I bring the master back up, it again says "Ping succeeded, sleeping until it doesn't respond.."

Original comment by ajays20...@gmail.com on 8 Nov 2011 at 4:34

GoogleCodeExporter commented 9 years ago
OK, I changed remote_host1 to localhost, but now I get the following errors:

The master was running on port 3308 (MySQL 4.1) and the slave on port 3306 (MySQL 5.1.1).

app1: MySQL Master failover 127.0.0.1

Master 127.0.0.1 is down!

Check MHA Manager logs at adaministrator-laptop for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 127.0.0.1.
Power off 127.0.0.1.
The latest slave 127.0.0.1(127.0.0.1:3306) has all relay logs for recovery.
Selected 127.0.0.1 as a new master.
127.0.0.1: OK: Applying all logs succeeded.
Failed to activate master IP address for 127.0.0.1 with return code 2:0
Got Error so couldn't continue failover from here.
Wed Nov  9 10:53:00 2011 - [info] Sending mail..
sh: /script/masterha/send_master_failover_mail: not found
Wed Nov  9 10:53:00 2011 - [error][/usr/local/share/perl/5.10.1/MHA/MasterFailover.pm, ln1516] Failed to send mail with return code 127:0
[1]+  Killed                  sudo masterha_manager --conf=/etc/app1.cnf

Original comment by ajays20...@gmail.com on 9 Nov 2011 at 5:31

GoogleCodeExporter commented 9 years ago
I have mailed you the entire log file.

Original comment by ajays20...@gmail.com on 9 Nov 2011 at 6:03

GoogleCodeExporter commented 9 years ago
Read the logs carefully.

"sh: /script/masterha/send_master_failover_mail: not found"

Please set up this script, or remove the "report_script" parameter from the configuration file.

Original comment by Yoshinor...@gmail.com on 9 Nov 2011 at 9:49
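The `return code 127:0` in the log is consistent with `sh` failing to find the program: a POSIX shell exits with status 127 when a command is not found. Before enabling `report_script` again, it can help to check that the configured path exists and is executable. A minimal sketch; the helper name `script_is_runnable` is hypothetical, and it only checks the first word of the command line.

```python
import os
import shlex

def script_is_runnable(command):
    """True if the first word of `command` is an existing, executable file."""
    parts = shlex.split(command)
    if not parts:
        return False
    path = parts[0]
    # os.path.isfile follows symlinks; os.access checks the execute bit
    return os.path.isfile(path) and os.access(path, os.X_OK)
```

For the path in the log, `script_is_runnable("/script/masterha/send_master_failover_mail")` returns `False` on a host where that file is missing.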

GoogleCodeExporter commented 9 years ago
That is not the primary cause of the error. Even after commenting it out, only the last two lines of the error log disappear; the rest of the errors remain. I have mailed you the error log I got after removing report_script.

Original comment by ajays20...@gmail.com on 9 Nov 2011 at 10:07

GoogleCodeExporter commented 9 years ago

Original comment by Yoshinor...@gmail.com on 17 Nov 2011 at 7:14