swrd / check-mysql-all

Automatically exported from code.google.com/p/check-mysql-all

Monitor won't restart. #12

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Hi, 

I have been using mmm_mysql for quite a while now, and I am having my first big 
issue:

What steps will reproduce the problem?
1. A night of network problems (I still don't know what happened).
2. All the nodes went offline because of flapping (I erased the logs, but I 
remember reading "flap").
3. The first call to mmm_control set_online db1 crashed with a fatal error.
4. Now there is no way to launch the monitor again; I still get:
"#mmm_control show
ERROR: Can't connect to monitor daemon!"

The output of ps aux | grep mmm:
# ps aux | grep mmm
root      6008  0.0  0.2  14308 10400 ?        S    12:28   0:00 mmm_mond
root      6009  0.0  0.2  17048 11944 ?        S    12:28   0:00 mmm_mond
root      6225  0.0  0.0   3072   716 pts/0    S+   12:30   0:00 grep mmm

And the output of netstat:

root@dave:/var/log/mysql-mmm# netstat -ntpl
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:2401            0.0.0.0:*               LISTEN      4623/inetd
tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN      4087/mysqld
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      4944/apache2
tcp        0      0 192.168.3.10:53         0.0.0.0:*               LISTEN      3926/named
tcp        0      0 178.32.110.250:53       0.0.0.0:*               LISTEN      3926/named
tcp        0      0 94.23.12.99:53          0.0.0.0:*               LISTEN      3926/named
tcp        0      0 127.0.0.1:53            0.0.0.0:*               LISTEN      3926/named
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      4433/exim4
tcp        0      0 127.0.0.1:953           0.0.0.0:*               LISTEN      3926/named
tcp6       0      0 :::5001                 :::*                    LISTEN      5140/java
tcp6       0      0 :::21                   :::*                    LISTEN      4861/proftpd: (acce
tcp6       0      0 :::53                   :::*                    LISTEN      3926/named
tcp6       0      0 :::22                   :::*                    LISTEN      3956/sshd
tcp6       0      0 ::1:953                 :::*                    LISTEN      3926/named
root@dave:/var/log/mysql-mmm#
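
A quick way to confirm from a listing like this that no monitor socket is open is to filter the LISTEN rows by program name. The following is only a sketch: `listeners_for` and `SAMPLE` are hypothetical names, the sample rows are copied from the output above, it assumes each socket is on a single line, and it assumes netstat reports the daemon under the name `mmm_mond` (as `ps` does here).

```python
def listeners_for(netstat_output: str, program: str) -> list:
    """Return the local addresses of LISTEN sockets owned by `program`."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, remote, state, pid/program
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "LISTEN":
            continue
        # "4087/mysqld" -> "mysqld"
        name = fields[6].partition("/")[2]
        if name.startswith(program):
            found.append(fields[3])
    return found


# A few rows taken from the netstat output above (one socket per line).
SAMPLE = """\
tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN      4087/mysqld
tcp        0      0 127.0.0.1:53            0.0.0.0:*               LISTEN      3926/named
tcp6       0      0 :::22                   :::*                    LISTEN      3956/sshd
"""

print(listeners_for(SAMPLE, "mysqld"))    # ['0.0.0.0:3306']
print(listeners_for(SAMPLE, "mmm_mond"))  # []
```

An empty list for `mmm_mond` matches what the listing shows: the two mmm_mond processes exist, but neither has a listening socket, which would explain why mmm_control cannot connect.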

This configuration was working fine before, with no change to the ini files. 
The log file stays empty (0k), although it is recreated if I erase it and run:

/etc/init.d/mmm-mysql start

What version of the product are you using? On what operating system?

I don't know the monitor version, because I can't run the version command:
# /usr/sbin/mmm_mond -v
Can't run second copy of mmm_mond at /usr/sbin/mmm_mond line 79

Please provide any additional information below.
MySQL is working fine on all 4 nodes; only the cluster is dead, and there is no 
way to relaunch it.

Thanks for any help.

Stéphane

Original issue reported on code.google.com by stephane...@gmail.com on 22 Nov 2010 at 11:37

GoogleCodeExporter commented 8 years ago
I now have this information in the log file, but I still get the "Can't connect 
to monitor daemon" message:

2010/11/22 15:10:57  INFO STARTING...
2010/11/22 15:10:57  INFO Waiting for network connection...
2010/11/22 15:10:57  INFO Spawning checker 'ping_ip'...
2010/11/22 15:10:57  INFO Shutting down checker 'ping_ip'...
2010/11/22 15:10:57  INFO Network connection is available.
2010/11/22 15:10:57  INFO Performing initial checks...
2010/11/22 15:10:57  INFO Spawning checker 'mysql'...
2010/11/22 15:10:57  INFO Shutting down checker 'mysql'...
2010/11/22 15:10:57  INFO Spawning checker 'ping'...
2010/11/22 15:10:57  INFO Shutting down checker 'ping'...
2010/11/22 15:10:57  INFO Spawning checker 'rep_backlog'...
2010/11/22 15:10:57  INFO Shutting down checker 'rep_backlog'...
2010/11/22 15:10:57  INFO Spawning checker 'rep_threads'...
2010/11/22 15:10:58  INFO Shutting down checker 'rep_threads'...
2010/11/22 15:10:58  WARN No binary found for killing hosts (/usr/lib/mysql-mmm//monitor/kill_host).
2010/11/22 15:10:58  WARN auto_increment_offset should be different on both masters (db1: 1 , db2: 1)
2010/11/22 15:10:58  WARN db1: auto_increment_increment (1) should be >= 2
2010/11/22 15:10:58  WARN db2: auto_increment_increment (1) should be >= 2
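
The auto_increment warnings are unrelated to the connection failure, but they matter for a two-master setup. A simplified model of how MySQL picks auto-increment values (offset, offset + increment, offset + 2 * increment, and so on) shows why. This is a sketch only: `generated_ids` and `collisions` are hypothetical helper names, and the model ignores MySQL's edge cases.

```python
def generated_ids(offset: int, increment: int, n: int) -> list:
    """First n auto-increment values under the simplified offset + k * increment model."""
    return [offset + k * increment for k in range(n)]


def collisions(master_a: tuple, master_b: tuple, n: int = 5) -> list:
    """IDs that both masters would generate among their first n inserts.

    Each master is given as (auto_increment_offset, auto_increment_increment).
    """
    ids_a = set(generated_ids(master_a[0], master_a[1], n))
    ids_b = set(generated_ids(master_b[0], master_b[1], n))
    return sorted(ids_a & ids_b)


# Settings reported in the log: both masters use offset 1, increment 1.
print(collisions((1, 1), (1, 1)))  # [1, 2, 3, 4, 5]

# Settings the warnings ask for: increment >= 2 and different offsets.
print(collisions((1, 2), (2, 2)))  # []
```

With the logged settings, every ID generated on db1 can also be generated on db2, so concurrent writes to both masters risk duplicate-key errors in replication.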

Stéphane

Original comment by stephane...@gmail.com on 22 Nov 2010 at 2:14

GoogleCodeExporter commented 8 years ago
Hi,

After I set debug to 1 and launched the monitor manually, it stopped on a call 
to the agent on one of the cluster's MySQL servers. I found this error on that 
particular server: "
2010/11/23 11:53:05 FATAL Child exited with exitcode 98, restarting after 10 second sleep
2010/11/23 11:53:15 FATAL Listener: Can't create socket!
2010/11/23 11:53:15 FATAL Child exited with exitcode 98, restarting after 10 second sleep
2010/11/23 11:53:25 FATAL Listener: Can't create socket!
2010/11/23 11:53:25 FATAL Child exited with exitcode 98, restarting after 10 second sleep
2010/11/23 11:53:35 FATAL Listener: Can't create socket!
2010/11/23 11:53:35 FATAL Child exited with exitcode 98, restarting after 10 second sleep
2010/11/23 11:53:45 FATAL Listener: Can't create socket!
2010/11/23 11:53:45 FATAL Child exited with exitcode 98, restarting after 10 second sleep
2010/11/23 11:53:55 FATAL Listener: Can't create socket!
2010/11/23 11:53:55 FATAL Child exited with exitcode 98, restarting after 10 second sleep
2010/11/23 11:54:05 FATAL Listener: Can't create socket!
2010/11/23 11:54:05 FATAL Child exited with exitcode 98, restarting after 10 second sleep
2010/11/23 11:54:15 FATAL Listener: Can't create socket!
2010/11/23 11:54:15 FATAL Child exited with exitcode 98, restarting after 10 second sleep
2010/11/23 11:54:25 FATAL Listener: Can't create socket!
2010/11/23 11:54:25 FATAL Child exited with exitcode 98, restarting after 10 second sleep
2010/11/23 11:54:35 FATAL Listener: Can't create socket!
2010/11/23 11:54:36 FATAL Child exited with exitcode 98 and has failed more than 10 times
"

After a clean stop of MySQL on this server and a reboot, everything was OK.

I think the monitor's startup process should report this kind of error without 
having to be launched in debug mode.
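
For anyone hitting the same log: on Linux, errno 98 is EADDRINUSE ("Address already in use"), so exit code 98 next to "Can't create socket!" would be consistent with the agent's listen port still being held by another process. That reading is an inference from the log, not something confirmed in this thread. The sketch below checks whether a port can be bound; `bind_error` is a hypothetical helper, and the agent port is an assumption (9989 is believed to be the default; use the value from your agent configuration).

```python
import errno
import socket
from typing import Optional


def bind_error(host: str, port: int) -> Optional[int]:
    """Try to bind a TCP socket; return the errno on failure, None on success."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()
    return None


if __name__ == "__main__":
    # Demonstration on an ephemeral port: hold a listener, then try to bind again.
    holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    holder.bind(("127.0.0.1", 0))
    holder.listen(1)
    port = holder.getsockname()[1]

    err = bind_error("127.0.0.1", port)
    print(err == errno.EADDRINUSE)  # True while `holder` keeps the port
    holder.close()

    # On the affected server one would check the agent port instead, e.g.
    # bind_error("0.0.0.0", 9989)  # 9989 is an assumed default, see above
```

If the check returns EADDRINUSE on the agent port while no agent appears to be running, looking for the process that still holds the port may avoid the reboot.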

Thanks for reading.

Stéphane

Original comment by stephane...@gmail.com on 23 Nov 2010 at 1:39

GoogleCodeExporter commented 8 years ago

Original comment by ryan.a.l...@gmail.com on 8 Feb 2011 at 10:45

GoogleCodeExporter commented 8 years ago
I have MySQL MMM running in both test and production environments. It has been 
in production for about 10 days. I installed mysql-mmm-2.2.1-1.el5 from the 
EPEL repository. 
A few days ago, I experienced exactly the same issue and resolved it by 
rebooting the affected server, as suggested above. The mysql-mmm-agent and the 
monitor went nuts after a brief glitch in the network setup. I can't determine 
what it was; the host had been up for 45 days.

Strangely, even restarting the network didn't help: mysql-mmm-agent still would 
not start on that server.

If it happens again, I'll try resolving it without rebooting and report back 
with the 'quickfix'.

Original comment by m.blaze...@alteatec.com on 7 Apr 2011 at 12:48