Closed dhoffend closed 8 years ago
Under certain conditions (for example reboot) the statusengine process comes up without parent pid / master process
See example here:
root 1520 1 0 17:11 ? 00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root 1521 1 0 17:11 ? 00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root 1522 1 0 17:11 ? 00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root 1523 1 0 17:11 ? 00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root 1524 1 0 17:11 ? 00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
Okay found a problem It seems if the mysql connection cannot be established, all the worker processes are already spawned but the main process crashes with ERROR
Okay, the bug is not the init scripts, it's the statusegine application leaving childs/worker behind if the main process exits
You can see here that statusengine is not running
# ps -ef | grep statusengine
root 4663 2063 0 17:56 pts/0 00:00:00 grep --color=auto statusengine
I'm shutting down the mysql-server and even delete the socket file (yes I know that's not nice).
# /etc/init.d/mysql stop
* Stopping MySQL database server mysqld [ OK ]
# rm /var/run/mysqld/mysqld.sock
When we now start the statusengine application it will crash with an error that mysql connection can't be established. (Maybe there are also other cases where statusengine might crash on startup)
# /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
Welcome to CakePHP v2.5.4 Console
---------------------------------------------------------------
App : app
Path: /opt/statusengine/cakephp/app/
---------------------------------------------------------------
Starting Statusengine version: 1.5.1...
THIS IS LEGACY MODE!
Error: Database connection "Mysql" is missing, or could not be created.
#0 /opt/statusengine/cakephp/lib/Cake/Model/Datasource/DboSource.php(260): Mysql->connect()
#1 /opt/statusengine/cakephp/lib/Cake/Model/ConnectionManager.php(105): DboSource->__construct(Array)
#2 /opt/statusengine/cakephp/lib/Cake/Model/Model.php(3524): ConnectionManager::getDataSource('legacy')
#3 /opt/statusengine/cakephp/lib/Cake/Model/Model.php(1136): Model->setDataSource('legacy')
#4 /opt/statusengine/cakephp/lib/Cake/Model/Model.php(3551): Model->setSource('instances')
#5 /opt/statusengine/cakephp/lib/Cake/Model/Model.php(1369): Model->getDataSource()
#6 /opt/statusengine/cakephp/lib/Cake/Model/Model.php(1553): Model->schema()
#7 /opt/statusengine/cakephp/app/Console/Command/StatusengineLegacyShell.php(2333): Model->create()
#8 /opt/statusengine/cakephp/app/Console/Command/StatusengineLegacyShell.php(2672): StatusengineLegacyShell->createInstance()
#9 /opt/statusengine/cakephp/app/Console/Command/StatusengineLegacyShell.php(210): StatusengineLegacyShell->forkWorker()
#10 /opt/statusengine/cakephp/lib/Cake/Console/Shell.php(440): StatusengineLegacyShell->main()
#11 /opt/statusengine/cakephp/lib/Cake/Console/ShellDispatcher.php(209): Shell->runCommand('-w', Array)
#12 /opt/statusengine/cakephp/lib/Cake/Console/ShellDispatcher.php(66): ShellDispatcher->dispatch()
#13 /opt/statusengine/cakephp/app/Console/cake.php(47): ShellDispatcher::run(Array)
#14 {main}
After the main script has crashed you can check your process list and find many orphanded statusengine processes without parent-pid (aka master) idling around.
# ps -ef | grep -v grep | grep statusengine
root 5013 1 0 17:58 pts/0 00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root 5014 1 0 17:58 pts/0 00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root 5015 1 0 17:58 pts/0 00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root 5016 1 0 17:58 pts/0 00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root 5017 1 0 17:58 pts/0 00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
Yes thats right, if the parent process of statusengine crashs, the child processes are orphanded. You need to kill them manually :/
It looks like, that your MySQL credentials are wrong in https://github.com/nook24/statusengine/blob/master/cakephp/app/Config/database.php#L78-L86 Have you tried, if your are able to login with the given username and password?
mysql -u statusengine -p
Enter password:
mysql> use naemon
Database changed
mysql> select current_state from naemon_servicestatus LIMIT 0, 1;
+---------------+
| current_state |
+---------------+
| 3 |
+---------------+
1 row in set (0.00 sec)
mysql>
If this is working, your user has the permissions for the database. May be you are using nagios instead of naemon as database name...
Please make sure, that you are using the current database schema: https://github.com/nook24/statusengine/blob/master/cakephp/app/Plugin/Legacy/Config/Schema/legacy_schema_innodb.php You can update/create your database for Statusengine by executing:
/opt/statusengine/cakephp/app/Console/cake schema update --plugin Legacy --file legacy_schema_innodb.php --connection legacy
To debug Statusengine, you should run it in foreground mode using:
/opt/statusengine/cakephp/app/Console/cake statusengine_legacy -w
I guess you did this to get the error message you pasted above :)
At the moment Statusengine only silently catches the MySQL error "General error: 2006 MySQL server has gone away" https://github.com/nook24/statusengine/blob/master/cakephp/app/Plugin/Legacy/Model/LegacyAppModel.php#L198-L203 to reconnect if you may be restart your MySQL server. On every other MySQL error Statusengine will crash :(
Please check that there are no orphanded child processes are running on your system if you fire up Statusengine
In this case it was a cluster configuration where statusengine start/stop/status is controlled via pacemaker. And we found a situation where orphaned statusengine childs were left behind.
Since the init-script doesn't do any additional checks for orphaned childs and the childs themself don't care if the master connection is lost I opened this issue.
IMO the childs should be killed before a hard exit (catched exception) or the childs should kill themself if the connection to it's master process is lost. Just leaving them behind can create additional problems.
Since the init-script doesn't do any additional checks for orphaned childs and the childs themself don't care if the master connection is lost I opened this issue.
Thats true, let me check if i can hack a quick fix for you, sounds not hard to check if the parent process is dead or not...
Another case would be during startup where the scripts are started in the wrong way (statusengine before MySQL).
I patched Statusengine right now and it looks not to bad for the first try:
root@statusengine-demo:/opt/statusengine/cakephp/app# service statusengine start
Starting Statusengine
Check that everything is running and happy:
root@statusengine-demo:/opt/statusengine/cakephp/app# ps -eaf |grep status
root 10689 10656 0 18:37 pts/4 00:00:00 tailf /var/log/statusengine.log
root 10835 10707 0 18:37 pts/5 00:00:00 watch gearadmin --status
root 11266 1 7 18:40 ? 00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root 11267 11266 2 18:40 ? 00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root 11268 11266 1 18:40 ? 00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root 11269 11266 1 18:40 ? 00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root 11270 11266 1 18:40 ? 00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root 11271 11266 0 18:40 ? 00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root 11279 10077 0 18:40 pts/1 00:00:00 grep --color=auto status
Kill the parent process and see what happens:
root@statusengine-demo:/opt/statusengine/cakephp/app# kill -9 11266; sleep 1; ps -eaf |grep status
root 10689 10656 0 18:37 pts/4 00:00:00 tailf /var/log/statusengine.log
root 10835 10707 0 18:37 pts/5 00:00:00 watch gearadmin --status
root 11319 10077 0 18:40 pts/1 00:00:00 grep --color=auto status
/var/log/statusengine.log reports now:
[11270] My parent process is gone I guess I am orphaned and will exit now!
[11268] My parent process is gone I guess I am orphaned and will exit now!
[11271] My parent process is gone I guess I am orphaned and will exit now!
[11269] My parent process is gone I guess I am orphaned and will exit now!
[11267] My parent process is gone I guess I am orphaned and will exit now!
Check the init script:
root@statusengine-demo:/opt/statusengine/cakephp/app# service statusengine status
No /usr/bin/php5 found running; none killed.
Statusengine is not running
You can test this on your environment if you checkout the branch: https://github.com/nook24/statusengine/tree/issue_14
Unfortunately my init script is not the best and need to be ported to systemd i guess... But pacemaker should launch all your processes, not SysVinit.
Looks good so far. Now I just have to get the right starting order so it doesn't crash on startup.
Btw. The fact that the master process has died is not visible in the logfile. You only see this if you start it manually. If you start it via init script and the process dies due to unavailable database connectivity the start-stop-daemon doesn't say anything or writes anything to the log.
I would suggest if you don't catch the exception to do something with it that at least stuff gets logged so it can be reviewed if something happens.
I add the boot order to the documentation: https://statusengine.org/documentation.php#boot-order
I will check if i can add the errors to the logfile...
@dhoffend can you please checkout my last commit SHA: 1e7b44b27fbf226652dd32097bf4fa21f241a5fd ? I guess this is what you are looking for.
Looks better, but now the error output on the commandline is missing. Since it's a critical error it should be printed to STDERR.
Here's the log output
exception 'MissingConnectionException' with message 'Database connection "Mysql" is missing, or could not be created.' in /opt/statusengine/cakephp/lib/Cake/Model/Datasource/Database/Mysql.php:191
Stack trace:
#0 /opt/statusengine/cakephp/lib/Cake/Model/Datasource/DboSource.php(260): Mysql->connect()
#1 /opt/statusengine/cakephp/lib/Cake/Model/ConnectionManager.php(105): DboSource->__construct(Array)
#2 /opt/statusengine/cakephp/lib/Cake/Model/Model.php(3524): ConnectionManager::getDataSource('legacy')
#3 /opt/statusengine/cakephp/lib/Cake/Model/Model.php(1136): Model->setDataSource('legacy')
#4 /opt/statusengine/cakephp/lib/Cake/Model/Model.php(3551): Model->setSource('instances')
#5 /opt/statusengine/cakephp/lib/Cake/Model/Model.php(1369): Model->getDataSource()
#6 /opt/statusengine/cakephp/lib/Cake/Model/Model.php(1553): Model->schema()
#7 /opt/statusengine/cakephp/app/Console/Command/StatusengineLegacyShell.php(2331): Model->create()
#8 /opt/statusengine/cakephp/app/Console/Command/StatusengineLegacyShell.php(2670): StatusengineLegacyShell->createInstance()
#9 /opt/statusengine/cakephp/app/Console/Command/StatusengineLegacyShell.php(208): StatusengineLegacyShell->forkWorker()
#10 /opt/statusengine/cakephp/lib/Cake/Console/Shell.php(440): StatusengineLegacyShell->main()
#11 /opt/statusengine/cakephp/lib/Cake/Console/ShellDispatcher.php(209): Shell->runCommand('-w', Array)
#12 /opt/statusengine/cakephp/lib/Cake/Console/ShellDispatcher.php(66): ShellDispatcher->dispatch()
#13 /opt/statusengine/cakephp/app/Console/cake.php(47): ShellDispatcher::run(Array)
#14 {main}[5114] My parent process is gone I guess I am orphaned and will exit now!
[5115] My parent process is gone I guess I am orphaned and will exit now!
[5116] My parent process is gone I guess I am orphaned and will exit now!
[5117] My parent process is gone I guess I am orphaned and will exit now!
[5118] My parent process is gone I guess I am orphaned and will exit now!
I'm wondering why do you try to create a new log handler/resource? Can't you use your Console/Command/Task/LogfileTask.php logger?
For logging stuff I'm opening a different issue with other things in mind.
I'm wondering why do you try to create a new log handler/resource? Can't you use your Console/Command/Task/LogfileTask.php logger?
My plan is to remove this class...
If you want i can enable stderr output as well
Merged into new Version 1.6.0
When using the provided start script, several worker process are being started and forked into background without any real parent pid.
start-stop-daemon creates a pid file of the main php script which then gets closed after all worker processes are being created.
When calling the init script with the status parameter you get an error that statusengine is not running cause the pid file created by start-stop-daemon has already ended but the forked childs still live on.
When you try to stop statusengine it tells you that statusengine has already been stopped (the process to the pid file .... same as status).