statusengine / interface

AngularJS based Web Interface for Statusengine
https://statusengine.org/ui/#overview
GNU General Public License v3.0

Reschedule Service/Host Check Does Nothing #24

steaksauce- closed this issue 5 years ago

steaksauce- commented 5 years ago

When rescheduling service/host checks, I notice that nothing happens. I can verify this by checking the history in the UI and also checking the Naemon logs.

Using the developer console in my browser, I notice that I get the following entry when I reschedule a service check:

Possibly unhandled rejection: cancel
(anonymous) @ angular.min.js:122
(anonymous) @ angular.min.js:94
g @ angular.min.js:133
$eval @ angular.min.js:147
$digest @ angular.min.js:144
$apply @ angular.min.js:148
(anonymous) @ angular.min.js:282
dispatch @ jquery.min.js:3
q.handle @ jquery.min.js:3

nook24 commented 5 years ago

> checking the Naemon logs.

Is log_external_commands=1 set in your naemon.cfg? If so, the Naemon log should show the rescheduled host and service checks:

[1561714876] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;demo.statusengine.org;Flapping;1561714860

[1561714988] EXTERNAL COMMAND: SCHEDULE_FORCED_HOST_CHECK;foobar;1561714984
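
To check a larger log for entries like the two above, a small script can pull out every external command line. This is only a sketch: the function name `parse_external_commands` and the regular expression are my own, written against the two sample lines shown here, and the log file location on your system is not assumed.

```python
import re

# Matches Naemon log lines such as:
# [1561714876] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;demo.statusengine.org;Flapping;1561714860
# The pattern is an assumption derived from the sample lines in this thread.
LOG_PATTERN = re.compile(r"^\[(\d+)\] EXTERNAL COMMAND: ([A-Z0-9_]+)(?:;(.*))?$")


def parse_external_commands(lines):
    """Return (timestamp, command, args) tuples for external command entries."""
    entries = []
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # not an external command entry
        timestamp, command, raw_args = match.groups()
        args = raw_args.split(";") if raw_args else []
        entries.append((int(timestamp), command, args))
    return entries


sample = [
    "[1561714876] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;demo.statusengine.org;Flapping;1561714860",
    "[1561714988] EXTERNAL COMMAND: SCHEDULE_FORCED_HOST_CHECK;foobar;1561714984",
    "[1561714990] some unrelated log line",
]

for timestamp, command, args in parse_external_commands(sample):
    print(timestamp, command, args)
```

If a reschedule triggered from the UI produces no matching entry, the command most likely never reached Naemon.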

If you haven't disabled the processing of log entries (it's enabled by default), you should also see the messages in the Statusengine UI:

[Screenshot 2019-06-28 at 11:44:33]

Unfortunately, the error you posted isn't very helpful. I recommend using the Firefox dev tools to check whether the server response is anything other than 200 OK:

[Screenshot 2019-06-28 at 11:47:16]

Make sure XHR is enabled in the Console.

External commands are working on the demo system, so this may just be a configuration issue with your web server or something similar.

steaksauce- commented 5 years ago

I have the logs enabled as noted above -- I checked them to verify that nothing was happening. XHR shows 200 return codes, but it still does nothing (the Naemon logs show no commands being sent and the SE history shows no commands being sent).

What web server configuration would affect this?

nook24 commented 5 years ago

> XHR shows 200 return codes, but it still does nothing (the Naemon logs show no commands being sent and the SE history shows no commands being sent).

Sounds like your SE Worker is not routing external commands to Naemon Core.

This could be caused by one of three issues:

  1. You have set check_for_commands: 0 in the config.yml of the Statusengine Worker, or configured a wrong path to naemon.cmd/naemon.qh.

  2. The user that runs the Statusengine Worker has no write permission on naemon.cmd/naemon.qh.

  3. Something is wrong with your cluster setup. Please query your database to check whether old external commands are stuck in the statusengine_tasks table:

mysql> select * from statusengine_tasks;
+--------------------------------------+-------------------+------------+-----------------+--------------------------------------------------------------------------+
| uuid                                 | node_name         | entry_time | type            | payload                                                                  |
+--------------------------------------+-------------------+------------+-----------------+--------------------------------------------------------------------------+
| 818858d2-e01d-40a2-9b4b-e48c9c7f4c63 | Statusengine-Demo | 1562080906 | externalcommand | [1562080906] SCHEDULE_FORCED_SVC_CHECK;localhost;Current Load;1562080906 |
| 677cfef6-642e-46ff-bba8-5f670732e9b4 | Statusengine-Demo | 1562080910 | externalcommand | [1562080910] SCHEDULE_FORCED_HOST_CHECK;localhost;1562080910             |
+--------------------------------------+-------------------+------------+-----------------+--------------------------------------------------------------------------+
2 rows in set (0.00 sec)
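
The first two causes can be narrowed down with a quick check of the configured paths. The sketch below is an assumption of mine, not part of Statusengine: `check_command_paths` is a hypothetical helper, and the paths have to be passed in from your own worker config. Run it as the same user that runs the Statusengine Worker, since `os.access` tests the permissions of the calling user.

```python
import os
import sys


def check_command_paths(paths):
    """Map each path to 'missing', 'not writable', or 'ok' for the current user."""
    results = {}
    for path in paths:
        if not os.path.exists(path):
            results[path] = "missing"        # wrong path in the worker config?
        elif not os.access(path, os.W_OK):
            results[path] = "not writable"   # permission problem for this user
        else:
            results[path] = "ok"
    return results


if __name__ == "__main__":
    # Pass the command file and query handler paths from your config as arguments.
    for path, status in check_command_paths(sys.argv[1:]).items():
        print("{}: {}".format(path, status))
```

A "missing" result points to the first cause (wrong path), "not writable" to the second.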

steaksauce- commented 5 years ago

Sure enough, the default values for external_command_file and query_handler were not correct for my setup. On CentOS 7, if you install Naemon from the lab OMD repo, the values should be /var/lib/naemon/naemon.cmd and /var/lib/naemon/naemon.qh, respectively. I changed the values in my worker config and restarted the worker, and rescheduling now seems to be working.
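
For anyone who wants to confirm that a path really is the live command file, one option is to submit a command by hand in the same format as the payloads shown in the statusengine_tasks table. This is a rough sketch under assumptions: the helper names are hypothetical, and it relies on the command file being a named pipe, where a non-blocking open for writing fails with ENXIO if no process (such as Naemon) has it open for reading.

```python
import errno
import os
import time


def format_external_command(command, args, now=None):
    """Build a line like '[1562080910] SCHEDULE_FORCED_HOST_CHECK;localhost;1562080910'."""
    timestamp = int(time.time()) if now is None else int(now)
    fields = [command] + [str(arg) for arg in args]
    return "[{}] {}\n".format(timestamp, ";".join(fields))


def submit_external_command(command_file, line):
    """Write one command line without blocking.

    Returns False if the file is a pipe that nobody is reading from,
    True if the line was written. Other errors (missing file, missing
    permission) are raised to the caller.
    """
    try:
        fd = os.open(command_file, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        if exc.errno == errno.ENXIO:
            return False  # pipe exists, but no reader is attached
        raise
    try:
        os.write(fd, line.encode("utf-8"))
    finally:
        os.close(fd)
    return True
```

For example, `submit_external_command("/var/lib/naemon/naemon.cmd", format_external_command("SCHEDULE_FORCED_HOST_CHECK", ["localhost", int(time.time())]))` should produce an EXTERNAL COMMAND entry in the Naemon log when the path is correct and log_external_commands=1 is set.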