theforeman / smart_proxy_monitoring

Smart proxy plugin for monitoring system integration
GNU General Public License v3.0

How is the "Monitoring Status" calculated? #24

Open kristofver opened 5 years ago

kristofver commented 5 years ago

Although all services are shown as OK in the host's "Monitoring" tab, the "Monitoring Status" shows Critical. In Icinga2, everything is green. I'm using FQDNs in both Icinga and The Foreman. I can see two things that might influence this:

Any ideas on this? kind regards, Kristof

timogoebel commented 5 years ago

The monitoring status is actually calculated in the foreman, see https://github.com/theforeman/foreman_monitoring/blob/ef894d9c11bdaa4dd4693a758aedca6005f3066f/app/models/host_status/monitoring_status.rb#L12-L22.

kristofver commented 5 years ago

Given the to_status function, can you see a reason why a host with the monitoring results below could still have a Monitoring Status of Critical?

foreman=# select * from monitoring_results where host_id = 15;
 id  | host_id |                service                 | result | downtime | acknowledged |         timestamp
-----+---------+----------------------------------------+--------+----------+--------------+----------------------------
 293 |      15 | mem                                    |      0 | f        | f            | 2019-01-15 08:59:38.675023
 176 |      15 | Host Check                             |      0 | f        | f            | 2019-01-15 09:00:28.794356
 190 |      15 | **redacted**                           |      0 | f        | f            | 2019-01-15 09:00:11.487482
 195 |      15 | **redacted**                           |      0 | f        | f            | 2019-01-15 09:00:26.921551
 221 |      15 | ping4                                  |      0 | f        | f            | 2019-01-15 08:59:56.651465
 232 |      15 | ntp                                    |      0 | f        | f            | 2019-01-15 09:00:28.403995
 234 |      15 | disk                                   |      0 | f        | f            | 2019-01-15 09:00:06.08392
 263 |      15 | ssh                                    |      0 | f        | f            | 2019-01-15 08:59:57.859941
 272 |      15 | load                                   |      0 | f        | f            | 2019-01-15 09:00:04.407032
(9 rows)
timogoebel commented 5 years ago

Can you use foreman-rake console and try the following?

Host::Managed.find_by(name: 'my-super.host.com').get_status(HostStatus::MonitoringStatus).to_status
kristofver commented 5 years ago

That returns 0.

I've put some debug statements in the to_status function, but it doesn't get called when refreshing the host properties page in Foreman. The to_label function does get called and returns Critical.

I did some more troubleshooting. The HostStatus::MonitoringStatus record is a lot older than the monitoring_results records.

foreman=# select * from monitoring_results where host_id = 26;
 id  | host_id |                       service                        | result | downtime | acknowledged |         timestamp
-----+---------+------------------------------------------------------+--------+----------+--------------+----------------------------
 286 |      26 | load                                                 |      0 | f        | f            | 2019-01-15 08:59:49.232184
 298 |      26 | disk                                                 |      0 | f        | f            | 2019-01-15 08:59:33.737705
 175 |      26 | Host Check                                           |      0 | f        | f            | 2019-01-15 09:00:24.100271
 192 |      26 | ***                                                  |      0 | f        | f            | 2019-01-15 09:00:17.077651
 196 |      26 | ***                                                  |      0 | f        | f            | 2019-01-15 08:59:59.033319
 215 |      26 | ntp                                                  |      0 | f        | f            | 2019-01-15 09:00:28.406188
 264 |      26 | ping4                                                |      0 | f        | f            | 2019-01-15 08:59:54.60494
 265 |      26 | mem                                                  |      0 | f        | f            | 2019-01-15 09:00:15.267184
 274 |      26 | ssh                                                  |      0 | f        | f            | 2019-01-15 08:59:45.435173
(9 rows)

foreman=# select * from host_status where host_id = 26;
 id  |               type                | status | host_id |        reported_at
-----+-----------------------------------+--------+---------+----------------------------
 365 | ForemanOpenscap::ComplianceStatus |      0 |      26 | 2019-01-15 09:01:12.734863
 359 | Katello::PurposeUsageStatus       |      0 |      26 | 2019-01-15 09:01:12.851821
 345 | HostStatus::BuildStatus           |      0 |      26 | 2019-01-15 09:01:12.729576
 373 | HostStatus::ExecutionStatus       |      0 |      26 | 2019-01-15 09:01:12.755233
 358 | Katello::PurposeRoleStatus        |      0 |      26 | 2019-01-15 09:01:12.82413
 363 | HostStatus::ConfigurationStatus   |      0 |      26 | 2019-01-16 18:33:42
 455 | HostStatus::MonitoringStatus      |      2 |      26 | 2019-01-13 11:12:31.693512
 356 | Katello::ErrataStatus             |      0 |      26 | 2019-01-15 09:01:12.759535
 355 | Katello::SubscriptionStatus       |      0 |      26 | 2019-01-15 09:01:12.767729
 357 | Katello::PurposeSlaStatus         |      0 |      26 | 2019-01-15 09:01:12.795652
 360 | Katello::PurposeAddonsStatus      |      0 |      26 | 2019-01-15 09:01:12.879249
 361 | Katello::PurposeStatus            |      0 |      26 | 2019-01-15 09:01:12.906951
 456 | Katello::TraceStatus              |      0 |      26 | 2019-01-15 09:01:12.911751
(13 rows)

I then rebooted the monitored server. The monitoring_results records were updated, but the HostStatus::MonitoringStatus record is still the old one from 2019-01-13.

foreman=# select * from monitoring_results where host_id = 26;
 id  | host_id |                       service                        | result | downtime | acknowledged |         timestamp
-----+---------+------------------------------------------------------+--------+----------+--------------+----------------------------
 286 |      26 | load                                                 |      0 | f        | f            | 2019-01-16 19:02:44.485407
 298 |      26 | disk                                                 |      0 | f        | f            | 2019-01-16 19:02:44.50284
 175 |      26 | Host Check                                           |      0 | f        | f            | 2019-01-16 19:02:32.77024
 196 |      26 | ***                                                  |      0 | f        | f            | 2019-01-16 19:02:42.91751
 264 |      26 | ***                                                  |      0 | f        | f            | 2019-01-16 19:02:33.475878
 265 |      26 | mem                                                  |      0 | f        | f            | 2019-01-16 19:02:43.258151
 274 |      26 | ssh                                                  |      0 | f        | f            | 2019-01-16 19:02:28.217909
 192 |      26 | ***                                                  |      0 | f        | f            | 2019-01-15 09:00:17.077651
 215 |      26 | ntp                                                  |      0 | f        | f            | 2019-01-15 09:00:28.406188
(9 rows)

foreman=# select * from host_status where host_id = 26;
 id  |               type                | status | host_id |        reported_at
-----+-----------------------------------+--------+---------+----------------------------
 365 | ForemanOpenscap::ComplianceStatus |      0 |      26 | 2019-01-15 09:01:12.734863
 359 | Katello::PurposeUsageStatus       |      0 |      26 | 2019-01-15 09:01:12.851821
 345 | HostStatus::BuildStatus           |      0 |      26 | 2019-01-16 18:23:30.864883
 373 | HostStatus::ExecutionStatus       |      0 |      26 | 2019-01-15 09:01:12.755233
 358 | Katello::PurposeRoleStatus        |      0 |      26 | 2019-01-15 09:01:12.82413
 363 | HostStatus::ConfigurationStatus   |      2 |      26 | 2019-01-16 19:02:18
 455 | HostStatus::MonitoringStatus      |      2 |      26 | 2019-01-13 11:12:31.693512
 356 | Katello::ErrataStatus             |      0 |      26 | 2019-01-15 09:01:12.759535
 355 | Katello::SubscriptionStatus       |      0 |      26 | 2019-01-15 09:01:12.767729
 357 | Katello::PurposeSlaStatus         |      0 |      26 | 2019-01-15 09:01:12.795652
 360 | Katello::PurposeAddonsStatus      |      0 |      26 | 2019-01-15 09:01:12.879249
 361 | Katello::PurposeStatus            |      0 |      26 | 2019-01-15 09:01:12.906951
 456 | Katello::TraceStatus              |      0 |      26 | 2019-01-15 09:01:12.911751
(13 rows)

Any clues? When would the host_status record normally be updated? Is this something the monitoring smart proxy does?

BTW, thanks for your support!

timogoebel commented 5 years ago

I suspect that the status is not being refreshed properly. Can you try the following and see if it helps?

Host::Managed.find_by(name: 'my-super.host.com').get_status(HostStatus::MonitoringStatus).refresh!
timogoebel commented 5 years ago

BTW: a result value of 0 means OK, see https://github.com/theforeman/foreman_monitoring/blob/9cecbc007bef2bf6a8f839d1cda2cc017c20ad45/app/models/host_status/monitoring_status.rb#L3-L6.