shinken-solutions / shinken

Flexible and scalable monitoring framework
http://www.shinken-monitoring.org
GNU Affero General Public License v3.0

Where does the Shinken Livestatus API fetch host and service status information? #2018

Open sjose1x opened 2 years ago

sjose1x commented 2 years ago

How can I identify where the Shinken Livestatus API fetches the status information from, and how it makes it available on the Thruk page?

Currently I'm facing a problem: service states occasionally turn to PENDING.

Usually it happens only when a particular service has never been executed.

To debug it I checked in SQLite, but could not see the status information there.

The documentation mentions that the live statuses for hosts and services are kept in memory, but even after a node reboot the statuses are reflected properly in the Thruk portal. How?

broker-master.cfg

define broker {
    broker_name         broker-master
    address             localhost
    port                7772
    spare               0

    manage_arbiters     1
    manage_sub_realms   1
    timeout             3
    data_timeout        120
    max_check_attempts  3
    check_interval      60

    modules             livestatus, celery-task, npcdmod

    use_ssl             0
    hard_ssl_name_check 0

    realm               All
}

livestatus.cfg

define module {
    module_name     livestatus
    module_type     livestatus
    host            *
    port            50000
    modules         logstore-sqlite
}
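With the module above listening on TCP port 50000, any client (Thruk included) talks to Livestatus by sending a plain-text LQL query over a socket. Here is a minimal sketch of such a client; the host, port, and the `has_been_checked` filter match standard Livestatus conventions, but adjust them to your setup:

```python
import socket

def livestatus_query(query, host="localhost", port=50000, timeout=5):
    """Send a raw LQL query to the Livestatus TCP socket and return the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii"))
        sock.shutdown(socket.SHUT_WR)  # signal end of query; server replies then closes
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")

# Services that have never been checked report has_been_checked = 0;
# these are the ones Thruk renders as PENDING.
pending_query = (
    "GET services\n"
    "Columns: host_name description state\n"
    "Filter: has_been_checked = 0\n\n"
)
# livestatus_query(pending_query)  # run this against a live broker
```

Running the commented-out call while the problem occurs will show exactly which services Livestatus itself considers never-checked, independently of Thruk.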
geektophe commented 2 years ago

In fact, Livestatus doesn't fetch the information; it receives it.

The scheduler services have the most up-to-date view of the monitored infrastructure.

Each time an object is modified (its status changes, a check is executed, a notification is sent, and so on), an event is generated and made available to the broker services.

The brokers periodically download those events (called broks in Shinken terminology) and push them into the Livestatus service.

The Livestatus service in turn "rebuilds" its own view of the objects and updates it from the information it receives.
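To make that flow concrete, here is a conceptual sketch of how a Livestatus-like module could rebuild object state from a stream of broks. The brok type names and payload shape are illustrative assumptions, not Shinken's actual internal code:

```python
# Conceptual sketch only: initial_* broks carry a full object snapshot,
# update_* broks carry partial attribute changes to merge in.

objects = {}  # (object_type, name) -> attribute dict, rebuilt in memory

def manage_brok(brok_type, data):
    """Apply one brok to the in-memory object view."""
    obj_type = brok_type.split("_")[1]       # e.g. "host" or "service"
    key = (obj_type, data["name"])
    if brok_type.startswith("initial_"):
        objects[key] = dict(data)            # full snapshot replaces any old state
    elif key in objects:
        objects[key].update(data)            # partial update merges into snapshot

# Example stream: a snapshot (PENDING, never checked), then a check result.
manage_brok("initial_host_status", {"name": "web01", "state": "PENDING"})
manage_brok("update_host_status", {"name": "web01", "state": "UP"})
```

After the second brok, the in-memory view shows `web01` as UP; if the broker restarts and only ever receives the initial snapshot again, the object falls back to PENDING, which is exactly the symptom described above.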

If Thruk indicates hosts or services are in PENDING state, it's probably because the schedulers are not saving and restoring the host and service states (retention data) correctly: the only time a host or service is in PENDING state is when it's loaded from scratch (no retention data is available).

The documentation is right: Livestatus itself is totally stateless; all its data is stored in memory, and there's no disk retention.

Double-check your schedulers' retention module behavior.
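For reference, a scheduler retention setup typically looks something like the fragment below. The exact module name and type depend on which retention module package is installed on your system, so treat this as a template rather than a drop-in config:

```
define module {
    module_name     retention-file
    module_type     pickle_retention_file_generic
    path            /var/lib/shinken/retention.dat
}

define scheduler {
    scheduler_name  scheduler-master
    ...
    modules         retention-file
}
```

If the `path` is missing, unwritable, or the module isn't loaded by the scheduler, states are lost on restart and everything reloads as PENDING.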

I hope it helps.