Open mohierf opened 9 years ago
NOTE: some fixes still need to be made ... do not use on production servers!
Just to be sure I understand correctly: this collection is updated every time mongo-logs gets a new log for the hostname/service. So the "period for UNCHECKED" is initialized to 86400 and decremented whenever we increment the other values? Am I right?
So when we query the availability from the WebUI, we only compute percentages of 86400?
Is it computing availability for all services, or only for hosts?
You are right ... it is almost real-time information :-)
At the moment I have only implemented host checks, but it will be really simple to do the same for all services.
I noticed some problems with this simple strategy :
you can not have availability information for periods smaller than a day
I have some ideas to cope with the first problem ... but I am not yet sure what the best strategy is ... to be discussed! @maethor
I plan to review the entire source code of your plugin (to remove some `if len(list) > 0:`, for example :D), so in a few hours I will be happy to bring you some suggestions on the strategy :)
Availability for small periods is quite hard. In fact, the best strategy for managing such things is the one used by perfdata databases: keep precise information for the last few hours, then aggregate it more and more coarsely as time goes on. This is nice because we don't have to set any limit, and we can be sure that the database size will not explode. On the other hand, it can complicate the implementation a lot.
But I think I already have an idea to do this… :)
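For illustration only, the perfdata-style aggregation described above could be sketched roughly as follows. This is not the plugin's code: the tier boundaries, the bucket layout and the function names are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical retention tiers: (maximum age in seconds, bucket width in seconds).
TIERS = [
    (6 * 3600, 300),        # last 6 hours: 5-minute buckets
    (7 * 86400, 3600),      # last 7 days: 1-hour buckets
    (float("inf"), 86400),  # anything older: 1-day buckets
]


def bucket_width(age):
    """Return the bucket width to use for data that is `age` seconds old."""
    for max_age, width in TIERS:
        if age <= max_age:
            return width
    return TIERS[-1][1]


def consolidate(buckets, now):
    """Merge fine-grained buckets into coarser ones as they age.

    `buckets` maps (start, width) -> {state: seconds spent in that state}.
    Buckets are never split, only merged, so total seconds are preserved.
    """
    merged = defaultdict(lambda: defaultdict(int))
    for (start, width), states in buckets.items():
        target = max(width, bucket_width(now - start))
        new_start = start - start % target  # align to the coarser bucket
        for state, seconds in states.items():
            merged[(new_start, target)][state] += seconds
    return {key: dict(states) for key, states in merged.items()}
```

Running `consolidate` periodically keeps recent data precise while old data collapses into daily totals, so the collection grows slowly without needing a hard retention limit.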
Feel free to restart from scratch ... I simply made a mock-up to validate an idea, which was to compute on the fly instead of parsing a big logs table in a database :-)
There is no need to restart from scratch. Your proof of concept is great :)
What is the status of this feature? Building from the latest source, I see that there is still no service-based availability in my mongo log. I am somewhat keen on implementing this. @mohierf, @maethor: any ideas/thoughts you want to share?
@bittrance : as far as I remember (it's been quite a long time ...), you should have information for the hosts and the services.
The module logs some information on start in brokerd.log to tell you what it will manage. You also have some configuration parameters to include/exclude some services from the recording ... perhaps something to configure in your environment?
I left this issue open because @maethor had an idea for rewriting some parts of the code.
Indeed. Explicitly setting a `services_filter` resolves the issue. The text in the module config file says "default is to consider only the services which business impact is > 4". However, since `services_filter` is commented out in the default config, https://github.com/shinken-monitoring/mod-mongo-logs/blob/master/module/module.py#L154 will actually leave `filter_service_criticality` unset, which means https://github.com/shinken-monitoring/mod-mongo-logs/blob/master/module/module.py#L373 will be bypassed. Which is right? Should the default be `services_filter = getattr(mod_conf, 'services_filter', 'bi:>=4')`, or should the docs in the config file change?
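To make the two options concrete, here is a minimal sketch of how the `getattr` fallback behaves when the option is commented out. `ModConf`, `resolve_services_filter` and `filtering_enabled` are hypothetical stand-ins written for this example, not the module's actual code.

```python
class ModConf(object):
    """Hypothetical stand-in for the parsed module configuration object."""


def resolve_services_filter(mod_conf, default):
    """Return the configured filter, or `default` if the option is absent."""
    return getattr(mod_conf, 'services_filter', default)


def filtering_enabled(services_filter):
    """An empty filter string means no criticality filtering is applied."""
    return bool(services_filter)


conf = ModConf()  # services_filter commented out: the attribute does not exist

# Empty-string fallback: the filter ends up disabled.
no_filter = resolve_services_filter(conf, '')
# Fallback matching the documented behaviour: the filter is enabled by default.
bi_filter = resolve_services_filter(conf, 'bi:>=4')
```

Either the fallback value or the comment in the config file has to change for the two to agree.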
Because `services_filter` is commented out, it takes the default value defined in the source code, and that is ... an empty string :(
You are right, we should change the doc in the configuration file!
The module manages `host_check_result` broks to compute and store availability data for all known hosts on a daily basis.
For every day, a document is stored in the availability collection with the following fields:
The sum of the 5 stored periods is always 86400, the number of seconds in a day. Before the first received check, the host is considered to be in an UNCHECKED period, and likewise after the last received check.
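The accounting described above can be sketched as follows. This is only an illustration: the field names, the four state names other than the UNCHECKED period, and the choice to credit elapsed time to the previously seen state are assumptions, not the module's actual schema.

```python
SECONDS_PER_DAY = 86400
# Hypothetical state names; only the UNCHECKED period is confirmed by the text.
STATES = ("up", "down", "unreachable", "unknown")


def new_day_document(hostname, day):
    """Start the day with every second counted as unchecked."""
    doc = {"hostname": hostname, "day": day,
           "last_check": None, "last_state": None,
           "unchecked": SECONDS_PER_DAY}
    for state in STATES:
        doc[state] = 0
    return doc


def record_check(doc, seconds_into_day, state):
    """Move the time elapsed since the previous check out of `unchecked`.

    Assumption: the elapsed time is credited to the previously seen state.
    """
    if doc["last_check"] is not None:
        elapsed = seconds_into_day - doc["last_check"]
        if elapsed < 0:
            return doc  # ignore out-of-order results in this sketch
        doc[doc["last_state"]] += elapsed
        doc["unchecked"] -= elapsed
    doc["last_check"] = seconds_into_day
    doc["last_state"] = state
    return doc
```

Because every second added to a state is removed from `unchecked`, the five periods always sum to 86400, and the time before the first check and after the last one stays unchecked.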
The Shinken WebUI uses this data collection to display availability information for each host (see https://github.com/shinken-monitoring/mod-webui/issues/260).
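As a rough idea of what a consumer of this collection has to do, turning one daily document into percentages is a single division by 86400. The field names below are hypothetical and this is not the WebUI's actual code.

```python
SECONDS_PER_DAY = 86400
# Hypothetical field names for the five stored periods.
PERIODS = ("up", "down", "unreachable", "unknown", "unchecked")


def availability_percentages(doc, periods=PERIODS):
    """Express each stored period as a percentage of the day."""
    return {name: round(100.0 * doc.get(name, 0) / SECONDS_PER_DAY, 2)
            for name in periods}
```

Multi-day availability would need the per-day seconds summed first and then divided by the total number of seconds in the range.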