statusengine / interface

AngularJS based Web Interface for Statusengine
https://statusengine.org/ui/#overview
GNU General Public License v3.0

Problem with old downtimes #42

Open duylong opened 4 years ago

duylong commented 4 years ago

Hi,

I have a problem with Upcoming downtimes. My old downtimes are still listed with the status "Downtime currently running" and a "Cancel" button.

[Screenshot: upcoming-downtimes]

Do you know why?

duylong commented 4 years ago

It seems that when the worker is down, it does not detect the end of a downtime. Doesn't the worker restart automatically when something goes wrong?

statusengine-worker[13]: Elasticsearch error!
statusengine-worker[13]: Elasticsearch error!
statusengine-worker[13]: No alive nodes found in your cluster
statusengine-worker[13]: No alive nodes found in your cluster
...
statusengine-worker[13]: Elasticsearch error!
statusengine-worker[13]: Elasticsearch error!
statusengine-worker[13]: No alive nodes found in your cluster
statusengine-worker[13]: No alive nodes found in your cluster

The errors stopped after I restarted the service, but the obsolete downtimes are still there.

Even if it's a worker problem, the interface should not display expired downtimes.
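A minimal sketch of the display-side check being asked for, written in Python for illustration only (the actual interface is AngularJS). The `Downtime` record and its fields (`scheduled_start`, `scheduled_end`, `was_cancelled`) are hypothetical and not taken from the Statusengine schema. The idea is simply that a downtime whose scheduled end lies in the past is hidden, whatever state the worker last recorded for it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass
class Downtime:
    # Hypothetical record; the real Statusengine tables may use other names.
    host_name: str
    scheduled_start: datetime
    scheduled_end: datetime
    was_cancelled: bool = False


def is_running(downtime: Downtime, now: datetime) -> bool:
    """True if `now` falls inside the scheduled window."""
    return downtime.scheduled_start <= now < downtime.scheduled_end


def visible_downtimes(
    downtimes: Iterable[Downtime], now: Optional[datetime] = None
) -> List[Downtime]:
    """Keep running or upcoming downtimes; drop cancelled or expired ones."""
    if now is None:
        now = datetime.now(timezone.utc)
    return [d for d in downtimes if not d.was_cancelled and d.scheduled_end > now]


if __name__ == "__main__":
    now = datetime(2021, 5, 5, 9, 23, 40, tzinfo=timezone.utc)
    expired = Downtime("host-a",
                       datetime(2020, 12, 13, 0, 0, tzinfo=timezone.utc),
                       datetime(2020, 12, 13, 23, 59, tzinfo=timezone.utc))
    running = Downtime("host-b",
                       datetime(2021, 5, 5, 8, 0, tzinfo=timezone.utc),
                       datetime(2021, 5, 5, 12, 0, tzinfo=timezone.utc))
    for d in visible_downtimes([expired, running], now):
        print(d.host_name, "running" if is_running(d, now) else "upcoming")
```

Here only "host-b running" is printed: the downtime that ended on 2020-12-13 is dropped because its end time is compared against the current time, not against a stored "running" flag.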

nook24 commented 4 years ago

This sounds interesting. Basically, this is a worker-related issue. However, the worker forks a MiscChild, which handles notifications, downtimes and acknowledgements, and a separate PerfdataChild, which processes performance data.

Normally, whatever happens to the perfdata child should not cause side effects in other processes or workers.
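The isolation described here can be illustrated with a plain `fork()`. This is a POSIX-only Python sketch, not the worker's code (the worker itself is PHP, as the `Console.php` entry point mentioned later in the thread suggests); `run_in_child`, `perfdata_task` and `misc_task` are hypothetical names. An exception raised inside a forked child only changes that child's exit code, so the parent can still start and supervise the other child. For simplicity the sketch runs the two children one after the other rather than concurrently.

```python
import os


def run_in_child(task) -> int:
    """Fork a child that runs `task` and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: catch everything and leave via os._exit so a failure
        # never unwinds into the parent's code path.
        code = 0
        try:
            task()
        except BaseException:
            code = 1
        os._exit(code)
    # Parent: wait for this child and decode its exit status.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


def perfdata_task() -> None:
    # Hypothetical stand-in for a PerfdataChild that cannot reach Elasticsearch.
    raise ConnectionError("No alive nodes found in your cluster")


def misc_task() -> None:
    # Hypothetical stand-in for the MiscChild (notifications, downtimes, acks).
    pass


if __name__ == "__main__":
    print("perfdata child exit code:", run_in_child(perfdata_task))
    print("misc child exit code:", run_in_child(misc_task))
```

The perfdata stand-in exits with code 1 and the misc stand-in with code 0. A real supervisor would run the children side by side and could restart one that exits non-zero.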

In any case the interface should still not display expired downtimes.

I will do some investigation into this.

duylong commented 4 years ago

Yes, I noticed that the worker errors had no side effects. The problem is still present, but I can't find its source, so I can't reproduce it. For now I clean up the MySQL database manually, which is not a clean solution.

nook24 commented 3 years ago

Is this still a thing?

duylong commented 3 years ago

I recently updated to the latest version; I'm watching to see whether the problem comes back ;-)

duylong commented 3 years ago

Shouldn't the command "/opt/statusengine/worker/bin/Console.php cleanup" clean up old ACK / DOWNTIME records?

I still have a DOWNTIME from "23:59 12.13.2020"...

Startusengine Cleanup started at: 2021-05-05 09:23:40
Delete old host records
Delete old host check records... done
Delete old host acknowledgements records... done
Delete old host notification records... done
Delete old host state history records... done
Delete old host downtime history records... done
Delete old service records
Delete old service check records... done
Delete old service acknowledgements records... done
Delete old service notification records... done
Delete old service state history records... done
Delete old service downtime history records... done
Delete old misc records
Delete old log entry records... done
Delete old task records... done
Delete old perfdata records for backend elasticsearch done
Cleanup took: 5 seconds...
Startusengine Cleanup finished at: 2021-05-05 09:23:45