openzim / wp1

Wikipedia 1.0 engine & selection tools
https://wp1.openzim.org
GNU General Public License v2.0

Monitor the solution #121

Open kelson42 opened 5 years ago

kelson42 commented 5 years ago

We already have minimal monitoring (using uptimerobot) based on the simple availability of a web page. But with this we cannot detect:

For this we need a solution able to monitor the logs of the application.

kelson42 commented 5 years ago

Something like https://icinga.com/

audiodude commented 4 years ago

I've added uptimerobot for the new wp1.openzim.org URL.

audiodude commented 4 years ago

We could run a cron job on the workers image that checks how many items are in the FAILED queue and, if the count goes over N, sends us an email?
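
A minimal sketch of that check, assuming the script is invoked from cron and that mail goes through a local SMTP relay. How the FAILED queue is counted depends on the queue backend, which is not specified here, so the count is passed in as a callable. The function names, addresses, and the `localhost` relay are all hypothetical.

```python
import smtplib
from email.message import EmailMessage
from typing import Callable, Optional

# Hypothetical addresses, for illustration only.
DEFAULT_SENDER = "wp1-monitor@example.org"
DEFAULT_RECIPIENT = "admin@example.org"


def build_alert(failed_count: int, threshold: int,
                sender: str = DEFAULT_SENDER,
                recipient: str = DEFAULT_RECIPIENT) -> Optional[EmailMessage]:
    """Return an alert email if failed_count is over threshold, else None."""
    if failed_count <= threshold:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"[wp1] {failed_count} items in FAILED queue (threshold {threshold})"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"The FAILED queue holds {failed_count} items, "
        f"which is over the configured threshold of {threshold}."
    )
    return msg


def check_failed_queue(count_failed: Callable[[], int], threshold: int,
                       send: Callable[[EmailMessage], None]) -> bool:
    """Count failed items and send one alert if the count is over threshold.

    count_failed: returns the current size of the FAILED queue
                  (backend-specific, supplied by the caller).
    send:         delivers the message, e.g. send_via_local_smtp.
    Returns True if an alert was sent.
    """
    msg = build_alert(count_failed(), threshold)
    if msg is None:
        return False
    send(msg)
    return True


def send_via_local_smtp(msg: EmailMessage) -> None:
    """Deliver the message through an SMTP relay assumed to run on localhost."""
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)
```

A cron entry would then call `check_failed_queue(count_fn, N, send_via_local_smtp)`, where `count_fn` is whatever the workers' queue library exposes for the size of the FAILED queue. Keeping the count and the delivery as parameters means the threshold logic can be tested without a live queue or mail server.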

kelson42 commented 4 years ago

@audiodude Thank you for commenting. I think there are many things to add to this ticket on my side.

First, we have an entry for wp1.openzim.org in the Kiwix uptimerobot.com account (that was good advice from you). I believe it is important that the monitoring of our services is centralised. If it is OK with you, I would like to add you as a recipient.

Then, I think the problem in this ticket is complex enough to need a multi-part technical answer:

audiodude commented 4 years ago

Okay, at the very least I'll delete my uptimerobot entry and you can add me to yours.