Closed j2kun closed 3 years ago
Maybe try alerta?
https://docs.alerta.io/en/latest/quick-start.html https://github.com/alerta/docker-alerta
Seems free, could deploy as an HTTP server on the generator server, since that one generally doesn't do much. Has a docker container ready.
There's one called Riemann. So tempting
I think I'll try something simpler first: just write a script that runs docker ps -a and parses the output, and sends an email if anything died.
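A rough sketch of what the parsing half of that script could look like. This is only an illustration, not the script that was actually deployed: the function names are hypothetical, and it assumes that asking docker for tab-separated names and statuses via --format is acceptable, and that a status beginning with "Exited" or "Dead" counts as "died".

```python
import subprocess

# Assumption: a tab-separated name/status pair per container is enough to go on.
PS_COMMAND = ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"]


def find_dead_containers(ps_output):
    """Return (name, status) pairs for containers that look dead.

    `ps_output` is the text printed by PS_COMMAND, one container per line.
    """
    dead = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, status = line.partition("\t")
        # Running containers report "Up ..."; stopped ones report "Exited (code) ...".
        if status.startswith(("Exited", "Dead")):
            dead.append((name, status))
    return dead


def list_dead_containers():
    """Run docker ps -a on this host and return the dead containers."""
    result = subprocess.run(PS_COMMAND, capture_output=True, text=True, check=True)
    return find_dead_containers(result.stdout)
```

Keeping the parsing separate from the subprocess call means it can be checked against canned output on a machine without docker installed.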
- Create an app password through Google Account (for Gmail)
- Install ssmtp
- Configure it (/etc/ssmtp/ssmtp.conf)
- Use environment variables to hide secrets
- Python script to run the monitor, nohup to detach it from the terminal
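The email side of the steps above might be sketched like this. It is a guess at the shape, not the real script: the environment variable names ALERT_FROM and ALERT_TO and the function names are made up for illustration, and it assumes ssmtp is already configured so that piping a message to `ssmtp <recipient>` delivers it.

```python
import os
import socket
import subprocess
from email.message import EmailMessage


def build_alert(dead, sender, recipient, host="unknown-host"):
    """Build an alert email for a list of (name, status) pairs."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"[{host}] {len(dead)} container(s) down"
    msg.set_content("\n".join(f"{name}: {status}" for name, status in dead))
    return msg


def send_alert(msg):
    """Hand the message to ssmtp, which reads headers and body from stdin."""
    subprocess.run(["ssmtp", msg["To"]], input=msg.as_string(), text=True, check=True)


def alert_from_env(dead):
    """Send an alert using addresses from (hypothetical) environment variables."""
    msg = build_alert(
        dead,
        sender=os.environ["ALERT_FROM"],
        recipient=os.environ["ALERT_TO"],
        host=socket.gethostname(),
    )
    send_alert(msg)
```

If this lived in a file such as monitor.py (a hypothetical name) with a polling loop around it, running `nohup python3 monitor.py &` would keep it going after the terminal closes.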
Set up the monitoring on each of the four EC2 instances. The processor nodes had run out of RAM, so I restarted them and expect them to fail again soon, which I can then use to test the alerting system.
Looks like the processor jobs are still going after restart, so something unexpected caused them to fail... not sure, but will close this for now and see if the alerting works later
I'm not quite sure how I want to do this yet, but creating this issue as a placeholder.