rfcx / arbimon

Ecoacoustic analysis platform empowering conservationists to analyze acoustic data and to derive insights about the ecosystem at scale
https://arbimon.org
Apache License 2.0

Create alert when RFM and soundscape jobs are stuck after 6 hours #2038

Closed (koonchaya closed this issue 1 week ago)

koonchaya commented 2 weeks ago

When a user creates jobs (soundscapes or RFM) and the jobs stop running or have not started for 6 hours, we should have an alert (Grafana logs) show up in the arbimon-dev channel, so that we know jobs are stuck in the queue.

rassokhina-e commented 1 week ago

A new alert was created with the title "[Alert] Arbimon Jobs server's stuck check". It checks the jobs queue every 2 hours to detect waiting and processing jobs that have made no progress during that time.
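For anyone reading later, the check described above could be sketched roughly as follows. This is only an illustration of the idea, not the actual Grafana alert query: the `Job` record, its field names, and `find_stuck_jobs` are hypothetical and are not taken from the Arbimon codebase.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical job record; field names are illustrative, not Arbimon's schema.
@dataclass
class Job:
    job_id: int
    state: str             # e.g. "waiting", "processing", "completed"
    progress: int          # units of work completed so far
    last_update: datetime  # when the progress value last changed


def find_stuck_jobs(jobs, now, window=timedelta(hours=2)):
    """Return waiting/processing jobs whose progress has not changed within `window`."""
    return [
        job for job in jobs
        if job.state in ("waiting", "processing")
        and now - job.last_update >= window
    ]


# Example run with made-up data.
now = datetime(2024, 1, 1, 12, 0)
jobs = [
    Job(1, "processing", 40, now - timedelta(hours=3)),    # no progress for 3 hours
    Job(2, "waiting", 0, now - timedelta(minutes=30)),     # queued recently
    Job(3, "completed", 100, now - timedelta(hours=5)),    # finished, ignored
]
stuck_ids = [job.job_id for job in find_stuck_jobs(jobs, now)]
print(stuck_ids)  # [1]
```

The `window` argument is set to 2 hours here to match the check interval mentioned above; it could be raised to 6 hours to match the threshold in the original request.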


koonchaya commented 1 week ago

Where do we see this alert? Will we get a notification somewhere?

rassokhina-e commented 1 week ago

Yes, we got a test alert in the Slack channel.
