recursecenter / pairing-bot

A Zulip bot that partners people for pair programming practice
MIT License

Alert on error #61

Open · cceckman opened this issue 3 months ago

cceckman commented 3 months ago

(As noted in this issue: )

On significant / "should never happen" errors, it would be nice to alert the maintainers, e.g. on "failed end-of-batch processing" or "couldn't send a message". Manually checking logs is not the best observability experience. :)

jdkaplan commented 1 month ago

I'm used to setting up log-based alerts keyed on severity level, which I think GCP supports, so I'm setting that up in #95.

The next step is to start sending errors to myself by clicking through all that config :)

jdkaplan commented 1 month ago

Wait, GitHub, too fast! We still have to configure the alerts!

jdkaplan commented 1 month ago

I finally followed the guide for configuring a log-based alert, and now we have this alert policy for any log line with severity >= ERROR.

This should support multiple notification channels, so anyone with GCP access can add a new one and append it to the list for this alert.
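A condition like "severity >= ERROR" compares the numeric values of Cloud Logging's `LogSeverity` enum, so it matches more than plain ERROR lines. The small Go sketch below illustrates which severities such an alert would catch. The function name `matchesErrorAlert` is hypothetical, and the table copies the enum's published values.

```go
package main

import "fmt"

// Numeric values of Cloud Logging's LogSeverity enum. A filter such as
// `severity >= ERROR` compares these values.
var logSeverity = map[string]int{
	"DEFAULT": 0, "DEBUG": 100, "INFO": 200, "NOTICE": 300, "WARNING": 400,
	"ERROR": 500, "CRITICAL": 600, "ALERT": 700, "EMERGENCY": 800,
}

// matchesErrorAlert reports whether a log line with the given severity
// would satisfy a "severity >= ERROR" condition.
func matchesErrorAlert(severity string) bool {
	v, ok := logSeverity[severity]
	return ok && v >= logSeverity["ERROR"]
}

func main() {
	for _, s := range []string{"INFO", "WARNING", "ERROR", "CRITICAL"} {
		fmt.Printf("%s -> %v\n", s, matchesErrorAlert(s))
	}
	// Prints false for INFO and WARNING, true for ERROR and CRITICAL.
}
```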

The next step is to arrange for "things that shouldn't fail" to emit ERROR-level log messages. I'm preparing a PR that happens to arrange that for all the cron job handlers, so that should be solved soon :slightly_smiling_face: