ukwa / ukwa-monitor

Dashboard and monitoring system for the UK Web Archive

Stats Pusher stops gathering metrics if one of them fails #55

Closed anjackson closed 1 year ago

anjackson commented 1 year ago

The current Stats Pusher stat_values.py calls sys.exit() if one of the HTTP calls fails. This kills the whole script, so none of the subsequent metrics are collected, and the missing metrics in turn cause many false alerts to fire.

https://github.com/ukwa/ukwa-monitor/blob/cc9f9b0d26e0fad0b0202f9a488f6b1d0c698e40/stat-pusher/script/stat_values.py#L28-L37

The loop that goes through the checks should continue on to the next one if there is a problem. There is already a catch-all Exception handler at the loop level, so the best plan would seem to be to raise an exception up the chain and let that handler deal with it, rather than handling the failure locally by exiting.
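A rough sketch of the proposed behaviour, not the actual stat_values.py code: the names `StatFetchError`, `get_stat_value`, `collect_stats` and `fake_fetch` are hypothetical, and `fake_fetch` stands in for the real HTTP call. One detail worth noting is that sys.exit() raises SystemExit, which is not a subclass of Exception, so a loop-level `except Exception` handler would not catch it. Raising an ordinary exception instead lets that handler do its job.

```python
import logging

logger = logging.getLogger(__name__)


class StatFetchError(Exception):
    """Hypothetical error type for a failed metric lookup."""


def get_stat_value(name, fetch):
    """Return the value for one metric, letting failures propagate.

    `fetch` is a stand-in for the real HTTP call (an assumption).
    """
    try:
        return fetch(name)
    except Exception as e:
        # Raise up the chain with context instead of calling sys.exit().
        raise StatFetchError(f"Failed to get {name}: {e}") from e


def collect_stats(names, fetch):
    """Loop over all checks; a failure in one does not stop the rest."""
    results, failures = {}, []
    for name in names:
        try:
            results[name] = get_stat_value(name, fetch)
        except Exception as e:  # loop-level catch-all handler
            logger.error("Skipping %s: %s", name, e)
            failures.append(name)
            continue
    return results, failures


def fake_fetch(name):
    # Simulated HTTP call: the metric named "broken" always fails.
    if name == "broken":
        raise ConnectionError("connection refused")
    return len(name)


results, failures = collect_stats(["alpha", "broken", "gamma"], fake_fetch)
```

With this shape, the failing check is logged and skipped, and the checks after it still run, so only the metric that actually failed goes missing.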