Improve the feed update metrics

jamespfennell commented 11 months ago

Transiter currently exposes a Prometheus distribution metric transiter_feed_update_latency that reports how long a feed update takes. We should add two more metrics:

transiter_feed_update_download_latency - time taken to download the data
transiter_feed_update_db_update_latency - time taken to run the DB updates

In particular, the second metric would give us insight into the probability that the Postgres transactions for two feed updates overlap. Overlaps are bad because if there are inconsistencies in the update queries, Postgres will fail one of the transactions.

Thought of this while reviewing #110, in particular the part around racing updates

jamespfennell commented 11 months ago

Let's also simplify the transiter_feed_update_latency metric. It currently has a label based on the update result. But this means you can't do an apples-to-apples comparison of different metric values. For example, the values for DOWNLOAD_ERROR and SUCCESS will be wildly different because in the download error case we never even attempt to update the database. I would make this metric just for successful feed updates.

It may also be nice to add the feed type as a label on the metric.

jamespfennell commented 11 months ago

Also we should add a metric for the last time a feed was updated. This allows us to view the staleness of a feed over time. We could have separate metrics for the last time an update succeeded, the last time an update was skipped, and the last time an update failed.

jamespfennell / transiter

Improve the feed update metrics #111