mediacloud / story-indexer

The core pipeline used to ingest online news stories in the Media Cloud archive.
https://mediacloud.org
Apache License 2.0

better monitoring for rss-puller (and all Producer/Queuer classes) #276

Open philbudne opened 7 months ago

philbudne commented 7 months ago

Since we're no longer always/only fetching RSS files from a fixed interval in the past (two days ago), it would be nice to have a way to see the date of the most recently fetched story that has been queued.

Grafana appears to have axis labeling options for dates/times. Investigate what input it wants: decimal numbers like YYYYMMDD, or epoch seconds (since 1970-01-01)?
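For comparison, here is a minimal sketch of the two candidate encodings. The helper names `to_epoch_seconds` and `to_yyyymmdd` are hypothetical (not part of story-indexer), and treating naive datetimes as UTC is an assumption. It does not settle which form Grafana prefers; it only shows what each value would look like.

```python
from datetime import datetime, timezone


def to_epoch_seconds(dt: datetime) -> float:
    """Seconds since 1970-01-01 UTC.

    Assumption: a naive datetime (no tzinfo) is already in UTC.
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.timestamp()


def to_yyyymmdd(dt: datetime) -> int:
    """Date packed into a decimal number, e.g. 20240315."""
    return dt.year * 10000 + dt.month * 100 + dt.day


# Hypothetical "most recently fetched" date, for illustration only.
fetched = datetime(2024, 3, 15, 12, 0, 0, tzinfo=timezone.utc)
epoch_value = to_epoch_seconds(fetched)
decimal_value = to_yyyymmdd(fetched)
```

One trade-off worth noting: epoch seconds are evenly spaced, so they plot linearly over time, while YYYYMMDD values jump at month and year boundaries (20240131 to 20240201) and carry no time of day.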

Investigate:

philbudne commented 7 months ago

Footnote:

I suppose min/max date metrics could be reported for ALL Workers (gathering the info in StorySender.send_story), which would be useful if multiple queues were congested, so you could see which range of dates was being handled where in the pipeline?!
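A rough sketch of what the min/max bookkeeping might look like, kept separate from any real story-indexer class. `DateRangeTracker`, its methods, and the gauge names are all hypothetical; how it would be wired into StorySender.send_story and into the stats reporting is left open here.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


class DateRangeTracker:
    """Tracks the oldest and newest story dates seen since the last reset."""

    def __init__(self) -> None:
        self.min_ts: Optional[float] = None
        self.max_ts: Optional[float] = None

    def observe(self, dt: datetime) -> None:
        # Assumption: naive datetimes are UTC.
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)
        ts = dt.timestamp()
        if self.min_ts is None or ts < self.min_ts:
            self.min_ts = ts
        if self.max_ts is None or ts > self.max_ts:
            self.max_ts = ts

    def gauges(self) -> Dict[str, float]:
        """Values to report as gauges; empty if nothing was observed."""
        if self.min_ts is None or self.max_ts is None:
            return {}
        return {"min_date": self.min_ts, "max_date": self.max_ts}

    def reset(self) -> None:
        """Start a new reporting interval."""
        self.min_ts = None
        self.max_ts = None


# Illustration: one tracker per worker, fed each story's date as it is sent.
tracker = DateRangeTracker()
for day in (12, 10, 14):
    tracker.observe(datetime(2024, 3, day, tzinfo=timezone.utc))
report = tracker.gauges()
```

Resetting after each report would show the range handled per interval rather than since process start; which of the two is more useful on a dashboard is a judgment call.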