usdigitalresponse / univaf

An API hosted by USDR for recording and querying vaccine appointment availability.
https://getmyvax.org/
Apache License 2.0

Add stale data warnings #1459

Closed (closed by Mr0grog 1 year ago)

Mr0grog commented 1 year ago

The CVS issue in #1458 and other similar issues of API data freshness in the past have always been discovered by chance, and by us paying regular attention to the data. I was a little slow on catching CVS because the system has moved more towards pure maintenance, and that’s a problem. We need some alerts when data ceases to be fresh.

Places to check:

  1. The loader can check all the records it outputs and log issues. I like that this is close to where things happen, and is very immediate. OTOH, it could be too sensitive. Not sure. This also lets us summarize across all the data surfaced by a given source, and at a given time (doing the checks elsewhere might wind up including locations surfaced by that source in the past, but that are no longer surfaced by that source — e.g. this might be a big problem for Prepmod, which has a lot of one-off/two-off events).

  2. The server’s /api/edge/update endpoint could check every record it receives. We already have a freshness timeframe encoded in the server, and this puts these freshness checks in a nearby place. This is also just as immediate as the loader. OTOH, the server doesn’t know about a whole loader run, so this can’t summarize across a source (other than keeping a running average or something).

  3. A scheduled job (probably 1–4×/day) could calculate statistics across the whole database. It lets us do lots of summary statistics, but it can't tell whether an availability record is stale because the source's data has actually stopped updating or simply because the source no longer outputs info for that location. It also needs configuration or coding to know which sources to check (since some are deprecated and no longer in use). However, it can catch issues caused by the loader never running or never surfacing any data at all for a given source, which the other approaches would miss.
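To make option 1 a bit more concrete, here is a rough sketch of a per-source staleness summary over the records from a single loader run. Everything in it is an assumption for illustration: the `AvailabilityRecord` shape, the `validAt` field name, the 48-hour threshold, and the 50% alert ratio are hypothetical, not UNIVAF's actual schema or its configured freshness timeframe.

```typescript
// Hypothetical record shape; the field names are assumptions, not the real schema.
interface AvailabilityRecord {
  locationId: string;
  source: string;
  validAt: string; // ISO 8601 time at which the source says the data was valid
}

interface StalenessSummary {
  source: string;
  total: number;
  stale: number;
  staleRatio: number;
  oldestValidAt: string | null;
}

// Assumed defaults; the real freshness timeframe lives in the server config.
const STALE_AFTER_MS = 48 * 60 * 60 * 1000;
const ALERT_RATIO = 0.5;

// Group one loader run's records by source and count how many are stale.
function summarizeStaleness(
  records: AvailabilityRecord[],
  now: Date = new Date(),
  staleAfterMs: number = STALE_AFTER_MS
): StalenessSummary[] {
  const bySource: Record<string, { total: number; stale: number; oldest: number }> = {};
  for (const record of records) {
    const validAt = Date.parse(record.validAt);
    const entry = bySource[record.source] || { total: 0, stale: 0, oldest: Infinity };
    entry.total += 1;
    // Count unparseable timestamps as stale so they get surfaced, not hidden.
    if (isNaN(validAt) || now.getTime() - validAt > staleAfterMs) {
      entry.stale += 1;
    }
    if (!isNaN(validAt)) {
      entry.oldest = Math.min(entry.oldest, validAt);
    }
    bySource[record.source] = entry;
  }
  return Object.keys(bySource).map((source) => {
    const entry = bySource[source];
    return {
      source,
      total: entry.total,
      stale: entry.stale,
      staleRatio: entry.stale / entry.total,
      oldestValidAt: isFinite(entry.oldest) ? new Date(entry.oldest).toISOString() : null,
    };
  });
}

// Produce one warning line per source whose stale ratio meets the alert ratio.
function staleWarnings(
  summaries: StalenessSummary[],
  alertRatio: number = ALERT_RATIO
): string[] {
  return summaries
    .filter((summary) => summary.staleRatio >= alertRatio)
    .map(
      (summary) =>
        `Stale data from "${summary.source}": ${summary.stale}/${summary.total} records ` +
        `(oldest valid_at: ${summary.oldestValidAt})`
    );
}
```

Because this only sees what the current run produced, it avoids flagging locations a source used to surface but no longer does. It also means a source that returns nothing at all produces no summary and no warning, which is the gap option 3 would cover.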

What constitutes “stale”?

How do we track it?

Mr0grog commented 1 year ago

Of existing, actively used sources, can we get freshness info (and how)?

So this is mostly good except: