Open jechols opened 6 months ago
Note that #25 would likely improve or eliminate this; if this by itself is a big task, a full refactor of the issue watcher may need to be prioritized.
Possible solution: use the "ingested" value in `batches.json` to decide whether we even want to read the data in a batch.

- `src/chronam/json.go`: the `BatchMetadata` struct needs a new field
- `src/issuefinder/web.go`: in `FindWebBatches`
, when looping over the batch metadata, skip any batch with too recent an ingest time (24 hours or something?)
Downsides? NCA makes an effort to be nearly real-time in its knowledge of which batches and issues are live, on disk, in the NCA workflow, etc. But batches loaded to production can take up to a week to be reindexed: sometimes it's nearly instant, sometimes it takes a day, sometimes a week. The root problem is that NCA tries to index data while a batch load is going on at the same time, and gets a partial view of the batch data, which it then caches until the next full rebuild.
The good news is that NCA does a weekly rebuild of all cached data, so this kind of problem magically goes away. The bad news, of course, is that this is definitely unexpected (and therefore buggy) behavior.
NCA shouldn't cache batch data for batches that are still in the process of loading. The problem is that the caching isn't that granular: it occurs at the HTTP level, when fetching JSON from ONI. We just say "scan batches.json, then scan every batch URL that JSON file has in it." There's no direct tie to the batch being processed.