jgalar opened 1 year ago
I'm wondering if we want to make scrapings "resumable". Either we save enough context to take up where we left off or we attempt to only "refresh" the SKUs/categories that weren't scraped for a long while.
I think the second option is easier to implement. Keep the "last refresh date / time" information, and start with those that haven't been successfully refreshed for the longest time. It would be nice to have regardless.
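A minimal sketch of that ordering, assuming each SKU record carries a hypothetical `last_refreshed` timestamp (the field name and record shape are made up for illustration):

```python
from datetime import datetime, timezone

def stalest_first(skus):
    """Order SKUs so the ones not refreshed for the longest come first.

    `skus` is assumed to be a list of dicts with a `last_refreshed`
    datetime, where None means "never successfully refreshed" and
    therefore sorts first.
    """
    epoch = datetime.min.replace(tzinfo=timezone.utc)
    return sorted(skus, key=lambda s: s["last_refreshed"] or epoch)
```

A scrape run would then just walk this list from the front, so an interrupted run naturally "resumes" on the next invocation without saving any extra context.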
Otherwise we can do something simpler and just retry with some kind of exponential back-off until we see the service is back online.
We have to give up at some point, though. If, for instance, our requests need to be updated, retrying may never start working again. In that case, we would want to stop the whole run.
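The back-off-then-give-up idea could look something like this; the attempt count and base delay are placeholders to tune, and `fetch` stands in for whatever call is failing:

```python
import time

def retry_with_backoff(fetch, max_attempts=6, base_delay=1.0, sleep=time.sleep):
    """Retry `fetch` with exponential back-off, giving up after max_attempts.

    `fetch` is any zero-argument callable that raises on failure.
    Delays grow as base_delay * 2**attempt (1s, 2s, 4s, ...).
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # the service never came back: abort the whole run
            sleep(base_delay * (2 ** attempt))
```

Re-raising on the final attempt gives the caller the choice between stopping the run and skipping the item, which ties into the per-SKU question below.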
I'm also wondering whether a specific product or SKU could be causing a server-side error. If so, it may never start working, even if we wait, and we would want to skip over it and continue the run.
For every solution I can think of, I can also think of a pathological case that makes it less than ideal. I think we'll need to experiment and see what works best. Maybe something like "if we see that 5 calls (e.g. to SkusInventory.iter) in a row have failed, then abort the run, because something appears to be seriously off", combined with a way to find out whether specific products or SKUs always cause a failure.
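The "abort after 5 consecutive failures, and track which SKUs keep failing" heuristic could be sketched like this; the class name and API are hypothetical, not existing code:

```python
from collections import Counter

class FailureTracker:
    """Abort after N consecutive failures; remember which SKUs fail.

    The default threshold of 5 mirrors the suggestion above and would
    need tuning in practice.
    """

    def __init__(self, max_consecutive=5):
        self.max_consecutive = max_consecutive
        self.consecutive = 0
        self.failures_by_sku = Counter()

    def record_success(self):
        # Any success means the service is reachable: reset the streak.
        self.consecutive = 0

    def record_failure(self, sku_id):
        self.consecutive += 1
        self.failures_by_sku[sku_id] += 1
        if self.consecutive >= self.max_consecutive:
            raise RuntimeError(
                f"{self.consecutive} calls failed in a row; "
                "something appears to be seriously off"
            )
```

Dumping `failures_by_sku` at the end of a run would show whether failures are spread evenly (service-side outage) or concentrated on a few items (skip candidates).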
Regarding the second error you pasted:
```
Failed to establish a new connection: [Errno -2] Name or service not known'))
```
That does sound more like a client problem than a server problem. Not sure why it would happen, but it looks different than the first one.
I tried a `scrape-prices` run myself and also see a 502, so it's probably a systematic issue.
Could you consider reviewing and merging PR #55 before we look at this? Getting type check to work will make every subsequent change easier.
Lately, price scraping jobs have failed to complete because of various connection errors, notably 502 errors. SKU scrapings also fail for similar reasons.