internetarchive / dweb-mirror

Offline Internet Archive project
https://www-dweb-mirror.dev.archive.org/
GNU Affero General Public License v3.0

Related items calls being throttled #231

Open mitra42 opened 5 years ago

mitra42 commented 5 years ago

When fetching related items during a crawl, it's getting 429 (?) errors. This is Gio's API throttling (reasonably) to protect the search. I should throw these into a queue or otherwise reschedule them.

mitra42 commented 5 years ago

There were two sets of throttling: Haproxy (Sam) and then Gio. Haproxy should be open now, which means we will get an error from Gio if we go over the limit. We need to spot that error and requeue the request (in the crawler).
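
To make the "spot that error and requeue" idea concrete, here is a minimal sketch, not the crawler's actual code. It assumes the throttle shows up as HTTP status 429; `fetch_related`, `crawl_with_requeue`, `max_attempts` and `base_delay` are hypothetical names, and the exponential backoff is just one possible rescheduling policy.

```python
import heapq
import time
from typing import Callable, List, Tuple

THROTTLED = 429  # HTTP "Too Many Requests" (assumed throttle signal)


def crawl_with_requeue(
    item_ids: List[str],
    fetch_related: Callable[[str], int],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List[str], List[str]]:
    """Fetch related items for each id, rescheduling throttled requests.

    `fetch_related` is a hypothetical callable that performs one request
    and returns its HTTP status code. Returns (done, given_up).
    """
    clock = 0.0  # time this loop has waited so far (fetch time is ignored)
    seq = 0      # tie-breaker so heap entries never compare item ids
    queue: List[Tuple[float, int, str, int]] = []
    for item_id in item_ids:
        heapq.heappush(queue, (0.0, seq, item_id, 1))
        seq += 1

    done: List[str] = []
    given_up: List[str] = []
    while queue:
        ready_at, _, item_id, attempt = heapq.heappop(queue)
        if ready_at > clock:
            # Nothing is due yet: wait until the earliest retry is ready.
            sleep(ready_at - clock)
            clock = ready_at
        status = fetch_related(item_id)
        if status != THROTTLED:
            # Any status other than 429 is treated as finished in this sketch.
            done.append(item_id)
        elif attempt >= max_attempts:
            given_up.append(item_id)
        else:
            # Requeue behind the other work, doubling the delay each attempt.
            delay = base_delay * 2 ** (attempt - 1)
            heapq.heappush(queue, (clock + delay, seq, item_id, attempt + 1))
            seq += 1
    return done, given_up
```

Because throttled items go back on the queue with a later ready time, the rest of the crawl carries on and only the retries wait. A real version would probably also honour a `Retry-After` header if the API sends one.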

mitra42 commented 5 years ago

STR

mv /Volumes/x/archiveorg /Volumes/x/archiveorg-
internetarchive -sc  # in debugger
rm /Volumes/x/archiveorg/*/*related*
internetarchive -sc  # in debugger
(If it's doing a lot of thumbnail fetches, then a request that previously hit a 504 has now worked, so repeat.)

Note: I couldn't reproduce the problem with these steps. It might need a higher-speed link at IA, in which case it may not be an issue.