vishaalagartha / basketball_reference_scraper

A Python module for scraping static and dynamic content from Basketball Reference.
MIT License

Add `get_wrapper` to keep 429 errors from blocking scraping #98

Closed vishaalagartha closed 7 months ago

vishaalagartha commented 1 year ago

[What] Introduce `get_wrapper` to handle 429 responses that carry a Retry-After header, so scraping can continue after a 429 error instead of stopping.

[How] When a 429 response arrives, sleep for the time given in Retry-After, then retry the request.
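
For illustration, the idea described above could be sketched roughly as follows. This is not the code from the PR: the `get`, `sleep`, `max_retries`, and `default_wait` parameters are assumptions added so the sketch can be exercised without network access, and it only handles a Retry-After value given in whole seconds (the header may also be an HTTP date).

```python
import time


def get_wrapper(url, get=None, sleep=time.sleep, max_retries=3, default_wait=5):
    """Fetch url, pausing for Retry-After seconds whenever the server answers 429.

    Hypothetical signature: only `url` is implied by the PR description.
    """
    if get is None:
        import requests  # third-party; imported lazily so a stub can be injected
        get = requests.get

    response = get(url)
    for _ in range(max_retries):
        if response.status_code != 429:
            break
        # Assumption: Retry-After is an integer number of seconds.
        # Fall back to default_wait if it is missing or in another format.
        header = response.headers.get("Retry-After", "")
        wait = int(header) if header.isdigit() else default_wait
        sleep(wait)
        response = get(url)
    return response
```

Capping the retries means a persistently rate-limited request eventually returns the 429 response to the caller rather than looping forever.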

vishaalagartha commented 1 year ago

@amywinecoff can you review this PR and see if this fix would resolve the 429 errors? Asking because you seem to have run into this issue in the past.

tgracin commented 1 year ago

I did it a bit differently (I maintained a session and used exponential backoff), but your version, which retries after the time suggested in the header, works as well.

My commit: https://github.com/vishaalagartha/basketball_reference_scraper/commit/830f76a9999e613b4138eb376387df3f1b52428a
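
A minimal sketch of the session-plus-exponential-backoff approach mentioned in this comment might look like the following. It is not taken from the linked commit: the function name `get_with_backoff` and the `session`, `sleep`, `max_retries`, and `base_delay` parameters are hypothetical, chosen only to illustrate the technique.

```python
import time


def get_with_backoff(url, session=None, sleep=time.sleep, max_retries=5, base_delay=1.0):
    """Fetch url through a reused session, doubling the wait after each 429."""
    if session is None:
        import requests  # third-party; imported lazily so a stub can be injected
        session = requests.Session()  # one session reuses the underlying connection

    response = session.get(url)
    for attempt in range(max_retries):
        if response.status_code != 429:
            break
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** attempt)
        response = session.get(url)
    return response
```

Unlike waiting for the Retry-After value, this variant does not depend on the server sending that header, at the cost of possibly waiting longer or shorter than the server would have asked.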