Why do you use "sleep()" in your fbref_scout_extraction.py script before every "read_html()"?

hoyishian / footballwebscraper

This is a web scraper that helps to scrape football data from FBRef.com. It can scrape data from the top 5 Domestic League games. It can be easily edited to scrape data from other leagues as well as from other competitions such as Champions League, Domestic Cup games, friendlies, etc.


Thanks for the comment! When I first started on the project, fbref did not have a mechanism in place to prevent people from scraping data from their website.

In recent updates, fbref seems to limit the amount of traffic coming from a single IP address within a given period (which I think is meant to prevent web scraping or other kinds of attacks). I tried a variety of techniques, including using sleep() to space out the requests made per minute. I have not yet found a way around this limit. If you have any suggestions, I am happy to hear them!
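
As a minimal sketch of the sleep() approach described above (not the actual code from fbref_scout_extraction.py), the pattern is to pause before every page fetch so that requests from one IP address are spread out over time. The helper name `fetch_with_delay`, its parameters, and the delay value in the usage note are hypothetical; the delay that fbref tolerates is not stated in this thread.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def fetch_with_delay(
    urls: Iterable[str],
    fetch: Callable[[str], T],
    delay_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[T]:
    """Call fetch(url) for each URL, pausing before every call.

    Hypothetical helper: `fetch` stands in for a page loader such as
    pandas.read_html, and `sleep` is injectable so the pacing can be
    checked without actually waiting.
    """
    results: List[T] = []
    for url in urls:
        # Pause first, mirroring "sleep() before every read_html()".
        sleep(delay_seconds)
        results.append(fetch(url))
    return results
```

With pandas installed, this could be used as `fetch_with_delay(match_urls, pd.read_html, delay_seconds=5)`, where `match_urls` and the 5-second pause are placeholders to tune, not values known to satisfy fbref's limit.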


Why do you use "sleep()" in your fbref_scout_extraction.py script before every "read_html()"? #13