hoyishian / footballwebscraper

This is a web scraper for football data from FBRef.com. It can scrape match data from the top 5 domestic leagues, and it can easily be edited to scrape other leagues and other competitions such as the Champions League, domestic cup games, friendlies, etc.

Why do you use "sleep()" in your fbref_scout_extraction.py script before every "read_html()"? #13

Closed by dkcracks 1 year ago

dkcracks commented 2 years ago

Big fan of your repos. Just a question regarding using "sleep()".

Thanks!

hoyishian commented 1 year ago

Thanks for the comment! When I first started on the project, fbref did not have a mechanism in place to prevent people from scraping data from their website.

In recent updates, fbref seems to limit the amount of traffic coming from a specific IP address at a given time (which I think is meant to prevent web scraping or other kinds of attacks). I tried a variety of techniques, including using sleep() to space out the requests made per minute. I have not yet found a way around this issue. If you have any suggestions, I am happy to hear them!
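
For readers wondering what spacing out requests with sleep() can look like, here is a minimal sketch. It is not the code from fbref_scout_extraction.py: the helper name `fetch_throttled`, its parameters, and the 6-second default are all assumptions made for illustration, and the interval is a placeholder rather than a limit published by fbref.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def fetch_throttled(
    urls: Iterable[str],
    fetch: Callable[[str], T],
    min_interval: float = 6.0,  # assumed placeholder, not fbref's actual limit
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[T]:
    """Call fetch(url) for each URL, starting calls at least min_interval seconds apart."""
    results: List[T] = []
    last_start = None
    for url in urls:
        if last_start is not None:
            # Sleep only for the part of the interval that has not already elapsed.
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.append(fetch(url))
    return results
```

With pandas installed, this could be called as `fetch_throttled(urls, pd.read_html)`, where `urls` is a list of page URLs. Measuring from the start of the previous request means the time already spent downloading and parsing counts toward the interval, so the script does not wait longer than it needs to.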

dfelmanl commented 1 year ago

@hoyishian thanks for your answer and the repo. You can try using widgets