roclark / sportsipy

A free sports API written for python
MIT License

HTTPError: HTTP Error 429: Too Many Requests #756

Open seanofthedead86 opened 1 year ago

seanofthedead86 commented 1 year ago

I'm getting an "HTTPError: HTTP Error 429: Too Many Requests" error when running anything NFL from the API. I'm assuming this is caused by sports-reference blocking my requests to their pages, but I wanted to see if anyone has had this issue before and if there is a way to resolve it.

Here is the error that's thrown:

HTTPError                                 Traceback (most recent call last)
      1 Team_1 = 'ATL'
      2 Team_2 = 'SDG'
----> 3 Total_1 = model.predict(model_input(Team_1, Team_2))
      4 # Total_2 = model.predict(model_input(Team_2, Team_1))
      5 # Total_1[0], Total_2[0], Total_1[0] - Total_2[0]

6 frames

in model_input(home, away)
      2     home_2018_schedule = team_schedule(home, 2018)
      3     home_2019_schedule = team_schedule(home, 2019)
----> 4     home_2020_schedule = team_schedule(home, 2020)
      5     home_2021_schedule = team_schedule(home, 2021)
      6     home_2022_schedule = team_schedule(home, 2022)

in team_schedule(team, year)
      1 def team_schedule (team, year):
----> 2     schedule = Schedule(team, year)
      3     return schedule.dataframe.dropna()
      4
      5 def team_info (team):

/usr/local/lib/python3.7/dist-packages/sportsipy/nfl/schedule.py in __init__(self, abbreviation, year)
    578     def __init__(self, abbreviation, year=None):
    579         self._games = []
--> 580         self._pull_schedule(abbreviation, year)
    581
    582     def __getitem__(self, index):

/usr/local/lib/python3.7/dist-packages/sportsipy/nfl/schedule.py in _pull_schedule(self, abbreviation, year)
    704                                       str(int(year) - 1))):
    705             year = str(int(year) - 1)
--> 706         doc = pq(SCHEDULE_URL % (abbreviation.lower(), year))
    707         schedule = utils._get_stats_table(doc, 'table#gamelog%s' % year)
    708         if not schedule:

/usr/local/lib/python3.7/dist-packages/pyquery/pyquery.py in __init__(self, *args, **kwargs)
    183                     html = opener(url, **kwargs)
    184                 else:
--> 185                     html = url_opener(url, kwargs)
    186                 if not self.parser:
    187                     self.parser = 'html'

/usr/local/lib/python3.7/dist-packages/pyquery/openers.py in url_opener(url, kwargs)
     74 def url_opener(url, kwargs):
     75     if HAS_REQUEST:
---> 76         return _requests(url, kwargs)
     77     return _urllib(url, kwargs)

/usr/local/lib/python3.7/dist-packages/pyquery/openers.py in _requests(url, kwargs)
     59     if not (200 <= resp.status_code < 300):
     60         raise HTTPError(resp.url, resp.status_code,
---> 61                         resp.reason, resp.headers, None)
     62     if encoding:
     63         resp.encoding = encoding

HTTPError: HTTP Error 429: Too Many Requests
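One generic way to cope with a 429 is to wait and retry with a growing delay. The sketch below assumes the exception is `urllib.error.HTTPError` (the message format in the traceback suggests so); `call_with_backoff` and its parameters are hypothetical names, and the 60-second starting delay is an arbitrary choice, not a value documented by Sports Reference. Retrying may not help if the site blocks a client for a long period.

```python
import time
from urllib.error import HTTPError


def call_with_backoff(fetch, retries=3, base_delay=60.0, sleep=time.sleep):
    """Call fetch(); on HTTP 429, wait and retry with a doubling delay.

    fetch is any zero-argument callable (hypothetical helper, not part of
    sportsipy). Errors other than 429 are re-raised immediately.
    """
    delay = base_delay
    for attempt in range(retries + 1):
        try:
            return fetch()
        except HTTPError as err:
            # Give up on other errors, or once the retries are used up.
            if err.code != 429 or attempt == retries:
                raise
            sleep(delay)
            delay *= 2
```

With the code from the traceback, this might be called as `call_with_backoff(lambda: Schedule('ATL', 2020))`.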

mattpfreer commented 1 year ago

I'm having the same issue with NHL and NFL, but NBA was working for me

seanofthedead86 commented 1 year ago

> I'm having the same issue with NHL and NFL, but NBA was working for me

Yeah, I'm using NFL. I tried NBA and it seemed to work fine.

mattpfreer commented 1 year ago

Seems like SportsReference recently introduced a limit on requests to their site, if I understand correctly? This may be the issue, but I had been using the same code for the past few days, including after October 26th, and it was working fine.

https://www.sports-reference.com/bot-traffic.html

seanofthedead86 commented 1 year ago

> Seems like SportsReference recently introduced a limit on requests to their site, if I understand correctly? This may be the issue, but I had been using the same code for the past few days, including after October 26th, and it was working fine.
>
> https://www.sports-reference.com/bot-traffic.html

Same here but that's probably what's happening. Pretty much renders sportsipy useless.

mattpfreer commented 1 year ago

Going to try adding time.sleep(60) in between requests for roster/boxscore data. Let me know if you have any success with workarounds as well.
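A minimal sketch of that workaround, assuming the requests are made one at a time in a loop. `fetch_all` and its parameters are hypothetical names; the 60-second default simply mirrors the value suggested above.

```python
import time


def fetch_all(items, fetch, pause_seconds=60.0, sleep=time.sleep):
    """Call fetch(item) for each item, pausing between consecutive calls."""
    results = []
    for index, item in enumerate(items):
        if index > 0:
            sleep(pause_seconds)  # pause before every request except the first
        results.append(fetch(item))
    return results
```

For the schedules in the original post, this could look like `fetch_all(range(2018, 2023), lambda year: Schedule('ATL', year))`.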

mattpfreer commented 1 year ago

@roclark let us know if you have any ideas as well

seanofthedead86 commented 1 year ago

@mattpfreer adding the timer worked!

mattpfreer commented 1 year ago

Awesome to hear. Did you add it before each single roster request, or in the actual sportsipy .py files? I'm still having trouble on my end. I've used a counter to try getting 10 at a time before sleeping, but it seems like I may need to add the sleep before each individual roster request.

seanofthedead86 commented 1 year ago

> Awesome to hear. Did you add it before each single roster request, or in the actual sportsipy .py files? I'm still having trouble on my end. I've used a counter to try getting 10 at a time before sleeping, but it seems like I may need to add the sleep before each individual roster request.

I added it within a function I defined that calls Schedule and Team info several times, for previous years and the current year, for two separate teams. Going forward, I think I'm going to save the results as a CSV file and only update the current year as needed.
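The save-to-CSV idea could be sketched as a small cache helper that only hits the site when no file exists yet. This uses only the standard library and assumes the data can be expressed as a list of dicts; `load_or_fetch_rows` is a hypothetical name, not something sportsipy provides.

```python
import csv
import os


def load_or_fetch_rows(cache_path, fetch_rows):
    """Return rows from cache_path if it exists; otherwise fetch and save them.

    fetch_rows is a zero-argument callable returning a non-empty list of
    dicts that share the same keys. Values read back from CSV are strings.
    """
    if os.path.exists(cache_path):
        with open(cache_path, newline="") as handle:
            return list(csv.DictReader(handle))

    rows = fetch_rows()
    with open(cache_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

For past seasons the file never needs refreshing; for the current year, deleting the file forces a fresh download. With a pandas DataFrame such as `schedule.dataframe`, `DataFrame.to_csv` and `pandas.read_csv` would serve the same purpose.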

mattpfreer commented 1 year ago

Got it, thank you! Hopefully there will be a resolution in the future, but that makes sense.

jrclegg2 commented 1 year ago

Yeah this is a problem rendering the API useless for my use case in NCAAB. I can't even run a Teams() call.

Any recommendations? I see the time.sleep option, but wow this will take some time to run for 300+ teams.

seanofthedead86 commented 1 year ago

Everything seemed to be working well the past few days, especially after adding in the timer in a few spots, but today I called Teams() and got the error. I hadn't run anything else.

wittwg commented 1 year ago

The Team function was working for me in previous weeks, but it stopped yesterday. I tried running it on different IP addresses and different online notebooks and it doesn't work.

I am a novice, but could it be that someone is spamming NFL requests through the API which is causing the error?

seanofthedead86 commented 1 year ago

> The Team function was working for me in previous weeks, but it stopped yesterday. I tried running it on different IP addresses and different online notebooks and it doesn't work.
>
> I am a novice, but could it be that someone is spamming NFL requests through the API which is causing the error?

Not sure. I experimented with NHL and CBB a little bit and they worked, but at this point, if the Teams function for NFL isn't going to work, the API is pretty much useless for me.

@mattpfreer is it working for you?

mattpfreer commented 1 year ago

I was able to get NCAAB to work by editing the teams.py file and commenting out the code that uses Conferences data. That seems to be the only problematic piece.

CorgPredicts commented 1 year ago

> I was able to get NCAAB to work by editing the teams.py file and commenting out the code that uses Conferences data. That seems to be the only problematic piece.

Would you be able to share what lines you commented out to get it to work? Thanks

bveber commented 1 year ago

The problem with Teams is that it quickly fires off as many requests as there are teams, which almost immediately violates the new request limit at Sports Reference. I fixed it on my fork. You can see the change here. I just created a new utility function that adds a time.sleep after each URL request via pyquery.

As long as you don't run multiple sportsipy commands in parallel this fix should guarantee you won't exceed the new limit. Obviously it's not ideal for NCAAB since fetching 300 teams will take ~15 minutes, but at least it works.
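The idea of routing every request through one rate-limiting utility might look roughly like the sketch below. This is not the code from the fork: `Throttle` and `throttled` are hypothetical names, and instead of a fixed sleep after each request this variant waits before a request only for as long as needed. The 3-second interval in the usage note is inferred from the "300 teams in ~15 minutes" figure, not taken from any documented limit.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def throttled(fetch, throttle):
    """Wrap fetch so every call first waits on the shared throttle."""
    def wrapper(*args, **kwargs):
        throttle.wait()
        return fetch(*args, **kwargs)
    return wrapper
```

Sharing one `Throttle(3.0)` across all wrapped callables keeps sequential code under the chosen rate. As noted above, it does not protect against several processes running in parallel.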

jrclegg2 commented 1 year ago

I mean, I'm also having an issue with boxscores, which is more concerning.

My teams fix is a time.sleep(13) in _retrieve_all_teams, around line 1130.

I'm trying to train a model with boxscores back to '07. Querying each game individually with a 13sec sleep would take ~9 days...

Does anyone here happen to have a cache / boxscore info that goes back from last season to some year? Any help is much appreciated!!
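The time estimates quoted in this thread can be sanity-checked with quick arithmetic. The figures below come from the comments above (about 15 minutes for 300 teams, and about 9 days at 13 seconds per boxscore); the variable names are only illustrative.

```python
def total_wait_seconds(request_count, pause_seconds):
    """Time spent sleeping when pausing once per request."""
    return request_count * pause_seconds


# 300 teams in ~15 minutes implies a pause of about 3 seconds per request.
per_request_seconds = 15 * 60 / 300

# ~9 days at 13 seconds per request implies roughly 60,000 boxscores.
boxscores_in_nine_days = 9 * 24 * 60 * 60 / 13

# Going the other way: 300 requests at 13 seconds each take 65 minutes.
teams_at_13_seconds_minutes = total_wait_seconds(300, 13) / 60
```

The waiting time grows linearly with the number of requests, so the pause length matters far more for boxscores than for a single Teams() call.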

jrclegg2 commented 1 year ago

Does anybody have a good solution for boxscore.py?

bveber commented 1 year ago

> Does anybody have a good solution for boxscore.py?

The best way I've found to fix it is to intentionally limit the rate of all calls to the website, like I mentioned in my post above (check out this PR for specific details). I don't think your use case of bulk downloading data for several years at once is really an option anymore unless you're willing to wait a really long time for the results. You'll be better off scheduling a long-running job and caching all the historical data you intend to reuse.
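A long-running job of that kind could be sketched as a resumable loop that stores one file per item and skips anything already saved. This is an illustration under assumptions: `download_all` is a hypothetical helper, `fetch` stands in for whatever sportsipy call produces the data, and the result must be JSON-serializable.

```python
import json
import os
import time


def download_all(ids, fetch, cache_dir, pause_seconds=3.0, sleep=time.sleep):
    """Fetch each id once, storing one JSON file per id in cache_dir.

    Ids whose file already exists are skipped, so an interrupted run can be
    restarted without repeating requests. Returns the number of new fetches.
    """
    os.makedirs(cache_dir, exist_ok=True)
    fetched = 0
    for item_id in ids:
        path = os.path.join(cache_dir, f"{item_id}.json")
        if os.path.exists(path):
            continue  # already cached by an earlier run
        data = fetch(item_id)
        with open(path, "w") as handle:
            json.dump(data, handle)
        fetched += 1
        sleep(pause_seconds)  # pause after each real request
    return fetched
```

Because cached items cost no request and no pause, rerunning the job after a crash or a 429 only pays for what is still missing.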

kankshat commented 1 year ago

> Going to try adding time.sleep(60) in between requests for roster/boxscore data. Let me know if you have any success with workarounds as well.

Just ran into this problem as well. I created a model to predict the winning teams each week in the NFL, wanted to update my ultimate CSV, and ran into this issue. The time.sleep workaround worked perfectly for me. I wish Sports Reference would just create an API directly.