Open seanofthedead86 opened 2 years ago
I'm having the same issue with NHL and NFL, but NBA was working for me
Yeah, I'm using NFL. I tried NBA and it seemed to work fine.
Seems like SportsReference recently introduced a limit on requests to their site, if I understand correctly? This may be the issue, but I had been using the same code for the past few days, including after October 26th, and it was working fine.
Same here but that's probably what's happening. Pretty much renders sportsipy useless.
Going to try adding time.sleep(60) in between requests for roster/boxscore data, let me know if you have any success with workarounds as well
@roclark let us know if you have any ideas as well
@mattpfreer adding the timer worked!
Awesome to hear. Did you add it before each single roster request, or into the actual sportsipy .py files? I'm still having trouble on my end. I've used a counter to fetch 10 at a time before sleeping, but it seems like I may need to sleep before each individual roster request.
I added it within a function I defined that calls Schedule and Team info several times for previous years and the current year for two separate teams. I think going forward I'm going to just save them as a csv file and just update the current year as needed.
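The "save as CSV and only update the current year" idea could be sketched roughly as follows. This is a minimal sketch, not anyone's actual code from this thread: `load_or_fetch_season`, `fetch_rows`, and the file naming are hypothetical, and `fetch_rows(team, year)` stands in for whatever sportsipy call you wrap to get a list of row dicts.

```python
import csv
import datetime
from pathlib import Path


def load_or_fetch_season(team, year, fetch_rows, cache_dir, current_year=None):
    """Return rows for one team-season, reading from a CSV cache when possible.

    fetch_rows(team, year) is a hypothetical callable returning a list of
    dicts with identical keys; it is the only part that touches the network.
    """
    if current_year is None:
        current_year = datetime.date.today().year
    path = Path(cache_dir) / f"{team}_{year}.csv"

    # Completed seasons do not change, so a cached file is reused as-is.
    if path.exists() and year != current_year:
        with path.open(newline="") as f:
            return list(csv.DictReader(f))

    # Current season (or a cache miss): fetch once and rewrite the file.
    rows = fetch_rows(team, year)
    if rows:
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
    return rows
```

Note that values read back from the CSV are strings, so numeric columns need converting before they go into a model.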
got it, thank you! hopefully there will be a resolution in the future but makes sense.
Yeah this is a problem rendering the API useless for my use case in NCAAB. I can't even run a Teams() call.
Any recommendations? I see the time.sleep option, but wow this will take some time to run for 300+ teams.
Everything seemed to be working well the past few days, especially after adding in the timer in a few spots, but today I called Teams() and got the error. I hadn't run anything else.
The Team function was working for me in previous weeks, but it stopped yesterday. I tried running it from different IP addresses and different online notebooks, and it doesn't work.
I am a novice, but could it be that someone is spamming NFL requests through the API which is causing the error?
Not sure. I experimented with NHL and CBB a little bit and they worked but at this point if the Teams function for NFL isn't going to work it's pretty much rendered the API useless for me.
@mattpfreer is it working for you?
I was able to get NCAAB to work by editing the teams.py file and commenting out the code that uses Conferences data, seems like that is the only problematic piece
Would you be able to share what lines you commented out to get it to work? Thanks
The problem with Teams is it quickly fires off as many requests as there are teams, which almost immediately violates the new request limit at sports reference. I fixed it on my fork. You can see the change here. I just created a new utility function that adds a time.sleep after each url request via pyquery.
As long as you don't run multiple sportsipy commands in parallel this fix should guarantee you won't exceed the new limit. Obviously it's not ideal for NCAAB since fetching 300 teams will take ~15 minutes, but at least it works.
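A utility of the kind described above could look something like this. It is a sketch under assumptions, not the code from the fork: `rate_limited` is a hypothetical name, and it wraps any callable rather than pyquery specifically. It sleeps only for whatever part of the interval has not already elapsed since the previous call.

```python
import time
from functools import wraps


def rate_limited(min_interval, clock=time.monotonic, sleep=time.sleep):
    """Decorator: leave at least min_interval seconds between calls.

    clock and sleep are injectable so the behaviour can be tested
    without actually waiting.
    """
    def decorator(func):
        last_call = [None]  # mutable cell so the wrapper can update it

        @wraps(func)
        def wrapper(*args, **kwargs):
            if last_call[0] is not None:
                remaining = min_interval - (clock() - last_call[0])
                if remaining > 0:
                    sleep(remaining)
            try:
                return func(*args, **kwargs)
            finally:
                # Record the time even on failure: a failed request
                # still counts against the site's limit.
                last_call[0] = clock()
        return wrapper
    return decorator
```

You would decorate the single function through which all page requests go, for example `@rate_limited(13)` on your own fetch helper. The wrapper keeps its state per process and is not thread-safe, which matches the caveat above about not running commands in parallel.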
I mean, I'm also having an issue with boxscores, which is more concerning.
My teams fix is a time.sleep(13) in _retrieve_all_teams, around line 1130.
I'm trying to train a model with boxscores back to '07. Querying each game individually with a 13sec sleep would take ~9 days...
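For anyone sizing a similar job, the estimate is simple arithmetic. The sketch below only counts the sleep time and ignores how long each request itself takes; the function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimated_days(num_requests, seconds_per_request):
    """Wall-clock days needed if every request is followed by a fixed sleep."""
    return num_requests * seconds_per_request / SECONDS_PER_DAY


def implied_requests(days, seconds_per_request):
    """Invert the estimate: how many requests fit into the given number of days."""
    return days * SECONDS_PER_DAY / seconds_per_request


# Working backwards from the figures quoted above (13 s sleep, ~9 days):
print(round(implied_requests(9, 13)))  # 59815
```

So the ~9 day figure corresponds to roughly 60,000 individual boxscore requests at a 13 second pace.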
Does anyone here happen to have a cache / boxscore info that goes back from last season to some year? Any help is much appreciated!!
Does anybody have a good solution for boxscore.py?
The best way I've found to fix it is to intentionally limit the rate of all calls to the website, like I mentioned in my post above (check out this PR for specific details). I don't think your use case of bulk downloading data for several years at once is really an option anymore, unless you're willing to wait a really long time for the results. You'll be better off scheduling a long-running job and caching all the historical data you intend to re-use.
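One way such a long-running, cached job might be structured is sketched below. Everything here is hypothetical (`download_all`, the JSON cache file, the `fetch` callable); the point is only that progress is written to disk after every request, so an interrupted run picks up where it left off instead of re-requesting everything.

```python
import json
import time
from pathlib import Path


def download_all(keys, fetch, cache_path, delay, sleep=time.sleep):
    """Fetch every key once, persisting results so the job can be resumed.

    keys must be strings and fetch(key) must return JSON-serializable data.
    fetch is a hypothetical callable wrapping the actual boxscore request.
    """
    cache_path = Path(cache_path)
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}

    for key in keys:
        if key in cache:
            continue  # already downloaded on an earlier run
        cache[key] = fetch(key)
        # Save after every request so an interruption loses at most one item.
        cache_path.write_text(json.dumps(cache))
        sleep(delay)  # pace the requests; only reached after a real fetch
    return cache
```

Rewriting the whole JSON file on each iteration is fine for a few thousand items. For tens of thousands of boxscores, one file per game or a small SQLite database would probably scale better.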
Just ran into this problem as well. I created a model to predict the winning teams each week in the NFL, wanted to update my ultimate CSV, and ran into this issue. The time.sleep workaround worked perfectly for me. I wish Sports Reference would just create an API directly.
I'm getting an "HTTPError: HTTP Error 429: Too Many Requests" error when running anything NFL from the API. I'm assuming this is caused by sports-reference blocking me from making any requests to their pages, but I wanted to see if anyone has had this issue before and if there is a way to resolve it.
Here is the error that's thrown:
HTTPError Traceback (most recent call last) in
      1 Team_1 = 'ATL'
      2 Team_2 = 'SDG'
----> 3 Total_1 = model.predict(model_input(Team_1, Team_2))
      4 # Total_2 = model.predict(model_input(Team_2, Team_1))
      5 # Total_1[0], Total_2[0], Total_1[0] - Total_2[0]

6 frames
in model_input(home, away)
      2 home_2018_schedule = team_schedule(home, 2018)
      3 home_2019_schedule = team_schedule(home, 2019)
----> 4 home_2020_schedule = team_schedule(home, 2020)
      5 home_2021_schedule = team_schedule(home, 2021)
      6 home_2022_schedule = team_schedule(home, 2022)

/usr/local/lib/python3.7/dist-packages/sportsipy/nfl/schedule.py in __init__(self, abbreviation, year)
    578     def __init__(self, abbreviation, year=None):
    579         self._games = []
--> 580         self._pull_schedule(abbreviation, year)
    581
    582     def __getitem__(self, index):

/usr/local/lib/python3.7/dist-packages/sportsipy/nfl/schedule.py in _pull_schedule(self, abbreviation, year)
    704                 str(int(year) - 1))):
    705             year = str(int(year) - 1)
--> 706         doc = pq(SCHEDULE_URL % (abbreviation.lower(), year))
    707         schedule = utils._get_stats_table(doc, 'table#gamelog%s' % year)
    708         if not schedule:

/usr/local/lib/python3.7/dist-packages/pyquery/pyquery.py in __init__(self, *args, **kwargs)
    183                     html = opener(url, **kwargs)
    184                 else:
--> 185                     html = url_opener(url, kwargs)
    186                 if not self.parser:
    187                     self.parser = 'html'

/usr/local/lib/python3.7/dist-packages/pyquery/openers.py in url_opener(url, kwargs)
     74 def url_opener(url, kwargs):
     75     if HAS_REQUEST:
---> 76         return _requests(url, kwargs)
     77     return _urllib(url, kwargs)

/usr/local/lib/python3.7/dist-packages/pyquery/openers.py in _requests(url, kwargs)
     59     if not (200 <= resp.status_code < 300):
     60         raise HTTPError(resp.url, resp.status_code,
---> 61                         resp.reason, resp.headers, None)
     62     if encoding:
     63         resp.encoding = encoding

HTTPError: HTTP Error 429: Too Many Requests
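If you would rather have a script wait and retry than crash on this error, one option is to catch the 429 and back off. The sketch below assumes the exception is `urllib.error.HTTPError`, which is what pyquery's opener appears to raise in the traceback above. `fetch_with_backoff`, its parameters, and the delay values are all hypothetical, and this complements rather than replaces pacing your requests in the first place.

```python
import time
from urllib.error import HTTPError


def fetch_with_backoff(fetch, url, max_attempts=4, base_delay=15.0,
                       sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff on HTTP 429.

    fetch is any callable that raises urllib.error.HTTPError on failure.
    Any other HTTP error, or a 429 on the final attempt, is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except HTTPError as err:
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            # Wait 15 s, 30 s, 60 s, ... before trying again.
            sleep(base_delay * 2 ** attempt)
```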