mjk2244 / pro-football-reference-web-scraper

Web scraper to retrieve player and team data from Pro Football Reference.
https://mjk2244.github.io/pro-football-reference-web-scraper/
Apache License 2.0
26 stars 15 forks

Sample Code given does not even work #53

Open mrozelsk opened 9 months ago

mrozelsk commented 9 months ago

I tried to use the simple sample code provided just to test this out:

from pro_football_reference_web_scraper import team_game_log as t

game_log = t.get_team_game_log(team='Kansas City Chiefs', season=1995)
print(game_log)

And received the error:

"/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pro_football_reference_web_scraper/team_game_log.py", line 172, in get_team_game_log
    return collect_data(soup, season, team)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pro_football_reference_web_scraper/team_game_log.py", line 206, in collect_data
    games = soup.find_all('tbody')[1].find_all('tr')
IndexError: list index out of range

I have not gotten any functionality from this, but I have seen good reviews online. All I have done is pip install and then try this trial code.

mjk2244 commented 9 months ago

It's working as expected for me. Does the same problem happen when you use different team/year combos?

Also, out of curiosity, where did you see reviews online?

spooge11 commented 8 months ago

Piggybacking off this thread: I got everything to work, but when requesting team data for 2023 I get an error. Thank you for this stuff, great great work.

game_log = t.get_team_game_log(team = 'Detroit Lions', season = 2022)

print(game_log)
    week  day rest_days  home_team  distance_travelled  ...  pass_yds  rush_yds  opp_tot_yds  opp_pass_yds  opp_rush_yds
0      1  Sun   10 days       True            0.000000  ...       205       181          455           239           216
1      2  Sun    7 days       True            0.000000  ...       234       191          396           308            88
2      3  Sun    7 days      False          526.955998  ...       277       139          373           250           123
3      4  Sun    7 days       True            0.000000  ...       375       145          555           320           235
4      5  Sun    7 days      False          613.213924  ...       211       101          364           188           176
5      7  Sun   14 days      False          985.778406  ...       195       117          330           191           139
6      8  Sun    7 days       True            0.000000  ...       311        82          476           369           107
7      9  Sun    7 days       True            0.000000  ...       137       117          389           283           106
8     10  Sun    7 days      False          233.989875  ...       228        95          408           150           258
9     11  Sun    7 days      False          486.735689  ...       165       160          413           324            89
10    12  Thu    4 days       True            0.000000  ...       230        96          401           237           164
11    13  Sun   10 days       True            0.000000  ...       337       100          266           171            95
12    14  Sun    7 days       True            0.000000  ...       330       134          416           394            22
13    15  Sun    7 days      False          486.735689  ...       252       107          337           287            50
14    16  Sat    6 days      False          500.832396  ...       336        45          570           250           320
15    17  Sun    8 days       True            0.000000  ...       239       265          230            30           200
16    18  Sun    7 days      False          286.660912  ...       219       104          291           188           103

[17 rows x 15 columns]

game_log = t.get_team_game_log(team = 'Detroit Lions', season = 2023)

Traceback (most recent call last):
  File "", line 1, in
  File "C:\Users\pro_football_reference_web_scraper\team_game_log.py", line 172, in get_team_game_log
    return collect_data(soup, season, team)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pro_football_reference_web_scraper\team_game_log.py", line 273, in collect_data
    points_allowed = int(games[i].find('td', {'data-stat': 'pts_def'}).text)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: ''
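The ValueError in this traceback is what int() raises when it is given an empty string, which suggests the 'pts_def' cell had no text for at least one row. One possible cause is that some 2023 games had not been played yet, but that is an assumption and is not confirmed anywhere in this thread. A minimal sketch of a tolerant conversion is below; parse_int_cell is a hypothetical helper for illustration, not a function from the library.

```python
from typing import Optional


def parse_int_cell(text: Optional[str]) -> Optional[int]:
    """Convert a table cell's text to int, returning None for blank cells.

    Hypothetical helper: the library itself calls int() directly on the
    cell text, which is what fails when the cell is empty.
    """
    if text is None:
        return None
    cleaned = text.strip()
    if not cleaned:
        # Blank cell: report "no value" instead of raising ValueError.
        return None
    return int(cleaned)


print(parse_int_cell("24"))  # a populated cell converts normally
print(parse_int_cell(""))    # a blank cell yields None instead of raising
```

Returning None rather than 0 keeps a missing score distinguishable from a real shutout; whether that is the right choice for the library's DataFrame output is a design decision for the maintainer.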

julianchow513 commented 8 months ago

I am getting the same error as @mrozelsk. I have tried multiple team/year combos, but none of them work for me.

Traceback (most recent call last):
  File "test.py", line 3, in <module>
    game_log = t.get_team_game_log(team="Kansas City Chiefs", season=1995)
  File "pro_football_reference_web_scraper\team_game_log.py", line 172, in get_team_game_log
    return collect_data(soup, season, team)
  File "pro_football_reference_web_scraper\team_game_log.py", line 206, in collect_data
    games = soup.find_all('tbody')[1].find_all('tr')
IndexError: list index out of range

Traceback (most recent call last):
  File "test.py", line 3, in <module>
    game_log = t.get_team_game_log(team="Indianapolis Colts", season=1991)
  File "pro_football_reference_web_scraper\team_game_log.py", line 172, in get_team_game_log
    return collect_data(soup, season, team)
  File "pro_football_reference_web_scraper\team_game_log.py", line 206, in collect_data
    games = soup.find_all('tbody')[1].find_all('tr')
IndexError: list index out of range

ryan-hayward commented 7 months ago

@mjk2244 @julianchow513 I traced this back a bit and found that I was hitting the same index-out-of-range error. For me, it was ultimately caused by the make_request_list call (line 31 of player_game_log.py) returning a 429 error for too many requests. With some additional digging I found that Pro Football Reference caps bots at 20 requests per minute, and if you exceed that limit your session is "in jail" for the next hour. I believe the same would apply to the team_game_log issues. Not sure if this is what others are experiencing, but it may help.
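If the 20-requests-per-minute cap described in this comment is what others are hitting, spacing out calls on the caller's side may avoid the 429 responses. The sketch below is one way to do that with a sliding window. RequestThrottle is a hypothetical class, not part of the library, and the default of 15 requests per 60 seconds is an arbitrary margin under the reported limit, not a documented value.

```python
import time
from collections import deque


class RequestThrottle:
    """Allow at most `max_requests` calls to wait() to pass per `period` seconds.

    Hypothetical helper. `clock` and `sleep` are injectable so the
    behaviour can be checked without real waiting.
    """

    def __init__(self, max_requests=15, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # start times of recent requests, oldest first

    def wait(self):
        """Block until another request fits in the window, then record it."""
        while True:
            now = self._clock()
            # Forget requests that have aged out of the window.
            while self._stamps and now - self._stamps[0] >= self.period:
                self._stamps.popleft()
            if len(self._stamps) < self.max_requests:
                self._stamps.append(now)
                return
            # Window is full: sleep until the oldest request expires.
            self._sleep(self.period - (now - self._stamps[0]))


# Intended use (not executed here):
#   throttle = RequestThrottle()
#   throttle.wait()
#   game_log = t.get_team_game_log(team='Detroit Lions', season=2022)
```

A single library call may issue more than one HTTP request, so the effective limit could be lower than one call per throttle slot. If the session is already "in jail", throttling will not help until the block expires.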