Open harlanv24 opened 1 year ago
This is a consequence of the functions having `if r.status_code == 200:` as a condition. The results dataframe is initialized as `None`, which means that if you get a status code other than 200, you'll just get no results. When I was getting no results, I was able to determine that it was because the status code was 429, AKA "Too Many Requests". So, even though this library may have worked in the past through hours of requests, www.basketball-reference.com has probably updated their site to include some sort of rate limiting. You can update the functions to at least report the status code by doing something like the following:
def get_roster(team, season_end_year):
    r = get(
        f'https://www.basketball-reference.com/teams/{team}/{season_end_year}.html')
    df = None
    try:
        # if r.status_code == 200:
        soup = BeautifulSoup(r.content, 'html.parser')
        table = soup.find('table')
        df = pd.read_html(str(table))[0]
        df.columns = ['NUMBER', 'PLAYER', 'POS', 'HEIGHT', 'WEIGHT', 'BIRTH_DATE',
                      'NATIONALITY', 'EXPERIENCE', 'COLLEGE']
        # remove rows with no player name (this was the issue above)
        df = df[df['PLAYER'].notna()]
        df['PLAYER'] = df['PLAYER'].apply(
            lambda name: remove_accents(name, team, season_end_year))
        # handle rows with empty fields but with a player name.
        df['BIRTH_DATE'] = df['BIRTH_DATE'].apply(
            lambda x: pd.to_datetime(x) if pd.notna(x) else pd.NaT)
        df['NATIONALITY'] = df['NATIONALITY'].apply(
            lambda x: x.upper() if pd.notna(x) else '')
    except Exception as e:
        print(e)
        print(r.status_code)
    return df
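The function above only prints the status code when parsing fails. If you would rather have the caller see the failure instead of a silent `None`, one option is to raise on any non-200 response before parsing. This is just a rough sketch: `ScrapeError` and `check_response` are names I made up for illustration and are not part of the library.

```python
class ScrapeError(RuntimeError):
    """Hypothetical error type for illustration; not part of the library."""

    def __init__(self, url, status_code):
        super().__init__(f"GET {url} returned HTTP {status_code}")
        self.url = url
        self.status_code = status_code


def check_response(response, url):
    """Return the response body, or raise ScrapeError if the status is not 200."""
    if response.status_code != 200:
        raise ScrapeError(url, response.status_code)
    return response.content
```

Inside a function like `get_roster`, you would pass `r` and the URL to `check_response` and hand the returned content to `BeautifulSoup`, so a 429 shows up as an exception carrying the status code.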
I'm still trying to work out how to rate limit the requests automatically. Will follow up if I do!
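For anyone who wants a starting point in the meantime, here is a rough, untested-against-the-site sketch of retrying on 429 with a growing delay. `get_with_backoff` is a hypothetical helper, not something in the library, and the retry count and delays are guesses rather than the site's documented limits. It assumes the response object has `status_code` and `headers`, as `requests` responses do, and that a `Retry-After` header, if present, is a number of seconds.

```python
import time


def get_with_backoff(fetch, url, max_retries=5, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url), waiting and retrying while the server answers 429.

    fetch is any callable that returns a response, e.g. requests.get.
    Returns the last response, which may still be a 429 if retries run out.
    """
    response = fetch(url)
    for attempt in range(max_retries):
        if response.status_code != 429:
            break
        # Prefer the server's Retry-After hint when it is a plain number of seconds.
        retry_after = response.headers.get("Retry-After", "")
        if retry_after.isdigit():
            delay = float(retry_after)
        else:
            # Otherwise double the wait on each attempt: 2s, 4s, 8s, ...
            delay = base_delay * 2 ** attempt
        sleep(delay)
        response = fetch(url)
    return response
```

In the library functions you would then replace `r = get(url)` with `r = get_with_backoff(get, url)`. Passing `fetch` and `sleep` in as arguments also makes it possible to try the logic without hitting the site.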
I was gathering data using these methods and all of a sudden they stopped working, now returning 'None' instead of a dataframe. For instance, the following line:
print(get_team_stats('MIA', 2013))
Prints out 'None' in the console. What's going on?