vishaalagartha / basketball_reference_scraper

A python module for scraping static and dynamic content from Basketball Reference.
MIT License

get game logs not working as expected #15

Closed diego-escobedo closed 4 years ago

diego-escobedo commented 4 years ago

A normal call like get_game_logs('Thabo Sefolosha', '2013-08-01', '2014-02-02') returns the correct game logs, whereas for some players, e.g. get_game_logs('DeMarcus Cousins', '2013-08-01', '2014-02-02'), nothing is returned. The same happens for seemingly random other players (Patty Mills, Gerald Henderson, etc.). This may be due to the bball-ref widget function.

undraliu commented 4 years ago

DeMarcus Cousins is coded to Marcus Cousin currently

lkuna24 commented 4 years ago

DeMarcus Cousins is coded to Marcus Cousin currently

Is there a way to see how Basketball Reference stores the names?
Here are a few more that don't seem to work, and it seems impossible to figure out what BBR has these players' names stored as: Raul Neto, Vincent Poirier, Glenn Robinson III.

vishaalagartha commented 4 years ago

This was an issue in the get_player_suffix() function in utils, I think. The function was having trouble finding players with extra suffixes (e.g. 'III', 'Jr') and players with similar names ('Marcus Cousin' and 'DeMarcus Cousins').

I fixed it by manually going to the source page and ensuring the names match completely. To see the fixed version, please check out v1.0.14!
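
To illustrate how similar names and suffixes can collide, here is a minimal sketch. It assumes a loose, substring-style lookup, which is only a guess at the kind of matching that could confuse 'Marcus Cousin' with 'DeMarcus Cousins'; it is not the actual get_player_suffix() code. The helpers normalize, substring_match and exact_match and the candidate list are hypothetical.

```python
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents, periods and apostrophes, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = ascii_only.replace(".", "").replace("'", "").lower()
    return " ".join(cleaned.split())


def substring_match(query: str, candidates: list[str]) -> list[str]:
    """Hypothetical loose lookup: a hit if either name contains the other."""
    q = normalize(query)
    return [c for c in candidates if normalize(c) in q or q in normalize(c)]


def exact_match(query: str, candidates: list[str]) -> list[str]:
    """Stricter lookup: the normalized names must be identical."""
    q = normalize(query)
    return [c for c in candidates if normalize(c) == q]


# Hypothetical candidate names, as they might appear on a player index page.
candidates = [
    "Marcus Cousin",
    "DeMarcus Cousins",
    "Glenn Robinson",
    "Glenn Robinson III",
]

loose_cousins = substring_match("DeMarcus Cousins", candidates)
strict_cousins = exact_match("DeMarcus Cousins", candidates)
loose_robinson = substring_match("Glenn Robinson III", candidates)
strict_robinson = exact_match("Glenn Robinson III", candidates)
```

With the loose lookup, both queries return two candidates, so a caller that takes the first hit can land on the wrong player. Requiring the full normalized name to match removes the ambiguity for these cases.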

ppolonik2 commented 4 years ago

I think this might still be an issue. pip install basketball-reference-scraper currently installs v1.0.18 by default, and loading LaMarcus Aldridge gave an empty dataframe when playoffs=False. However, after I found this issue I uninstalled and re-installed with pip install basketball-reference-scraper==1.0.14 and the problem went away.

ppolonik2 commented 4 years ago

Sorry, my comment above seems to be related but different. This line in get_game_logs: if not isinstance(row['GS'], int) looks like it was used to fix a previous issue. However, when there is text in the dataframe's 'GS' column (like 'Inactive'), the 0s and 1s load as strings rather than integers, so the check skips rows it is supposed to keep. It might not be the most elegant solution, but I added a couple of lines that fixed the problem for me. The whole loop now looks like this:

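for index, row in df.iterrows():
    # 'GS' arrives as a string when the column also holds text such as 'Inactive'
    if isinstance(row['GS'], str):
        if not row['GS'].isnumeric():
            continue  # non-numeric text: the player did not appear in this game
        row['GS'] = int(row['GS'])
    # keep the original guard for any other non-integer value
    if not isinstance(row['GS'], int):
        continue
    active_df = active_df.append(row)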
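
The same filtering can also be written without a row loop. The sketch below is one possible alternative, not the library's code: it assumes only that the frame has a 'GS' column as described above, and the helper name keep_active_games and the sample 'DATE' and 'PTS' columns are made up for illustration. It also avoids DataFrame.append, which was removed in pandas 2.0.

```python
import pandas as pd


def keep_active_games(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows whose 'GS' value is numeric and cast that column to int."""
    # Text such as 'Inactive' becomes NaN; '0' and '1' become 0.0 and 1.0.
    gs = pd.to_numeric(df["GS"], errors="coerce")
    played = gs.notna()
    active = df.loc[played].copy()
    active["GS"] = gs[played].astype(int)
    return active


# Hypothetical sample resembling a scraped game log with text placeholders.
logs = pd.DataFrame(
    {
        "DATE": ["2013-10-30", "2013-11-01", "2013-11-02", "2013-11-05"],
        "GS": ["1", "Inactive", "0", "Did Not Play"],
        "PTS": ["22", "Inactive", "8", "Did Not Play"],
    }
)

active_logs = keep_active_games(logs)
```

One difference from the loop: pd.to_numeric also accepts strings such as '1.0', which str.isnumeric() rejects, so the two versions may disagree on unusual inputs.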