nflverse / nfl_data_py

Python code for working with NFL play by play data.
MIT License

import_pfr_passing #50

Closed bhibby closed 1 year ago

bhibby commented 1 year ago

Hi, does this module work? I received an error: module 'nfl_data_py' has no attribute 'import_pfr_passing'.

I'm looking for RPO info, and I believe this has it?

thank you! b

alecglen commented 1 year ago

Hey @bhibby, thanks for the callout! This looks like a documentation mistake - the import_pfr_passing hook was replaced with a more general import_pfr(stat_type) a while back. Give that a try and reply back if you have any issues. I'll get the documentation updated shortly.
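For code that still calls the old name, a small shim along these lines may ease the switch. It is only a sketch: `load_pfr`, `import_pfr_passing_compat`, and the `loader` parameter are hypothetical names that are not part of nfl_data_py, and it assumes `import_pfr` takes a stat type followed by an optional list of years.

```python
VALID_STAT_TYPES = ("pass", "rec", "rush")


def load_pfr(stat_type, years=None, loader=None):
    """Call import_pfr with a validated stat type.

    `loader` is a hypothetical hook for offline testing; by default the
    real nfl_data_py.import_pfr is used, which needs network access.
    """
    if stat_type not in VALID_STAT_TYPES:
        raise ValueError(
            f"stat_type must be one of {VALID_STAT_TYPES}, got {stat_type!r}"
        )
    if loader is None:
        # Imported lazily so the helper can be exercised without the package.
        import nfl_data_py as nfl
        loader = nfl.import_pfr
    return loader(stat_type, years)


def import_pfr_passing_compat(years=None, loader=None):
    """Stand-in for the removed import_pfr_passing (hypothetical name)."""
    return load_pfr("pass", years, loader=loader)
```

Calling `import_pfr_passing_compat()` should then behave like `nfl.import_pfr("pass")`, while a misspelled stat type such as "passing" fails early with a clear message.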

Here is the docstring for the updated method:

def import_pfr(s_type, years=None):
    """Import PFR advanced statistics

    Args:
        s_type (str): must be one of pass, rec, rush
        years (List[int]): years to return data for, optional
    Returns:
        DataFrame
    """

bhibby commented 1 year ago

Thanks. I'm still not sure if RPO data is included? It doesn't show up in the column list.

bhibby commented 1 year ago

I also pulled the PFR pass data for 2022 and the only game that shows up is the Super Bowl.

alecglen commented 1 year ago

> Thanks. I'm still not sure if RPO data is included? It doesn't show up in the column list.

I think you're not seeing RPO data because PFR only publishes it at the seasonal level, not at the weekly level (which is what this method returns when you specify years). If you just call nfl.import_pfr("pass") without a years argument, you'll see the RPO columns returned.

I can definitely see how that is confusing if you're not actively looking at the PFR pages while using the method. We will discuss breaking these into separate _weekly and _seasonal methods to help clarify that the columns will differ.
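To see which columns differ between the two levels, the column sets can be compared directly. This is a rough sketch: `column_differences` and `show_rpo_columns` are hypothetical helpers, the choice of 2021 as the weekly sample year is arbitrary, and the filter assumes the RPO column names contain "rpo".

```python
def column_differences(seasonal_columns, weekly_columns):
    """Return (seasonal_only, weekly_only) as sorted lists of column names."""
    seasonal, weekly = set(seasonal_columns), set(weekly_columns)
    return sorted(seasonal - weekly), sorted(weekly - seasonal)


def show_rpo_columns(year=2021):
    """Print seasonal-only columns whose name contains 'rpo' (needs network)."""
    import nfl_data_py as nfl

    seasonal = nfl.import_pfr("pass")        # no years: seasonal-level data
    weekly = nfl.import_pfr("pass", [year])  # years given: weekly-level data
    seasonal_only, _ = column_differences(seasonal.columns, weekly.columns)
    # Assumption: RPO-related columns have "rpo" somewhere in their name.
    print([name for name in seasonal_only if "rpo" in name.lower()])
```

Running `show_rpo_columns()` downloads both tables, so it is left uncalled here; `column_differences` works on any two lists of column names.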

alecglen commented 1 year ago

> I also pulled the PFR pass data for 2022 and the only game that shows up is the Super Bowl.

This one is a known issue in our data source: https://github.com/nflverse/nflverse-pfr/issues/30. Hopefully that will get fixed soon.

In the meantime, something like this can get the missing data per player. Just please make sure to respect PFR's server.

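import nfl_data_py as nfl
import pandas as pd

# stat_type options: "passing", "rushing_and_receiving", "defense"

def scrape_pfr_advanced_2022(name, stat_type):
    # Look up the player's PFR ID in the seasonal passing data (raises IndexError if absent)
    pfr_id = nfl.import_pfr("pass").loc[lambda x: x.player == name, "pfr_id"].iloc[0]
    # The URL path uses the first character of the ID, then the full ID
    url = f"https://www.pro-football-reference.com/players/{pfr_id[0]}/{pfr_id}/gamelog/2022/advanced/"
    table = pd.read_html(url, attrs={"id": f"advanced_{stat_type}"}, header=1)[0]
    # Drop the final row and rename the "Rk" and unnamed columns
    return table.iloc[:-1].rename(columns={"Rk": "Week", "Unnamed: 6": "At"})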
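When pulling several players, one way to go easy on PFR's server is to pause between requests. The helper below is a minimal sketch: `fetch_politely` is a hypothetical name, and the 5-second default is an arbitrary assumption, not a documented PFR limit.

```python
import time


def fetch_politely(names, fetch, delay_seconds=5.0, sleep=time.sleep):
    """Call fetch(name) for each name, pausing between consecutive requests."""
    results = {}
    for index, name in enumerate(names):
        if index > 0:
            sleep(delay_seconds)  # wait before every request after the first
        results[name] = fetch(name)
    return results
```

It could be combined with the scraper above as `fetch_politely(player_names, lambda n: scrape_pfr_advanced_2022(n, "passing"))`, where `player_names` is your own list of names as they appear in the `player` column.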

alecglen commented 1 year ago

Original documentation confusion resolved via https://github.com/cooperdff/nfl_data_py/pull/51.

Missing 2022 data fixed with the resolution of https://github.com/nflverse/nflverse-pfr/issues/30.

Confusion around seasonal vs weekly PFR data to be addressed via https://github.com/cooperdff/nfl_data_py/issues/53.