thecodeforest / fantasyfootball

MIT License


Welcome to fantasyfootball


fantasyfootball is a Python package that provides up-to-date game data, including player statistics, betting lines, injuries, defensive rankings, and game-day weather data. While many websites offer NFL game data, obtaining it in a format appropriate for analysis or inference requires either (1) a paid subscription or (2) manual weekly downloads with extensive data cleaning. fantasyfootball centralizes game data in a single location while ensuring it is up-to-date throughout the season.

Additionally, fantasyfootball streamlines the creation of features for in-season, player-level fantasy point projections, which can then inform weekly roster decisions. Check out the tutorial notebook to get started!

Installation

$ pip install fantasyfootball

Benchmarking

The fantasyfootball package provides football enthusiasts with the data and tools to create player point projections customized for their league's scoring system. Indeed, a simple comparison of fantasyfootball projections against (1) a "naive" projection and (2) a subscription-based, "industry-grade" projection showed that accurate weekly player-level point projections are achievable with fantasyfootball. Across all player positions, fantasyfootball projections were, on average, 18% more accurate than the naive projection (an average error of 4.6 pts vs. 5.6 pts), while the industry-grade projections were 4% more accurate than the fantasyfootball projections (4.4 pts vs. 4.6 pts). The figure below further disaggregates projection performance by player position. More details about this analysis can be found in the benchmarking notebook.

[Figure: benchmark of projection accuracy by player position]
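As a quick arithmetic check, the percentages above can be reproduced by treating each "pts" figure as an average error (lower is better) and expressing the improvement as a relative reduction in error with respect to the less accurate projection. This interpretation is our assumption; the benchmarking notebook is the place to confirm exactly how accuracy was measured.

```python
def relative_improvement(baseline_error: float, candidate_error: float) -> float:
    """Percent reduction in error of `candidate_error` relative to `baseline_error`."""
    return (baseline_error - candidate_error) / baseline_error * 100


# Naive projection (5.6 pts) vs. fantasyfootball projection (4.6 pts)
ff_vs_naive = relative_improvement(baseline_error=5.6, candidate_error=4.6)
# fantasyfootball projection (4.6 pts) vs. industry-grade projection (4.4 pts)
industry_vs_ff = relative_improvement(baseline_error=4.6, candidate_error=4.4)

print(f"fantasyfootball vs. naive: {ff_vs_naive:.0f}%")              # 18%
print(f"industry-grade vs. fantasyfootball: {industry_vs_ff:.0f}%")  # 4%
```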

Quickstart

Let's walk through an example to illustrate a core use case of fantasyfootball: weekly roster decisions. Imagine it's Tuesday of Week 15 in the 2021 NFL regular season. Your somewhat mediocre team sits in 5th place in the league standings, one spot away from the coveted playoff threshold. It is a must-win week, and you are in the unenviable position of deciding who starts in the Flex roster spot. You have three wide receivers available to start (Keenan Allen, Chris Godwin, and Tyler Lockett), and you want to estimate which of them will score the most points in Week 15. Accordingly, you use the data and feature engineering capabilities in fantasyfootball to create player-level point projections. The player with the highest point projection will be slotted into the Flex roster spot, propelling your team to fantasy victory!

Let's start by importing several packages and reading all game data from the 2015-2021 seasons.

from janitor import get_features_targets 
from xgboost import XGBRegressor

from fantasyfootball.data import FantasyData
from fantasyfootball.features import FantasyFeatures
from fantasyfootball.benchmarking import filter_to_prior_week

# Instantiate FantasyData object for 2015-2021 seasons
fantasy_data = FantasyData(season_year_start=2015, season_year_end=2021)

At the time of writing this walkthrough, there are 45 fields available for each player-season-week. For more details on the data, see the Datasets section below.

Next, we'll create our outcome variable (y), which holds each player's total weekly fantasy points. Depending on your league's scoring rules, you can choose a standard scoring system (yahoo, fanduel, or draftkings) or create your own custom configuration. In this example, we'll assume you are in a yahoo league with standard scoring.

fantasy_data.create_fantasy_points_column(scoring_source="yahoo")

Now that we've added our outcome variable, we'll extract the data and look at a few fields for Tyler Lockett over the past four weeks. Note that only a subset of the available fields appears below.

# extract data from fantasy_data object
fantasy_df = fantasy_data.data
# filter to player-season-week in question
lockty_df = fantasy_df.query("name=='Tyler Lockett' & season_year==2021 & 11<=week<=14")   
print(lockty_df)
pid       week  is_away  receiving_rec  receiving_td  receiving_yds  fanduel_salary  ff_pts_yahoo
LockTy00    11        0              4             0            115            6800          13.5
LockTy00    12        1              3             0             96            6800          11.1
LockTy00    13        0              7             1             68            6900          16.3
LockTy00    14        1              5             1            142            7300          24.7
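To see how the outcome column relates to the raw statistics, the sketch below recomputes Lockett's points as a weighted sum of his receiving stats. The weights (0.5 per reception, 0.1 per receiving yard, 6 per receiving touchdown) are our assumption about Yahoo's standard half-PPR receiving settings, not values read from fantasyfootball's scoring configuration. They reproduce the ff_pts_yahoo values for Weeks 11 through 13.

```python
# ASSUMPTION: Yahoo standard (half-PPR) receiving weights, not taken from the package
ASSUMED_RECEIVING_WEIGHTS = {
    "receiving_rec": 0.5,  # points per reception
    "receiving_yds": 0.1,  # points per receiving yard
    "receiving_td": 6.0,   # points per receiving touchdown
}


def receiving_points(row: dict, weights: dict = ASSUMED_RECEIVING_WEIGHTS) -> float:
    """Weighted sum of receiving stats, rounded to one decimal place."""
    return round(sum(row[stat] * weight for stat, weight in weights.items()), 1)


# Tyler Lockett, 2021 Weeks 11-14 (values from the table above)
lockett_weeks = [
    {"week": 11, "receiving_rec": 4, "receiving_td": 0, "receiving_yds": 115},
    {"week": 12, "receiving_rec": 3, "receiving_td": 0, "receiving_yds": 96},
    {"week": 13, "receiving_rec": 7, "receiving_td": 1, "receiving_yds": 68},
    {"week": 14, "receiving_rec": 5, "receiving_td": 1, "receiving_yds": 142},
]
for row in lockett_weeks:
    print(row["week"], receiving_points(row))
```

The sketch yields 22.7 for Week 14, 2.0 points below the table's 24.7. Because the table shows only a subset of fields, the difference presumably comes from a scoring category not listed here.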

Next, we'll create the feature set that feeds our predictive model. The first step is to keep only the data through the most recently completed week (2021, Week 14) and then restrict it to wide receivers (WR).

# extract the name of our outcome variable
y = fantasy_df.columns[-1]
# filter to all data prior to 2021, Week 15
backtest_df = filter_to_prior_week(fantasy_df, season_year=2021, week_number=14)
# Instantiate FantasyFeatures object for all Wide Receivers
features = FantasyFeatures(backtest_df, position="WR", y=y)   
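Conceptually, the backtest step keeps every row that would have been known before Week 15 kicked off: all earlier seasons plus 2021 through Week 14. The plain-pandas sketch below illustrates that idea. filter_through_week is a hypothetical helper, and we are assuming (not confirming) that filter_to_prior_week applies equivalent logic.

```python
import pandas as pd


def filter_through_week(df: pd.DataFrame, season_year: int, week_number: int) -> pd.DataFrame:
    """Hypothetical helper: keep earlier seasons plus weeks <= week_number of season_year."""
    earlier_season = df["season_year"] < season_year
    same_season_through_week = (df["season_year"] == season_year) & (df["week"] <= week_number)
    return df[earlier_season | same_season_through_week]


toy = pd.DataFrame({
    "season_year": [2020, 2021, 2021, 2021],
    "week": [17, 13, 14, 15],
})
print(filter_through_week(toy, season_year=2021, week_number=14))
```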

Now, we'll apply a few filters and transformations to prepare our data for modeling:

features.filter_inactive_games(status_column="is_active")
features.filter_n_games_played_by_season(min_games_played=1)
features.create_future_week()
features.add_coefficient_of_variation(n_week_window=16)
features.add_lag_feature(n_week_lag=1, lag_columns=y)
features.add_moving_avg_feature(n_week_window=4, window_columns=[y, "off_snaps_pct"])
features_signature_dict = features.create_ff_signature()
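The lag, moving-average, and coefficient-of-variation steps can be approximated in plain pandas. This is a sketch under our own assumptions, not fantasyfootball's implementation. add_player_features is hypothetical. Each feature is computed per player (pid) from prior weeks only (via shift(1)), so a week's own points never leak into its features. Windows run across season boundaries, and CV is expressed as a percentage (standard deviation divided by mean, times 100).

```python
import pandas as pd


def add_player_features(df: pd.DataFrame, y: str, lag: int = 1,
                        ma_window: int = 4, cv_window: int = 16) -> pd.DataFrame:
    """Hypothetical helper: per-player lag, moving average, and CV of column `y`."""
    df = df.sort_values(["pid", "season_year", "week"]).copy()
    points = df.groupby("pid")[y]

    # Points scored `lag` weeks earlier
    df[f"{y}_lag_{lag}"] = points.shift(lag)

    # Mean of up to `ma_window` prior weeks (current week excluded)
    df[f"{y}_ma_{ma_window}"] = points.transform(
        lambda s: s.shift(1).rolling(ma_window, min_periods=1).mean()
    )

    # Coefficient of variation (%) over up to `cv_window` prior weeks
    def rolling_cv(s: pd.Series) -> pd.Series:
        prior = s.shift(1).rolling(cv_window, min_periods=2)
        return prior.std() / prior.mean() * 100

    df["cv"] = points.transform(rolling_cv)
    return df


lockett = pd.DataFrame({
    "pid": ["LockTy00"] * 4,
    "season_year": [2021] * 4,
    "week": [11, 12, 13, 14],
    "ff_pts_yahoo": [13.5, 11.1, 16.3, 24.7],
})
print(add_player_features(lockett, y="ff_pts_yahoo"))
```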

Having created our feature set, we'll separate the historical (training) data, denoted hist_df, from the unplayed future (testing) game data, denoted future_df, using the is_future_week indicator added during the create_future_week step.

feature_df = features_signature_dict.get("feature_df")
hist_df = feature_df[feature_df["is_future_week"] == 0]
future_df = feature_df[feature_df["is_future_week"] == 1]
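Presumably, create_future_week runs before the lag and moving-average steps so that the appended, unplayed week picks up features from the weeks preceding it while its own outcome stays empty. The sketch below illustrates that idea with a hypothetical append_future_week helper. It is our approximation, not the package's code: it carries only identifier columns into the new row and ignores season boundaries.

```python
import pandas as pd


def append_future_week(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add one unplayed row per player for the week after their last game."""
    last_game = df.sort_values(["season_year", "week"]).groupby("pid").tail(1)
    future = last_game[["pid", "season_year", "week"]].assign(week=lambda d: d["week"] + 1)
    combined = pd.concat(
        [df.assign(is_future_week=0), future.assign(is_future_week=1)],
        ignore_index=True,
    )
    return combined.sort_values(["pid", "season_year", "week"], ignore_index=True)


y = "ff_pts_yahoo"
history = pd.DataFrame({
    "pid": ["LockTy00"] * 4,
    "season_year": [2021] * 4,
    "week": [11, 12, 13, 14],
    y: [13.5, 11.1, 16.3, 24.7],
})

feature_df = append_future_week(history)
# Lag-1 feature: the future row inherits the most recent played week's points
feature_df[f"{y}_lag_1"] = feature_df.groupby("pid")[y].shift(1)

hist_df = feature_df[feature_df["is_future_week"] == 0]
future_df = feature_df[feature_df["is_future_week"] == 1]
print(future_df)
```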

For the sake of simplicity, we'll leverage a small subset of raw, untransformed features from our original data, and combine these with the derived features we created in the previous step.

derived_feature_names = features_signature_dict.get("pipeline_feature_names")
# raw, untransformed features taken directly from the original data
raw_feature_names = ["fanduel_salary"]
all_features = derived_feature_names + raw_feature_names

Next, we'll separate the features (X) from the target (y) in the historical data and select the same features for the future week.

X_hist, y_hist = hist_df[all_features + [y]].get_features_targets(y, all_features)
X_future = future_df[all_features]
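For readers who prefer not to depend on pyjanitor, the same X/y extraction can be done with ordinary column selection. The sketch below assumes a single target column, as in the walkthrough. split_features_target is a hypothetical name, and the derived column name ff_pts_yahoo_lag_1 is illustrative rather than the package's actual feature name.

```python
from typing import List

import pandas as pd


def split_features_target(df: pd.DataFrame, target: str, features: List[str]):
    """Return (X, y): a feature DataFrame and a target Series."""
    return df[features], df[target]


hist_df = pd.DataFrame({
    "fanduel_salary": [6800, 6800, 6900],
    "ff_pts_yahoo_lag_1": [None, 13.5, 11.1],  # illustrative derived feature
    "ff_pts_yahoo": [13.5, 11.1, 16.3],
})
all_features = ["fanduel_salary", "ff_pts_yahoo_lag_1"]
X_hist, y_hist = split_features_target(hist_df, target="ff_pts_yahoo", features=all_features)
print(X_hist.shape, y_hist.shape)  # (3, 2) (3,)
```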

Now we can fit a simple model and make predictions for the upcoming week.

xgb = XGBRegressor(
    objective="reg:squarederror",
    learning_rate=0.01,
    max_depth=3,
    n_estimators=500,
)
xgb.fit(X_hist.values, y_hist.values)
y_future = xgb.predict(X_future.values).round(1)

Below we'll assign our point predictions back to the future_df we created for Week 15 and filter to the three players in question.

future_df = future_df.assign(**{f"{y}_pred": y_future})
players = ["Chris Godwin", "Tyler Lockett", "Keenan Allen"]
future_df[["name", "team", "opp", "week", "date", f"{y}_pred", "cv"]].query(
    "name in @players"
)
name           team  opp  week  date        ff_pts_yahoo_pred  cv
Keenan Allen   LAC   KAN    15  2021-12-16               12.0  35
Chris Godwin   TAM   NOR    15  2021-12-19               13.4  49
Tyler Lockett  SEA   LAR    15  2021-12-21               11.3  74

Keenan Allen and Chris Godwin are projected to score roughly 1 to 2 more points than Tyler Lockett. And while Godwin and Allen have similar projections, Allen has been more consistent over the past 16 games, as measured by the coefficient of variation (cv column: 35 for Allen vs. 49 for Godwin). That is, we can place more faith in Allen's 12-point forecast than in Godwin's 13.4-point forecast. When point projections are similar, CV can serve as a secondary input when deciding between two players. For example, if the goal is to score many points as well as win the week, a player with a large CV might be the better option, since they have a higher potential ceiling. In contrast, if the goal is only to win and the total points scored are less critical, a more consistent player with a small CV is the better option.
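The decision rule described above (start the highest projection, but let CV break near-ties) can be written as a small function. The tie_margin and prefer_ceiling parameters are hypothetical choices we introduce for illustration; the projections and CV values come from the Week 15 table.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    projection: float  # projected fantasy points
    cv: float          # coefficient of variation (%); lower means more consistent


def pick_flex(candidates: List[Candidate], prefer_ceiling: bool = False,
              tie_margin: float = 1.5) -> Candidate:
    """Hypothetical rule: CV decides among players within `tie_margin` of the top projection."""
    best = max(c.projection for c in candidates)
    contenders = [c for c in candidates if best - c.projection <= tie_margin]
    if prefer_ceiling:
        # Chasing points: take the most volatile contender (higher ceiling)
        return max(contenders, key=lambda c: c.cv)
    # Only the win matters: take the most consistent contender
    return min(contenders, key=lambda c: c.cv)


week_15 = [
    Candidate("Keenan Allen", 12.0, 35),
    Candidate("Chris Godwin", 13.4, 49),
    Candidate("Tyler Lockett", 11.3, 74),
]
print(pick_flex(week_15).name)                       # Keenan Allen
print(pick_flex(week_15, prefer_ceiling=True).name)  # Chris Godwin
```

With a 1.5-point margin, Allen and Godwin count as a near-tie while Lockett is excluded. A tighter margin (for example, 0.5) reduces the rule to "start the highest projection," which selects Godwin.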

Datasets

The package provides the following seven datasets by season:








Data Pipeline

While the PyPI version of fantasyfootball is updated monthly, the GitHub version is updated every Thursday during the regular season (Sep 8 - Jan 8). New data is stored in the datasets directory within the fantasyfootball package. If the data on GitHub differs from the installed version, creating a FantasyData object will download the new data. Note that this download does not update the installed package, so the difference persists after the session ends. Updating the installed version of fantasyfootball corrects this difference and is recommended at the end of each season.
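To check which release is currently installed (for example, before upgrading at the end of a season with pip install --upgrade fantasyfootball), the standard library's importlib.metadata can report the installed distribution version. The helper name installed_version is our own.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "fantasyfootball") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version())
```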

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

fantasyfootball was created by Mark LeBoeuf. It is licensed under the terms of the MIT license.