jflancer / bigballR

Package for working with NCAA Basketball Data

bigballR

Note: This documentation is slightly outdated following the latest package update (1/13).

bigballR is an R package for working with NCAA Basketball data. It primarily revolves around schedule, roster, and play-by-play data from stats.ncaa.org, and it can also calculate lineups, on/off results, and single-game and multi-game player statistics.

Installation

First, install the devtools package if you haven't already, then install bigballR from GitHub:

#install.packages("devtools")
devtools::install_github("jflancer/bigballR")

Functionality

Retrieving Game IDs and Other Information

Besides retrieving them with get_team_schedule() or get_date_games(), you can find game IDs manually in the URL when browsing games. For example, 4674164 is the game ID for https://stats.ncaa.org/game/play_by_play/4674164.
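
If you collect many such URLs, pulling the ID out can be scripted. The sketch below is a hypothetical base-R helper (extract_game_id is not part of bigballR) and assumes the ID is the final run of digits in the URL path, as in the example above.

```r
# Hypothetical helper, not part of bigballR: pull the trailing numeric
# game ID out of a stats.ncaa.org URL such as .../play_by_play/4674164
extract_game_id <- function(url) {
  # Keep only the final run of digits at the end of the URL path
  id <- sub(".*/([0-9]+)/?$", "\\1", url)
  # sub() returns the input unchanged when there is no match, so map that to NA
  ifelse(grepl("^[0-9]+$", id), id, NA_character_)
}

extract_game_id("https://stats.ncaa.org/game/play_by_play/4674164")
```

The helper is vectorized, so it can be applied to a whole character vector of URLs at once.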

Game Scraping Functions

Data Manipulation Functions

Datasets

Use

There are many ways to use this package. As an example, here is one natural workflow:

library(bigballR)
# Get team schedule
# Note: if you don't know the proper team.name (case sensitive), you can look it up in data("teamids")
schedule <- get_team_schedule(season = "2018-19", team.name = "Duke")
# Get play-by-play data for all games played so far this season
play_by_play <- get_play_by_play(schedule$Game_ID)
# Generate all lineups and stats from the play-by-play data
lineups <- get_lineups(play_by_play_data = play_by_play, keep.dirty = TRUE, garbage.filter = FALSE)
# Look at Zion Williamson's on/off statistics in lineups that include Reddish and Barrett
zion_comparison <- on_off_generator("ZION.WILLIAMSON", lineups, Included = c("CAM.REDDISH","RJ.BARRETT"))

scrape_game / get_play_by_play

Functions to retrieve play-by-play data. scrape_game() works on an individual game, while get_play_by_play() accepts a vector of game IDs and aggregates the results into a single data frame. These functions warn users of potential errors made by the game trackers. The player discrepancy warning counts the number of events credited to players who, according to the substitution record, were not on the court at the time of the event. The substitution mistake warning indicates that an unclean substitution was entered (e.g., 2 players enter and 1 leaves).
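
To illustrate what an "unclean substitution" means, here is a minimal base-R sketch on a made-up substitution log. The column names (stoppage, player, action) are hypothetical and do not reflect bigballR's actual play-by-play schema; the check simply flags stoppages where the number of players entering differs from the number leaving.

```r
# Toy substitution log (hypothetical columns, not bigballR's real schema):
# each row is one player entering or leaving during a stoppage
subs <- data.frame(
  stoppage = c(1, 1, 2, 2, 2),
  player   = c("A", "B", "C", "D", "E"),
  action   = c("IN", "OUT", "IN", "IN", "OUT"),
  stringsAsFactors = FALSE
)

# Count entries and exits within each stoppage
ins  <- tapply(subs$action == "IN",  subs$stoppage, sum)
outs <- tapply(subs$action == "OUT", subs$stoppage, sum)

# A clean substitution has as many players entering as leaving;
# stoppage 2 (2 in, 1 out) is flagged
unclean <- names(ins)[ins != outs]
unclean
```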

get_date_games

This function returns a schedule for the given date and specified conference. Results are included if applicable, as well as the play-by-play game ID.

get_team_schedule

This function returns a data frame of the specified team's schedule. For games that have ended, it includes the game IDs used for play-by-play scraping, along with the team scores and attendance. Note: currently, the season/team.name parameters can only be used for the 2016-17, 2017-18, and 2018-19 seasons.

get_team_roster

This function returns a data frame of the specified team's roster, including player names, positions, jersey numbers, heights, and school years. Note: currently, the season/team.name parameters can only be used for the 2016-17, 2017-18, and 2018-19 seasons.

get_lineups

This function takes a play-by-play data frame and generates every lineup used by both teams. It then calculates a variety of statistics and metrics at the lineup level.
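
The core idea behind lineup aggregation can be sketched in base R: give each group of five players a canonical key (so player order doesn't matter), then sum stats by that key. This is an illustrative sketch on made-up stint data with hypothetical columns (p1 to p5, pts_for, pts_against); it is not how get_lineups() is implemented and computes only a simple plus/minus.

```r
# Toy stint data (hypothetical, not bigballR's real output): five players
# on the floor for one team plus points scored and allowed in that stint
stints <- data.frame(
  p1 = c("A", "B", "A"), p2 = c("B", "A", "B"), p3 = c("C", "C", "C"),
  p4 = c("D", "D", "F"), p5 = c("E", "E", "E"),
  pts_for = c(10, 6, 4), pts_against = c(8, 2, 7),
  stringsAsFactors = FALSE
)

# Sort the five names so the same group always gets the same key,
# regardless of the order in which players were listed
player_cols <- c("p1", "p2", "p3", "p4", "p5")
stints$lineup <- apply(stints[, player_cols], 1,
                       function(p) paste(sort(p), collapse = "/"))

# Sum points by lineup and compute a simple plus/minus
lineups <- aggregate(cbind(pts_for, pts_against) ~ lineup, data = stints, FUN = sum)
lineups$plus_minus <- lineups$pts_for - lineups$pts_against
lineups
```

The first two stints list the same five players in a different order, so they collapse into one lineup.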

on_off_generator

This function takes lineup data and calculates on/off statistics for every combination of the specified players. This lets users view on/off statistics for individual players as well as for combinations of multiple players. Users can also require that specific players be included in, or excluded from, all lineups considered.
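
For a single player, the on/off split reduces to partitioning lineups by whether the player appears and comparing the two groups. The sketch below assumes a hypothetical lineup table (slash-separated names plus point totals) and compares raw plus/minus only; the on_off helper is illustrative and does not reproduce on_off_generator(), which also handles player combinations and the Included/Excluded options.

```r
# Toy lineup-level data (hypothetical): one row per lineup with its totals
lineups <- data.frame(
  lineup      = c("A/B/C/D/E", "A/B/C/E/F", "B/C/D/F/G"),
  pts_for     = c(16, 4, 9),
  pts_against = c(10, 7, 12),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split lineups into those with the player on the floor
# and those without, then sum plus/minus within each group
on_off <- function(player, lineups) {
  on <- vapply(strsplit(lineups$lineup, "/", fixed = TRUE),
               function(p) player %in% p, logical(1))
  pm <- lineups$pts_for - lineups$pts_against
  c(on = sum(pm[on]), off = sum(pm[!on]))
}

on_off("A", lineups)  # returns c(on = 3, off = -3)
```

Raw point totals are used here for brevity; comparing groups of different sizes is usually fairer on a per-possession basis.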

get_player_lineups

This function finds all lineups in a given lineup data set that include or exclude certain players, providing a quick way to filter lineups by player.
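
The include/exclude filter can be expressed in a few lines of base R. This sketch uses a hypothetical filter_lineups helper and made-up data with slash-separated lineup strings; it is not the signature of get_player_lineups(). A lineup is kept only if it contains every included player and none of the excluded ones.

```r
# Toy lineup-level data (hypothetical format: slash-separated player names)
lineups <- data.frame(
  lineup  = c("A/B/C/D/E", "A/B/C/E/F", "B/C/D/F/G"),
  pts_for = c(16, 4, 9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep lineups containing all `include` players
# and none of the `exclude` players
filter_lineups <- function(lineups, include = character(0), exclude = character(0)) {
  players <- strsplit(lineups$lineup, "/", fixed = TRUE)
  keep <- vapply(players,
                 function(p) all(include %in% p) && !any(exclude %in% p),
                 logical(1))
  lineups[keep, , drop = FALSE]
}

filter_lineups(lineups, include = "A", exclude = "F")
```

With empty defaults, all(character(0) %in% p) is TRUE and any(...) is FALSE, so omitting either argument leaves that side of the filter inactive.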

get_player_stats

This function calculates a range of player statistics, either for individual games or aggregated across multiple games.
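
Multi-game aggregation amounts to summing counting stats per player and then deriving averages and rates from the totals. The sketch below uses made-up per-game data with hypothetical columns (player, game, pts, fga) and is not the output format of get_player_stats().

```r
# Toy per-game player stats (hypothetical columns)
game_stats <- data.frame(
  player = c("A", "A", "B", "B"),
  game   = c(1, 2, 1, 2),
  pts    = c(20, 14, 8, 11),
  fga    = c(15, 12, 7, 9),
  stringsAsFactors = FALSE
)

# Sum counting stats per player across games
totals <- aggregate(cbind(pts, fga) ~ player, data = game_stats, FUN = sum)

# Games played per player, matched to the row order of `totals`
totals$games <- as.vector(table(game_stats$player)[totals$player])

# Derive averages and rates from the totals
totals$ppg <- totals$pts / totals$games
totals$pts_per_fga <- totals$pts / totals$fga
totals
```

Rates such as points per field goal attempt are computed from season totals rather than by averaging per-game ratios, so a low-volume game does not carry the same weight as a high-volume one.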