cawthm closed this issue 3 years ago.
Howdy!
It's definitely possible; I'll be adding `get_espn_win_prob()` and `get_nfl_schedule()` functions shortly.
Here's the plotted output from `get_espn_win_prob()`:
As far as I can tell, you'd have to get this at the game level, but you can combine the two: `get_nfl_schedule()` gives you the `game_id`, which can then be passed to `get_espn_win_prob()`. That returns a dataframe like the one below:
```
Rows: 185
Columns: 19
$ row_id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, …
$ quarter <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ home_score <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ away_score <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 7,…
$ distance <int> 0, 10, 10, 9, 10, 10, 4, 10, 6, 1, 10, 7, 7, 10, 5, 1, 0, 10, 5, 7, 10,…
$ yard_line <int> 35, 80, 69, 68, 55, 55, 49, 44, 40, 35, 33, 30, 30, 16, 5, 1, 65, 16, 2…
$ pos_team_id <chr> "12", "17", "17", "17", "17", "17", "17", "17", "17", "17", "17", "17",…
$ down <int> 0, 1, 1, 2, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 1, 2, 0, 1, 2, 3, 1, 2, 3, 4,…
$ yards_to_endzone <int> 65, 80, 69, 68, 55, 55, 49, 44, 40, 35, 33, 30, 30, 16, 5, 1, 65, 84, 7…
$ short_down_distance_text <chr> NA, "1st & 10", "1st & 10", "2nd & 9", "1st & 10", "2nd & 10", "3rd & 4…
$ possession_text <chr> NA, "NE 20", "NE 31", "NE 32", "NE 45", "NE 45", "KC 49", "KC 44", "KC …
$ down_distance_text <chr> NA, "1st & 10 at NE 20", "1st & 10 at NE 31", "2nd & 9 at NE 32", "1st …
$ text <chr> "H.Butker kicks 70 yards from KC 35 to NE -5. C.Patterson to NE 20 for …
$ play_type <chr> "Kickoff", "Rush", "Rush", "Pass Reception", "Rush", "Pass Reception", …
$ overtime_play_count <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ home_win_percentage <dbl> 0.668, 0.654, 0.676, 0.639, 0.659, 0.649, 0.610, 0.623, 0.593, 0.582, 0…
$ play_id <chr> "40103885036", "40103885061", "40103885083", "401038850105", "401038850…
$ tie_percentage <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ seconds_left <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
```
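One thing you can do with a frame like this is find the plays that moved the needle most. Below is a minimal base-R sketch, assuming the `row_id`, `home_win_percentage` and (optionally) `text` columns shown above; `biggest_swings()` is a hypothetical helper, not part of espnscrapeR. It takes the play-to-play difference in home win probability and ranks plays by its absolute size.

```r
# Hypothetical helper (not part of espnscrapeR): rank plays by how much they
# moved the home team's win probability relative to the previous play.
biggest_swings <- function(wp, n = 5) {
  # Put plays in order so diff() compares each play with the one before it
  wp <- wp[order(wp$row_id), , drop = FALSE]
  # The first play has no previous play, so its swing is NA
  wp$wp_swing <- c(NA, diff(wp$home_win_percentage))
  wp <- wp[!is.na(wp$wp_swing), , drop = FALSE]
  # Largest absolute change first
  wp <- wp[order(-abs(wp$wp_swing)), , drop = FALSE]
  # Keep only the descriptive columns present in this version of the output
  cols <- intersect(c("row_id", "text", "home_win_percentage", "wp_swing"), names(wp))
  head(wp[, cols, drop = FALSE], n)
}
```

Usage would be something like `biggest_swings(espnscrapeR::get_espn_win_prob(game_id = "401030956"))`. If the returned frame has no `text` column, the helper simply leaves it out.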
Super interesting. Thank you for this package and for your work on modeling generally.
Hi @cawthm, I've officially added `get_espn_win_prob()` to the package; you just need to pass a specific `game_id`. Win probability is also only available for the past few years, so you will get errors for games prior to 2016.
```r
library(dplyr) # for %>% and glimpse()

espnscrapeR::get_espn_win_prob(game_id = "401030956") %>% glimpse()
```

```
Rows: 160
Columns: 9
$ row_id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,…
$ home_team_id <chr> "23", "23", "23", "23", "23", …
$ away_team_id <chr> "17", "17", "17", "17", "17", …
$ tie_percentage <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ home_win_percentage <dbl> 0.599, 0.605, 0.597, 0.592, 0.…
$ away_win_percentage <dbl> 0.401, 0.395, 0.403, 0.408, 0.…
$ sequence_number <chr> "100", "3600", "5100", "7700",…
$ play_id <chr> "4010309561", "40103095636", "…
$ game_id <chr> "401030956", "401030956", "401…
```
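Because `get_espn_win_prob()` works one game at a time and errors for games without win probability (pre-2016), pulling several games calls for catching those errors so one bad game doesn't stop the batch. Here's a minimal sketch, assuming you already have a character vector of `game_id` values (for example, taken from the schedule). `get_win_prob_for_games()` and its `fetch` argument are hypothetical names, not part of espnscrapeR.

```r
# Hypothetical helper (not part of espnscrapeR): fetch win probability for
# several games and stack the results into one data frame. Games that error
# (e.g. seasons before 2016 with no win probability) are skipped with a message.
get_win_prob_for_games <- function(game_ids, fetch = espnscrapeR::get_espn_win_prob) {
  results <- lapply(game_ids, function(id) {
    tryCatch(
      fetch(game_id = id),
      error = function(e) {
        message("Skipping game ", id, ": ", conditionMessage(e))
        NULL
      }
    )
  })
  # Drop failed games, then row-bind the rest (columns must match across games)
  results <- Filter(Negate(is.null), results)
  if (length(results) == 0) return(NULL)
  do.call(rbind, results)
}
```

The `fetch` argument defaults to `espnscrapeR::get_espn_win_prob`. R only evaluates that default when no other function is supplied, so you can swap in a stub for testing. The returned `game_id` column keeps the stacked games distinguishable.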
ESPN posts win probabilities that are updated live with each play/clock tick during games. Have you looked at scraping this, and/or is there a repo of anything interesting (e.g. timestamped probability data) anywhere?