jthomasmock / espnscrapeR

Scrapes Or Collects NFL Data From ESPN
https://jthomasmock.github.io/espnscrapeR/

Intragame win probability #4

Closed cawthm closed 3 years ago

cawthm commented 3 years ago

ESPN posts win probabilities that are updated live with each play or clock tick during games. Have you looked at scraping this, or is there a repo with anything interesting, e.g. time-stamped probability data?

jthomasmock commented 3 years ago

Howdy!

It's definitely possible. I'll be adding get_espn_win_prob() and get_nfl_schedule() functions shortly.

Here's the plotted output from get_espn_win_prob():

[Screenshot: plotted win probability, 2020-10-29]

Game Page
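
A plot like the one above can be sketched in base R. This is a minimal illustration, not the code that produced the screenshot: it assumes a data frame with `row_id` and `home_win_percentage` columns, as in the output posted in this thread, and it uses the first ten values from that output rather than a live call.

```r
# First ten home_win_percentage values copied from the output posted in this thread.
wp <- data.frame(
  row_id = 1:10,
  home_win_percentage = c(0.668, 0.654, 0.676, 0.639, 0.659,
                          0.649, 0.610, 0.623, 0.593, 0.582)
)

# Line chart of the home team's win probability over the sequence of plays.
plot(
  wp$row_id, wp$home_win_percentage,
  type = "l", ylim = c(0, 1),
  xlab = "Play (row_id)", ylab = "Home win probability"
)
# Reference line at 50%: above it the home team is favored.
abline(h = 0.5, lty = 2)
```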

As far as I can tell, you'd have to get the win probability at the game level. You can use get_nfl_schedule() to get the game_id, which can then be passed to get_espn_win_prob(). It returns a data frame like the one below:

Rows: 185
Columns: 19
$ row_id                   <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, …
$ quarter                  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ home_score               <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ away_score               <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 7,…
$ distance                 <int> 0, 10, 10, 9, 10, 10, 4, 10, 6, 1, 10, 7, 7, 10, 5, 1, 0, 10, 5, 7, 10,…
$ yard_line                <int> 35, 80, 69, 68, 55, 55, 49, 44, 40, 35, 33, 30, 30, 16, 5, 1, 65, 16, 2…
$ pos_team_id              <chr> "12", "17", "17", "17", "17", "17", "17", "17", "17", "17", "17", "17",…
$ down                     <int> 0, 1, 1, 2, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 1, 2, 0, 1, 2, 3, 1, 2, 3, 4,…
$ yards_to_endzone         <int> 65, 80, 69, 68, 55, 55, 49, 44, 40, 35, 33, 30, 30, 16, 5, 1, 65, 84, 7…
$ short_down_distance_text <chr> NA, "1st & 10", "1st & 10", "2nd & 9", "1st & 10", "2nd & 10", "3rd & 4…
$ possession_text          <chr> NA, "NE 20", "NE 31", "NE 32", "NE 45", "NE 45", "KC 49", "KC 44", "KC …
$ down_distance_text       <chr> NA, "1st & 10 at NE 20", "1st & 10 at NE 31", "2nd & 9 at NE 32", "1st …
$ text                     <chr> "H.Butker kicks 70 yards from KC 35 to NE -5. C.Patterson to NE 20 for …
$ play_type                <chr> "Kickoff", "Rush", "Rush", "Pass Reception", "Rush", "Pass Reception", …
$ overtime_play_count      <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ home_win_percentage      <dbl> 0.668, 0.654, 0.676, 0.639, 0.659, 0.649, 0.610, 0.623, 0.593, 0.582, 0…
$ play_id                  <chr> "40103885036", "40103885061", "40103885083", "401038850105", "401038850…
$ tie_percentage           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ seconds_left             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
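
The schedule-to-game workflow described above could look roughly like this. It is a sketch under assumptions: `win_prob_for_season()` is a hypothetical helper, not part of espnscrapeR, and it assumes that `get_nfl_schedule()` takes a season as its first argument and returns a `game_id` column. The two package functions are passed in as arguments so the logic can be tried with stand-ins, without querying ESPN.

```r
# Hypothetical helper, not part of espnscrapeR.
# Assumes get_schedule(season) returns a data frame with a game_id column.
win_prob_for_season <- function(season, n_games = 1,
                                get_schedule = espnscrapeR::get_nfl_schedule,
                                get_win_prob = espnscrapeR::get_espn_win_prob) {
  schedule <- get_schedule(season)
  # Keep only the first n_games IDs to avoid requesting a whole season at once.
  game_ids <- head(as.character(schedule$game_id), n_games)
  # One win probability data frame per game, named by game_id.
  results <- lapply(game_ids, function(id) get_win_prob(game_id = id))
  names(results) <- game_ids
  results
}

# Offline stand-ins for illustration only; the real functions query ESPN.
# "000000000" is a made-up ID and 2018 is an arbitrary season.
fake_schedule <- function(season) {
  data.frame(game_id = c("401030956", "000000000"))
}
fake_win_prob <- function(game_id) {
  data.frame(game_id = game_id, home_win_percentage = c(0.599, 0.605))
}

demo <- win_prob_for_season(2018, n_games = 2,
                            get_schedule = fake_schedule,
                            get_win_prob = fake_win_prob)
```

With the package installed, leaving out the two stand-in arguments should make the helper use the real functions.
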
cawthm commented 3 years ago

Super interesting. Thank you for this package and for your work on modeling generally.

jthomasmock commented 3 years ago

Hi @cawthm - I've officially added get_espn_win_prob() to the package; you just need to pass a specific game_id. Win probability is also only available for the past few years, so you will get errors for games prior to 2016.

library(dplyr) # provides %>% and glimpse()
espnscrapeR::get_espn_win_prob(game_id = "401030956") %>% glimpse()
Rows: 160
Columns: 9
$ row_id              <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,…
$ home_team_id        <chr> "23", "23", "23", "23", "23", …
$ away_team_id        <chr> "17", "17", "17", "17", "17", …
$ tie_percentage      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ home_win_percentage <dbl> 0.599, 0.605, 0.597, 0.592, 0.…
$ away_win_percentage <dbl> 0.401, 0.395, 0.403, 0.408, 0.…
$ sequence_number     <chr> "100", "3600", "5100", "7700",…
$ play_id             <chr> "4010309561", "40103095636", "…
$ game_id             <chr> "401030956", "401030956", "401…
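
Because games before 2016 raise an error, a loop over several game IDs may want to skip failures rather than stop. A minimal sketch, assuming only that a failed request signals a regular R error; `safe_win_prob()` is a hypothetical wrapper, not part of espnscrapeR, and `fake_fetch()` is an offline stand-in:

```r
# Hypothetical wrapper: return NULL instead of stopping when a game has no data.
safe_win_prob <- function(game_id, fetch = espnscrapeR::get_espn_win_prob) {
  tryCatch(
    fetch(game_id = game_id),
    error = function(e) {
      message("Skipping game ", game_id, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Offline stand-in that fails for one made-up ID, to show the behavior
# without calling ESPN.
fake_fetch <- function(game_id) {
  if (game_id == "no_data") stop("no win probability available")
  data.frame(game_id = game_id, home_win_percentage = 0.599)
}

results <- lapply(c("401030956", "no_data"), safe_win_prob, fetch = fake_fetch)
# Drop the NULLs and stack the remaining data frames.
combined <- do.call(rbind, Filter(Negate(is.null), results))
```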