jozefhajnala / nhlapi

A Minimum-Dependency R Interface to the NHL API
GNU Affero General Public License v3.0

Missing Shifts Function. #12

Status: open. hswerdfe opened this issue 3 years ago.

hswerdfe commented 3 years ago

A function for retrieving shift data seems to be missing from the package.

I learned from this page that shift data can be retrieved from this URL:

https://api.nhle.com/stats/rest/en/shiftcharts?cayenneExp=gameId={game_id}

Note that I checked the 2018-2019 and 2019-2020 seasons, and 235 out of 2,782 games were missing shift data, so about 92% had data. (I did not check other seasons.)
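A minimal sketch of how a coverage check like this could be scripted. It assumes you have already downloaded one response per game and kept `fromJSON(url)[["data"]]` for each in a list; a game with no shift data is assumed to come back with an empty `data` array, which `jsonlite` turns into `list()`. The helper name `shift_coverage` is hypothetical.

```r
# Fraction of games whose shift table has at least one row.
# `shift_tables`: list with one element per game, each either a data frame
# of shifts or an empty list (assumed shape for games without shift data).
shift_coverage <- function(shift_tables) {
  has_rows <- vapply(shift_tables, function(d) NROW(d) > 0, logical(1))
  mean(has_rows)
}

# Arithmetic behind the figure above: 235 of 2,782 games lack shift data.
round(100 * (2782 - 235) / 2782, 1)  # 91.6, i.e. roughly 92%
```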

Also, for calculating advanced stats (which players are on the ice), I was using the sqldf package with non-equi joins whose boundary conditions depend on the event type, and this seemed to work. Below is an example function from my code that takes two data frames, shifts and plays, and returns which players are on the ice for each play. Feel free to adapt or include this code in your repo.

```r
library(dplyr)
library(sqldf)

# Return one row per (shift, play) pair, i.e. which players were on the ice
# for each play. `plays` needs play_id, game_id, period, periodTime, event,
# team_id, team_name and is_home; `shifts` is the raw shiftcharts data.
nhl_players_on_ice_at_plays <- function(plays, shifts) {

  s <- shifts %>%
    rename(game_id = gameId, shift_id = id,
           shift_team_id = teamId, shift_team_name = teamName) %>%
    select(shift_id, game_id, period, playerId, startTime, endTime,
           shift_team_id, shift_team_name)

  pp <- plays %>%
    select(play_id, game_id, period, periodTime, event, team_id, team_name, is_home) %>%
    rename(play_team_id = team_id, play_team_name = team_name,
           play_is_home = is_home)

  # Plays that open an interval: count players whose shift starts at the
  # play time, but not those whose shift ends at it.
  p_need_start <- pp %>%
    filter(event %in% c("Game Scheduled", "Period Ready", "Period Start",
                        "Faceoff", "Early Intermission End"))

  # Plays that close an interval: count players whose shift ends at the
  # play time, but not those whose shift starts at it.
  p_need_end <- pp %>%
    filter(event %in% c("Stoppage", "Penalty", "Period End", "Period Official",
                        "Game End", "Game Official", "Official Challenge",
                        "Early Intermission Start", "Goal"))

  # Every other play: both shift boundaries are inclusive.
  p_rest <- pp %>%
    anti_join(p_need_start, by = "play_id") %>%
    anti_join(p_need_end, by = "play_id")

  # Times are "MM:SS" strings, so SQLite compares them as text; this assumes
  # both columns are zero-padded to the same width.
  sp_rest <- sqldf("
    SELECT s.shift_id, s.playerId, s.startTime, s.endTime, p.*
    FROM s AS s
    INNER JOIN p_rest AS p
      ON s.game_id = p.game_id AND s.period = p.period
     AND s.startTime <= p.periodTime AND s.endTime >= p.periodTime") %>%
    as_tibble()

  sp_need_start <- sqldf("
    SELECT s.shift_id, s.playerId, s.startTime, s.endTime, p.*
    FROM s AS s
    INNER JOIN p_need_start AS p
      ON s.game_id = p.game_id AND s.period = p.period
     AND s.startTime <= p.periodTime AND s.endTime > p.periodTime") %>%
    as_tibble()

  sp_need_end <- sqldf("
    SELECT s.shift_id, s.playerId, s.startTime, s.endTime, p.*
    FROM s AS s
    INNER JOIN p_need_end AS p
      ON s.game_id = p.game_id AND s.period = p.period
     AND s.startTime < p.periodTime AND s.endTime >= p.periodTime") %>%
    as_tibble()

  bind_rows(sp_rest, sp_need_start, sp_need_end)
}
```
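Since the package aims for minimal dependencies, here is a rough base R sketch of the same event-dependent interval join, swapping sqldf for `merge()` plus a logical filter. It assumes `shifts` already has `game_id`, `period`, `startTime` and `endTime`, and `plays` has `game_id`, `period`, `periodTime` and `event`; the helper names are hypothetical, and times are converted to seconds instead of being compared as text.

```r
# Hypothetical helper: convert "MM:SS" clock strings to seconds.
mmss_to_seconds <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]), numeric(1))
}

# Same event groupings as in the sqldf version above.
start_events <- c("Game Scheduled", "Period Ready", "Period Start", "Faceoff",
                  "Early Intermission End")
end_events <- c("Stoppage", "Penalty", "Period End", "Period Official",
                "Game End", "Game Official", "Official Challenge",
                "Early Intermission Start", "Goal")

players_on_ice_base <- function(plays, shifts) {
  # Pair every shift with every play in the same game and period.
  pairs <- merge(shifts, plays, by = c("game_id", "period"))
  t     <- mmss_to_seconds(pairs$periodTime)
  start <- mmss_to_seconds(pairs$startTime)
  end   <- mmss_to_seconds(pairs$endTime)
  # Boundary rule depends on the event type:
  #   start events: start <= t < end; end events: start < t <= end;
  #   everything else: start <= t <= end.
  keep <- ifelse(pairs$event %in% start_events, start <= t & t < end,
          ifelse(pairs$event %in% end_events,   start < t & t <= end,
                                                start <= t & t <= end))
  # which() drops NA comparisons (e.g. missing times).
  pairs[which(keep), , drop = FALSE]
}
```

Note that `merge()` materializes all shift-play pairs within each game and period before filtering, which is fine for a game at a time but may get memory-hungry for whole seasons at once.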

jozefhajnala commented 3 years ago

Hi Howard, this is really interesting. I did not know much about this alternative API; as you may have noticed, all the functions in the package currently use the API exposed via https://statsapi.web.nhl.com/api/v1/.

The shift data you mention, however, seem to come from the API used directly by the NHL website.

I unfortunately was not able to find any documentation whatsoever on this API, but if you could provide some more structured info I would happily integrate this into the package!

hswerdfe commented 3 years ago

I can't provide any further details about the API. I only ever found a reference to it in the Kaggle dataset I saw by Martin Ellis. Maybe he can provide details, but I have no idea where he found out about it.

Basically, the procedure that seems to work is:

  1. Take the URL https://api.nhle.com/stats/rest/en/shiftcharts?cayenneExp=gameId=
  2. Append a game ID to it and call the result `full_url` (or something similar).
  3. Call `raw_data <- jsonlite::fromJSON(full_url)`.
  4. Pass the result to `df <- tibble(raw_data[["data"]])`. There you have it: a data frame with more than a dozen columns describing the shift data.
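The four steps above could be wrapped into a small function along these lines. This is a sketch, not a documented interface: the endpoint is undocumented, so the response layout (a top-level `data` array) is assumed from the steps above, and `shiftcharts_url()` and `get_shifts()` are hypothetical names. It returns a base `data.frame` rather than a tibble to avoid an extra dependency.

```r
# Build the shiftcharts URL for one or more game IDs (steps 1 and 2).
# sprintf("%.0f") keeps long numeric IDs out of scientific notation.
shiftcharts_url <- function(game_id) {
  paste0("https://api.nhle.com/stats/rest/en/shiftcharts?cayenneExp=gameId=",
         sprintf("%.0f", as.numeric(game_id)))
}

# Download and parse the shifts for a single game (steps 3 and 4).
# Assumes the JSON response has a top-level "data" array of shift records.
get_shifts <- function(game_id) {
  raw_data <- jsonlite::fromJSON(shiftcharts_url(game_id))
  as.data.frame(raw_data[["data"]])
}
```

Usage, with network access and an example game ID: `shifts <- get_shifts(2019020001)`.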

I am sure there are more things that could be done, but the above served my purpose of exploring the data and calculating some historical stats.