jthomasmock / espnscrapeR

Scrapes Or Collects NFL Data From ESPN
https://jthomasmock.github.io/espnscrapeR/

ESPN Data inconsistent #3

Open mrcaseb opened 4 years ago

mrcaseb commented 4 years ago

Note: This Issue isn't a code problem! It is just for information to the users and to make the developer aware of it.

ESPN states on its Total QBR website:

To qualify, a player must play a minimum of 20 action plays

which was always my explanation when a player was missing from the data. But it gets very confusing now. This example uses the 2018 playoffs; I didn't check other years.

2018 Wild Card weekend had the following games (winner in parentheses):

  1. IND @ HOU (IND won)
  2. SEA @ DAL (DAL won)
  3. LAC @ BAL (LAC won)
  4. PHI @ CHI (PHI won)

Running

library(espnscrapeR)
library(dplyr)

qbr_week <- get_nfl_qbr("2018", season_type = "Playoffs", week = 1) %>%
  select(short_name, team_short_name, qbr_total, qb_plays)

leads to 3 entries

image

But running

qbr_all <- get_nfl_qbr("2018", season_type = "Playoffs", week = NA) %>%
  select(short_name, team_short_name, qbr_total, qb_plays)

leads to this

Screenshot 2020-03-20 at 10 50 35

The full dataset not only contains more QBs from Wild Card weekend (Watson, Wilson, Trubisky), it also gives a different total QBR for Lamar Jackson. It is unclear which dataset to trust. A further problem is that the weekly data can only be combined with it for QBs who lost, because the overall dataset mixes the games of QBs who played more than one game.
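For anyone who wants to check the mismatch without reading screenshots, here is a minimal sketch of one way to line the two results up. `compare_qbr()` is a hypothetical helper, not part of espnscrapeR; it assumes both data frames carry the four columns selected above and that `short_name` plus `team_short_name` identifies a quarterback.

```r
library(dplyr)

# Hypothetical helper (not part of espnscrapeR): joins the single-week result
# against the week = NA result and flags where the two disagree.
compare_qbr <- function(week_df, all_df) {
  # Marker columns record which source each row came from.
  week_df <- mutate(week_df, in_week = TRUE)
  all_df  <- mutate(all_df, in_all = TRUE)

  full_join(
    week_df, all_df,
    by = c("short_name", "team_short_name"),
    suffix = c("_week", "_all")
  ) %>%
    mutate(
      in_week = coalesce(in_week, FALSE),
      in_all  = coalesce(in_all, FALSE),
      # TRUE only when the QB appears in both sources with different values.
      qbr_differs = in_week & in_all & qbr_total_week != qbr_total_all
    )
}
```

With the objects from this issue, `compare_qbr(qbr_week, qbr_all) %>% filter(!in_week | qbr_differs)` should list the QBs that only show up in the `week = NA` result together with those whose `qbr_total` differs. A QB with several games in the `week = NA` result will appear on several rows.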

jthomasmock commented 4 years ago

I appreciate you finding some of these edge-cases!

I wonder if I should go back to just straight rvest scraping the site - I'll dig into the API to see if there's a reason for duplicates.

jthomasmock commented 4 years ago

Ah, I think I know why this is occurring. There is a "Best" games option, which is what you get when week is missing. It shows the best weekly games.

See: https://www.espn.com/nfl/qbr/_/view/weekly/season/2018/seasontype/3/week/

mrcaseb commented 4 years ago

Yeah, the thing is that when you choose, for example, Wild Card instead of Best, there are fewer entries for Wild Card weekend. That’s what makes no sense to me...