k5cents / whatr

Read Jeopardy game data in R
https://kiernann.github.io/whatr/
GNU General Public License v3.0

score values computed by whatr_scores() are often wrong #10

Closed spkaluzny closed 2 years ago

spkaluzny commented 2 years ago

Many values computed by whatr_scores do not match what is shown for the game on the J! Archive site.

I had previously scraped the J! Archive site for FirstRoundScore, SecondRoundScore and FinalScore. Comparing those values with the values I get from the whatr package, I found more than 600 differences in score values across more than 4,500 games. I examined some of these games in detail, comparing the values from whatr with what is shown on the J! Archive site.
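A comparison of this kind can be sketched as follows. This is only an illustration, not the script actually used for the report: the rows in `s` are Linda's first-round rows for game 6028 as printed further down in this issue, while the reference table `ref` and its column name `archive_score` are hypothetical stand-ins for previously scraped J! Archive values.

```r
# Linda's round 1 rows for game 6028, copied from the whatr_scores() output in this issue.
s <- data.frame(
  round  = rep(1L, 5),
  i      = c(4L, 7L, 21L, 22L, 23L),
  name   = "Linda",
  score  = c(400L, -200L, 300L, -500L, 500L),
  double = c(FALSE, FALSE, FALSE, FALSE, TRUE)
)

# Hypothetical reference table: one row per player and round.
# The value -500 is the end-of-round score reported for J! Archive in this issue.
ref <- data.frame(name = "Linda", round = 1L, archive_score = -500L)

# Sum the per-clue scores into one total per player and round.
totals <- aggregate(score ~ name + round, data = s, FUN = sum)

# Join on player and round, then flag totals that disagree with the reference.
cmp <- merge(totals, ref, by = c("name", "round"))
cmp$mismatch <- cmp$score != cmp$archive_score

# Rows where whatr and the reference disagree.
cmp[cmp$mismatch, ]
```

With the same approach, binding the results for many games into one data frame before the `aggregate()` step would flag every player and round whose total differs from the reference.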

Here are three examples where whatr_scores() returns incorrect values.

## game 6028
game <- 6028
s <- whatr::whatr_scores(game)
# Compute first round score for Linda:
sum(s[s$name == "Linda" & s$round == 1, "score"])
#> [1] 500  # J! Archive has -500 for Linda's score at end of first round

s[s$name == "Linda" & s$round == 1, ]
#> # A tibble: 5 × 5
#>   round     i name  score double
#>   <int> <int> <chr> <int> <lgl> 
#> 1     1     4 Linda   400 FALSE 
#> 2     1     7 Linda  -200 FALSE 
#> 3     1    21 Linda   300 FALSE 
#> 4     1    22 Linda  -500 FALSE 
#> 5     1    23 Linda   500 TRUE 

# The i=23 score should be -500, not 500.

## game 23
game <- 23
s <- whatr::whatr_scores(game)

# Compute first round score for Kathryn:
sum(s[s$name == "Kathryn" & s$round == 1, "score"])
#> [1] 200  # J! Archive has -200 for Kathryn's score at end of first round

s[s$name == "Kathryn" & s$round == 1, ]
#> # A tibble: 2 × 5
#>   round     i name    score double
#>   <int> <int> <chr>   <int> <lgl> 
#> 1     1    14 Kathryn   800 FALSE 
#> 2     1    15 Kathryn  -600 FALSE

# The i=15 score should be -1000 not -600

## game 103
game <- 103
s <- whatr::whatr_scores(game)

# Compute first round score for Rick:
sum(s[s$name == "Rick" & s$round == 1, "score"])
#> [1] 1200  # J! Archive has -1200 for Rick's score at end of first round

s[s$name == "Rick" & s$round == 1, ]
#> # A tibble: 4 × 5
#>   round     i name  score double
#>   <int> <int> <chr> <int> <lgl> 
#> 1     1     9 Rick    800 FALSE 
#> 2     1    18 Rick   1000 FALSE 
#> 3     1    19 Rick   -200 FALSE 
#> 4     1    25 Rick   -400 FALSE

# The i=9 score should be -800 not 800
# The i=18 score should be -1000 not 1000
# The i=19 score should be 200 not -200
# The i=25 score should be 400 not -400

k5cents commented 2 years ago

I think I may have fixed it with a regex change?

s <- whatr::whatr_scores(6028)
sum(s[s$name == "Linda" & s$round == 1, "score"])
#> [1] -500

s <- whatr::whatr_scores(23)
sum(s[s$name == "Kathryn" & s$round == 1, "score"])
#> [1] -200

s <- whatr::whatr_scores(103)
sum(s[s$name == "Rick" & s$round == 1, "score"])
#> [1] -1200

s[s$name == "Rick" & s$round == 1, ]
#> # A tibble: 4 × 5
#>   round     i name  score double
#>   <int> <int> <chr> <int> <lgl> 
#> 1     1     9 Rick   -800 FALSE 
#> 2     1    18 Rick  -1000 FALSE 
#> 3     1    19 Rick    200 FALSE 
#> 4     1    25 Rick    400 FALSE

Created on 2022-03-21 by the reprex package (v2.0.1)
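The regex change itself is not shown in this thread. As a purely hypothetical illustration of how a pattern can lose the sign of a score, the sketch below parses made-up score strings; the input format and the names `raw`, `unsigned`, `sgn` and `signed` are assumptions and do not come from the whatr source.

```r
# Hypothetical score strings; the real format parsed by whatr is not shown in this thread.
raw <- c("$800", "-$1,000", "$200", "-$400")

# Keeping only the digits silently drops the minus sign.
unsigned <- as.integer(gsub("[^0-9]", "", raw))

# A sign-aware version: detect a leading minus first, then apply it to the digits.
sgn <- ifelse(grepl("^\\s*-", raw), -1L, 1L)
signed <- sgn * as.integer(gsub("[^0-9]", "", raw))

unsigned
signed
```

Comparing `unsigned` with `signed` shows the kind of sign error reported above for individual clues.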

spkaluzny commented 2 years ago

I scraped all episodes from the J! Archive site with whatr using your regex fix for the scores. As far as I can tell, your change fixed this bug. I also learned that the J! Archive has updated some games that must have had incorrect values when I scraped the site for scores a few years ago. All of my comparisons of the whatr data with my old data showed that your whatr results match what is on the J! Archive site now.

Thanks for the quick fix.