rudeboybert / fivethirtyeight

R package of data and code behind the stories and interactives at FiveThirtyEight
https://fivethirtyeight-r.netlify.app/

adding Quasi-Win Shares data set #58

Closed ranawg closed 4 years ago

ranawg commented 4 years ago

Added processing_data and data R scripts for Quasi-Win Shares.

beanumber commented 4 years ago

@ranawg see also #54 and #56 for additional information about adding yourself as a contributor, etc.

rudeboybert commented 4 years ago

After you finish your next round of edits, please resolve the above merge conflicts in DESCRIPTION and NEWS.md by clicking on "Resolve conflicts". @beanumber can help you with this.

Thanks for your work!

rudeboybert commented 4 years ago

Hey @ranawg, thanks for the update! A few more issues:

First, I forgot that variable names within a data frame that start with numbers are problematic. See the reprex below:

library(tidyverse)
library(fivethirtyeight)
glimpse(quasi_winshares)
#> Observations: 98,796
#> Variables: 24
#> $ name_common <chr> "Ketel Marte", "Zack Greinke", "Eduardo Escobar", "N…
#> $ age         <int> 25, 35, 30, 29, 28, 24, 31, 27, 30, 34, 25, 26, 33, …
#> $ player_id   <chr> "marteke01", "greinza01", "escobed01", "ahmedni01", …
#> $ year_id     <int> 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019…
#> $ team_id     <fct> ARI, ARI, ARI, ARI, ARI, ARI, ARI, ARI, ARI, ARI, AR…
#> $ lg_id       <fct> NL, NL, NL, NL, NL, NL, NL, NL, NL, NL, NL, NL, NL, …
#> $ pct_pt      <dbl> 6.1913989, 4.1080205, 6.7632160, 6.0435152, 5.826619…
#> $ war162      <dbl> 7.15754717, 5.02301887, 4.02962264, 3.74943396, 2.19…
#> $ quasi_ws    <int> 30, 21, 21, 19, 15, 11, 11, 11, 11, 10, 8, 7, 7, 6, …
#> $ stint_id    <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1…
#> $ franch_id   <fct> ARI, ARI, ARI, ARI, ARI, ARI, ARI, ARI, ARI, ARI, AR…
#> $ prev_franch <fct> SEA, LAD, MIN, NA, BAL, STL, NA, DET, NA, SEA, STL, …
#> $ year_acq    <int> 2017, 2016, 2018, 2014, 2017, 2019, 2014, 2015, 2019…
#> $ year_left   <int> 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019…
#> $ next_franch <fct> NA, HOU, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ P           <lgl> FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE…
#> $ C           <lgl> TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE…
#> $ `1B`        <lgl> FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALS…
#> $ `2B`        <lgl> TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE…
#> $ `3B`        <lgl> FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE…
#> $ SS          <lgl> TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE…
#> $ LF          <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALS…
#> $ CF          <lgl> TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALS…
#> $ RF          <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…

Created on 2019-11-15 by the reprex package (v0.3.0)

See how there are backticks around the variables 1B, 2B, and 3B? Let's get around this by renaming them FirstB, SecondB, and ThirdB.

Second, when I run a package check, it says the variable names in the data frame don't match the roxygen2 documentation/help file you wrote (attached is a screenshot of the error). Could you make sure these match?

Screen Shot 2019-11-15 at 11 47 28 AM

After these changes, I think we'll be good to merge! @beanumber

beanumber commented 4 years ago

Yes, they have to be in the same order.

Also, I might suggest X1B, X2B, and X3B as replacements for the offending variable names. Even though it's ugly, it matches what is present in the Lahman package, and so will make joins easier.
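For illustration, here is a minimal sketch of that rename. It uses a small made-up data frame (`toy`, hypothetical) in place of `quasi_winshares`, and relies on base R's `make.names()`, which prepends an `X` to names that start with a digit and so yields the same `X1B`, `X2B`, `X3B` names:

```r
# Toy stand-in for quasi_winshares; the values are made up for illustration.
toy <- data.frame(
  `1B` = c(FALSE, TRUE),
  `2B` = c(TRUE, FALSE),
  `3B` = c(FALSE, FALSE),
  SS   = c(TRUE, FALSE),
  check.names = FALSE  # keep the non-syntactic names exactly as given
)

# Names that start with a digit are non-syntactic, so they need backticks.
toy$`1B`

# make.names() turns them into syntactic names by prepending "X".
names(toy) <- make.names(names(toy))
names(toy)
```

After this, the columns can be referenced as `toy$X1B` without backticks. Names that are already syntactic, such as `SS`, are left unchanged.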

beanumber commented 4 years ago

From Travis:

* checking for code/documentation mismatches ... WARNING
Data codoc mismatches from documentation object 'quasi_winshares':
Variables in data frame 'quasi_winshares'
  Code: 1B 2B 3B age C CF franch_id LF lg_id name_common next_franch P
        pct_pt player_id prev_franch quasi_ws RF SS stint_id team_id
        war162 year_acq year_id year_left
  Docs: age C CF franch_id LF lg_id name_common next_franch P pct_pt
        player_id prev_franch quasi_ws RF SS stint_id team_id WAR162
        X1B X2B X3B year_acq year_id year_left

@ranawg can you please make sure these match exactly? They have to be in the same order and spelled the same way.

ranawg commented 4 years ago

@beanumber Regarding the order, does that mean alphabetical order, or the order in which they appear in the data frame?

beanumber commented 4 years ago

The same order that they appear in the data frame.
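A rough way to check this locally before pushing is to compare `names()` of the data frame against the variables listed in the help file. In this sketch, `toy` is a made-up stand-in for the data and `doc_vars` is a hypothetical hand-typed vector of the documented names, neither of which is part of the package:

```r
# Toy stand-in for the data; values are made up for illustration.
toy <- data.frame(age = 25L, war162 = 7.16, X1B = FALSE)

# Hypothetical: names typed in the order they are documented in the help file.
doc_vars <- c("age", "WAR162", "X1B")

# TRUE only if spelling, case, and order all agree.
identical(names(toy), doc_vars)

# Which names appear on one side only?
setdiff(names(toy), doc_vars)  # in the data but not in the docs
setdiff(doc_vars, names(toy))  # in the docs but not in the data
```

Here the comparison fails because of case alone (`war162` vs. `WAR162`), which is one of the differences in the Travis output above.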

rudeboybert commented 4 years ago

Great! Thanks @ranawg. You're now listed as a contributor to the package here: https://github.com/rudeboybert/fivethirtyeight/graphs/contributors