sfirke / janitor

simple tools for data cleaning in R
http://sfirke.github.io/janitor/

tabyl: percent vs share #300

Status: Open. cstepper opened this issue 5 years ago.

cstepper commented 5 years ago

Hi,

I'm excited to have discovered the janitor package, especially the tabyl function.

I just have one remark, though I'm not sure whether you can or want to implement it:

Feature requests

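When called on a single variable, tabyl returns a data.frame with a column for the variable's values plus the columns n, percent and (when the variable contains NA values) valid_percent.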

In my opinion, the percent and valid_percent columns do not hold percentages, because they sum to 1 rather than 100. They hold shares instead (which I actually prefer over percentages).
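To make the distinction concrete: a share of 0.37 corresponds to 37%, so turning a share into a true percentage means multiplying by 100. janitor's own adorn_pct_formatting() handles the display side for whole tabyls. The base-R snippet below is only a minimal illustration of the arithmetic; share_to_percent_label() is a hypothetical helper, and the toy values are made up.

```r
# Hypothetical helper (not part of janitor): turn shares such as tabyl's
# percent column into percentage labels, e.g. 0.37 -> "37.0%".
share_to_percent_label <- function(share, digits = 1) {
  # Multiply by 100 to get percentages, then format with a fixed number
  # of decimal places and append a percent sign.
  paste0(formatC(share * 100, format = "f", digits = digits), "%")
}

# Toy shares, not real tabyl output.
shares <- c(0.25, 0.37, 0.38)
print(sum(shares))          # shares sum to 1
print(sum(shares * 100))    # percentages sum to 100
print(share_to_percent_label(shares))
```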

For consistent naming, IMO these columns should be named something like share and valid_share.

It's not a big deal to rename them afterwards, but it is annoying to do so again and again. It would be fantastic if you would consider changing the names.
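In the meantime, the repeated renaming can be wrapped in a small function. This is a minimal sketch under stated assumptions: rename_percent_to_share() is a hypothetical helper, not part of janitor, and it assumes the column names percent and valid_percent described above. It only renames columns that are present, because valid_percent appears only when the variable has NA values. A plain dplyr::rename(share = percent, valid_share = valid_percent) would fail on a tabyl without that column.

```r
# Hypothetical helper (not part of janitor): rename tabyl's proportion
# columns from *percent to *share. Works on any data.frame.
rename_percent_to_share <- function(dat) {
  old <- c("percent", "valid_percent")
  new <- c("share", "valid_share")
  # Position of each existing column name within `old` (NA if absent).
  hit <- match(names(dat), old)
  # Rename only the matching columns; the object's class is left as-is.
  names(dat)[!is.na(hit)] <- new[hit[!is.na(hit)]]
  dat
}

# Toy data.frame shaped like a one-way tabyl (values are made up).
toy <- data.frame(x = c("a", "b"), n = c(1, 3), percent = c(0.25, 0.75))
print(rename_percent_to_share(toy))
```

With janitor loaded, it would be used as `dplyr::starwars %>% tabyl(hair_color) %>% rename_percent_to_share()`.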

library(tidyverse)
#> Registered S3 method overwritten by 'rvest':
#>   method            from
#>   read_xml.response xml2
library(janitor)
#> 
#> Attaching package: 'janitor'
#> The following objects are masked from 'package:stats':
#> 
#>     chisq.test, fisher.test

tab_hc = dplyr::starwars %>% 
  tabyl(hair_color)

tab_hc %>% select(percent, valid_percent) %>% colSums(na.rm = TRUE)
#>       percent valid_percent 
#>             1             1

Created on 2019-05-17 by the reprex package (v0.2.1.9000)

sfirke commented 5 years ago

It's issue 300!

Someone else had previously lamented that percent was not technically a correct name. I think it was in a Twitter conversation that I only observed. I believe she suggested proportion or prop as a better name.

I don't disagree, but at this point I believe the cost of making this change (a moderate cost to me of updating the code, and a potentially large annoyance to current users whose existing code would break) outweighs the benefit of a potentially clearer name.

If anyone ever experiences the problem that a reader incorrectly interprets percent = 0.37 as 0.37% instead of the correct 37%, as a result of this naming in tabyl, I deeply apologize for the inconvenience 😔

Glad you like the package otherwise!

sfirke commented 2 years ago

I opened a discussion re: the merits of renaming here: https://github.com/sfirke/janitor/discussions/474