ropensci / skimr

A frictionless, pipeable approach to dealing with summary statistics
https://docs.ropensci.org/skimr

Make skimr data table aware #633

Closed elinw closed 3 years ago

elinw commented 3 years ago

This is an update of the other branch.

elinw commented 3 years ago

@rsaporta @michaelquinn32 This is updated based on the comments. Rick, I added you as a contributor; please check the information in DESCRIPTION. Also, do we need to export the data_key function?

gjclaxton commented 2 years ago

Hi. Thanks for the great package. When I use it inside a data.table call, skim_variable comes out as the label "data" rather than the variable name. I'm not really sure how to address this. Thanks.

x[ city == "Chicago" ,  skim(rate) %>% yank("numeric")]

── Variable type: numeric ────────────────────────────────────────────────────────────────────────────────────────────────────────
  skim_variable n_missing complete_rate   mean     sd    p0    p25    p50    p75  p100 hist 
1 data                  0             1 19548. 10479.   1.9 14913. 17381. 21015. 88008 ▇▇▁▁▁

x[ city == "Chicago" ,  my_skim(rate) %>% yank("numeric")]

── Variable type: numeric ────────────────────────────────────────────────────────────────────────────────────────────────────────
  skim_variable  n_na length     mn    p0   p05    p10    p25    p50    p75    p95  p100
1 data              0    814 19548.   1.9 8434. 14214. 14913. 17381. 21015. 39510. 88008

x %>% skim(rate) %>% yank("numeric")

── Variable type: numeric ────────────────────────────────────────────────────────────────────────────────────────────────────────
  skim_variable n_missing complete_rate   mean     sd    p0    p25    p50    p75    p100 hist 
1 rate                  0             1 30567. 36028.     0 13831. 21751. 39788. 3442280 ▇▁▁▁▁

michaelquinn32 commented 2 years ago

Thanks for the comment!

This is going to be somewhat challenging to support, because the non-standard evaluation (NSE) behavior of data.table within brackets is different from the tidyverse-style NSE that we support within skim().

And writing methods for this is tricky because we want to support customization of skim, which means it can't be a normal generic.

We'll need to look at this more, but for now, I think this should do what your first command intended.

x[ city == "Chicago" ,  ] %>% skim(rate) %>% yank("numeric")
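
To illustrate the evaluation difference described above, here is a minimal sketch using only data.table. The toy data and the describe_input() helper are hypothetical and not part of skimr; the helper just reports what a function receives. It suggests why a call placed inside the brackets sees only the column's values, while subsetting first keeps the column names available.

```r
library(data.table)

# Toy data; the values are made up for illustration.
x <- data.table(
  city = c("Chicago", "Chicago", "Boston"),
  rate = c(1.9, 17381, 21751)
)

# Hypothetical helper (not part of skimr): reports what a function receives.
describe_input <- function(data) {
  if (is.data.frame(data)) {
    paste("data frame with columns:", paste(names(data), collapse = ", "))
  } else {
    paste("bare", class(data)[1], "vector of length", length(data))
  }
}

# Inside the brackets, data.table evaluates `rate` as a column first, so the
# function only sees the values; the column name never reaches it.
inside <- x[city == "Chicago", describe_input(rate)]

# Subsetting first returns a data.table, so the column names are preserved.
outside <- describe_input(x[city == "Chicago", ])

print(inside)
print(outside)
```

Under these assumptions, any function called in the j position receives a plain vector, which is consistent with skim_variable showing a generic label in the first two commands of the report.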