ropensci / skimr

A frictionless, pipeable approach to dealing with summary statistics
https://docs.ropensci.org/skimr

convenience function - stata format #623

Closed bjornerstedt closed 3 years ago

bjornerstedt commented 3 years ago

Skimr is a very powerful tool for generating summary statistics. For quick use and for new users it would be great if it had some convenience functions for common formats.

Here is the code that I use with skimr to get summary statistics in a Stata format.

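library(dplyr)
library(skimr)

# Summarise numeric columns in a Stata-like layout: obs, mean, sd, min, max.
sumstats <- function(df) {
  sstat <- skim(df)
  sstat %>%
    yank("numeric") %>%
    # Non-missing observations = total rows minus missing values per variable.
    mutate(obs = attr(sstat, "data_rows") - n_missing) %>%
    select(term = skim_variable, obs, mean, sd, min = p0, max = p100) %>%
    knitr::kable(digits = 2)
}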

In addition to producing an output in a format I am comfortable with, this small function solves several problems.

Convenience functions like this could be added to skimr, perhaps with names such as skim_stata. They would make skimr easier to pick up and more convenient to use, especially in teaching.
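For illustration, a helper along these lines could also be sketched with skimr's own extension functions, skim_with() and sfl(), instead of post-processing the default output. This is only a rough sketch under assumptions: it assumes skimr v2 and dplyr are installed, the name skim_stata is hypothetical (it is not part of skimr), and mtcars is used purely as example data.

```r
library(skimr)
library(dplyr)

# Hypothetical helper; the name skim_stata is illustrative and not part of skimr.
# skim_with() returns a customised skim function. With append = FALSE the
# default numeric summaries are replaced by the ones listed in sfl().
skim_stata <- skim_with(
  numeric = sfl(
    obs  = ~ sum(!is.na(.)),        # number of non-missing observations
    mean = ~ mean(., na.rm = TRUE),
    sd   = ~ sd(., na.rm = TRUE),
    min  = ~ min(., na.rm = TRUE),
    max  = ~ max(., na.rm = TRUE)
  ),
  append = FALSE
)

# yank("numeric") keeps only the numeric summaries and strips the type prefix
# from the column names; select() then arranges the Stata-style columns.
stata_tbl <- skim_stata(mtcars) %>%
  yank("numeric") %>%
  select(term = skim_variable, obs, mean, sd, min, max)
```

Piping stata_tbl into knitr::kable(digits = 2) should give a table comparable to the one produced by the function above. One difference in this variant is that the observation count is computed directly from the data rather than from the data_rows attribute.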

michaelquinn32 commented 3 years ago

Thanks for the suggestion, Jonas!

We have worked really hard to make skimr more extensible, and you provide a good example of the merits of that approach. That said, these sorts of extensions are better suited to external packages. For example,

I'd be happy to provide more guidance on adding your function to your own package or another location that you find suitable. The "Whole Game" chapter in the R-Pkgs book is a good TLDR of this process: https://r-pkgs.org/whole-game.html

Best wishes, Michael