tidymodels / broom

Convert statistical analysis objects from R into tidy format
https://broom.tidymodels.org

Retaining statistics names #31

Closed crsh closed 9 years ago

crsh commented 9 years ago

I'm working on an R package of my own, which will provide functions to knit scientific manuscripts.

To this end, I'm building convenience functions that assemble text strings from analysis objects such as htest or summary.lm. I am considering using broom to tidy those objects. The only thing currently keeping me from using your package is the fact that when I tidy objects I lose all information about what the estimates actually are (differences of means, means of differences, correlation coefficients, etc.). The same is, of course, true for other columns of the tidied output.

Are there any plans to add this information to tidied data.frames? I think retaining this information would be helpful for other purposes and programming literacy in general.

An unobtrusive way of doing this would be to simply add attributes to the data.frame from the original object (adding a row to the data.frame would be another possibility). I've thrown something together for an object from t.test() to illustrate what I mean (this should generalize to other htest objects with some minor adaptations):

> t_test <- t.test(extra ~ group, data = sleep)    
> tidy_t_test <- tidy(t_test)
> tidy_t_test

  estimate estimate1 estimate2 statistic    p.value parameter  conf.low conf.high
1    -1.58      0.75      2.33 -1.860813 0.07939414  17.77647 -3.365483 0.2054832

> vars <- lapply(t_test, attr, "names")
> vars <- vars[!unlist(lapply(vars, is.null))]
> conf_level <- attr(t_test$conf.int, "conf.level") * 100
> conf_levels <- paste(c((100 - conf_level) / 2, 100 - (100 - conf_level) / 2), "%")

> attr(tidy_t_test, "vars") <- c(vars$null.value, vars$estimate, vars$statistic, "p.value", vars$parameter, conf_levels)
> str(tidy_t_test)

'data.frame':  1 obs. of  8 variables:
$ estimate : num -1.58
$ estimate1: num 0.75
$ estimate2: num 2.33
$ statistic: num -1.86
$ p.value  : num 0.0794
$ parameter: num 17.8
$ conf.low : num -3.37
$ conf.high: num 0.205
- attr(*, "vars")= chr  "difference in means" "mean in group 1" "mean in group 2" "t" ...

> attr(tidy_t_test, "vars")
[1] "difference in means" "mean in group 1" "mean in group 2" "t" "p.value" "df" "2.5 %"
[8] "97.5 %"

dgrtwo commented 9 years ago

I'm reluctant to add extra attributes to tidy data frame outputs. Attributes aren't preserved when data frames are rbinded or merged, and there's no guarantee they survive when other munging operations are applied. Part of the value of tidying operations is that even if they lose a little information, you know that all the information they provide is in the form of a rowname-less data frame.
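As a minimal sketch of the concern about attributes, the base R snippet below attaches a "vars" attribute to a small data frame and then merges it with another one. The data frames `tidied` and `labels` and their contents are made up for illustration and only mimic the shape of a tidied t-test; this is not broom output.

```r
# Hypothetical tidied result carrying the proposed "vars" attribute
tidied <- data.frame(test = "sleep", estimate = -1.58, p.value = 0.0794)
attr(tidied, "vars") <- c("test", "difference in means", "p.value")

# Hypothetical lookup table to merge against
labels <- data.frame(test = "sleep", label = "extra sleep by group")

merged <- merge(tidied, labels, by = "test")

attr(tidied, "vars")  # still attached to the original data frame
attr(merged, "vars")  # NULL: merge() built a new data frame without it
```

Any code downstream of the merge would have to re-attach the attribute by hand, which is easy to forget.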

I certainly can't add it as an extra row to the data frame, both because that would make those columns character rather than numeric vectors (the $estimate vector would become c("-1.58", "difference in means")) and because the extra row would get in the way of many applications (if many t-tests were recombined, only half of the rows would be actual values). as.numeric(tidy(t_test)$estimate[1]) is quite cumbersome!
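The coercion problem can be seen with a plain vector. This is a small sketch in base R; the variable names `estimate` and `with_label` are invented for the example.

```r
estimate <- -1.58

# Putting a label next to a number forces the whole vector to character
with_label <- c(estimate, "difference in means")
class(with_label)

# The numeric value now has to be converted back before any arithmetic
recovered <- as.numeric(with_label[1])
recovered * 2
```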

If the purpose of your functions is to turn a hypothesis test object into a text string, I don't know that broom would necessarily be helpful as an intermediate step anyway. broom's main value is in abstracting away those details of an R object so that it fits into a data frame, so rather than changing broom so that it keeps those details in, you could simply work from the original object. Even if tidy did help, you could always work from both the tidied version and the untidied version to construct your string.
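For instance, a text string can be built directly from the original htest object, whose elements still carry their names. The following is only a sketch under that assumption: `describe_htest()` is a hypothetical helper, not a function from broom or base R, and the output format is arbitrary.

```r
# The same test as in the example above; `sleep` ships with base R
t_test <- t.test(extra ~ group, data = sleep)

# Hypothetical helper: assemble a report string from an htest object
describe_htest <- function(x) {
  sprintf(
    "%s(%.2f) = %.2f, p = %.3f",
    names(x$statistic),  # name of the test statistic, here "t"
    x$parameter,         # degrees of freedom
    x$statistic,         # value of the test statistic
    x$p.value
  )
}

describe_htest(t_test)
```

Because the statistic's name is read from the object itself, no attribute needs to survive a tidying step. Other htest objects may lack a `parameter` element, so a real helper would need to handle that case.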
