rstudio / gt

Easily generate information-rich, publication-quality tables from R
https://gt.rstudio.com

Add support for side-by-side regression tables? #92

Closed andrewheiss closed 3 years ago

andrewheiss commented 5 years ago

This is a phenomenal package and I'm a huge fan of the API for creating tables.

At least two other packages provide support for side-by-side regression tables (stargazer and huxtable), but both have limitations: stargazer only supports HTML and TeX and doesn't play well with knitr, and huxtable supports HTML and TeX with minimal Word support, since it creates Markdown tables that don't support column spans or other fancier table features.

I wonder if it would be possible to provide support for regression tables similar to stargazer and huxtable tables, given that the format fits well in the gt API paradigm.

library(tidyverse)
library(gt)
library(huxtable)

model1 <- lm(mpg_c ~ hp, data = gtcars)
model2 <- lm(mpg_c ~ hp + trq, data = gtcars)
model3 <- lm(mpg_c ~ hp + trq + year, data = gtcars)

huxreg(model1, model2, model3)
─────────────────────────────────────────────────────────────
                    (1)            (2)             (3)       
              ───────────────────────────────────────────────
  (Intercept)     23.932 ***     22.422 ***   -1275.504      
                  (1.540)        (1.777)      (1174.083)     
  hp              -0.017 ***     -0.024 ***      -0.021 ***  
                  (0.003)        (0.005)         (0.006)     
  trq                             0.012           0.008      
                                 (0.007)         (0.008)     
  year                                            0.644      
                                                 (0.582)     
              ───────────────────────────────────────────────
  N               46             46              46          
  R2               0.431          0.463           0.479      
  logLik        -108.446       -107.086        -106.426      
  AIC            222.892        222.172         222.853      
─────────────────────────────────────────────────────────────
  *** p < 0.001; ** p < 0.01; * p < 0.05.                    

Column names: names, model1, model2, model3

Right now there's a way to fake it, in a very ugly way, by extracting coefficients and model details using functions from broom (which huxtable does behind the scenes too), but it'd be cool if there were a less clunky way to make side-by-side regression tables with gt:

library(broom)
library(glue)

models_combined <- tibble(model = list(model1, model2, model3)) %>% 
  mutate(coefs = model %>% map(tidy),
         details = model %>% map(glance),
         model_number = 1:n())

model_coefs <- models_combined %>% 
  unnest(coefs) %>% 
  mutate(value = as.character(glue("{round(estimate, 2)} ({round(std.error, 2)})"))) %>% 
  select(model_number, term, value) %>% 
  spread(model_number, value)

model_details <- models_combined %>% 
  unnest(details) %>% 
  mutate(N = model %>% map_dbl(nobs),
         R2 = round(adj.r.squared, 2),
         AIC = round(AIC, 2)) %>% 
  select(model_number, N, R2, AIC) %>% 
  gather(term, value, -model_number) %>% 
  spread(model_number, value) %>% 
  mutate_at(vars(-term), as.character)

ugly_blank_row <- tibble(term = NA, `1` = NA, `2` = NA, `3` = NA)

bind_rows(model_coefs, ugly_blank_row, model_details) %>% 
  gt() %>% 
  tab_header(title = "Side-by-side regression table")
Side-by-side regression table
term         1              2              3
(Intercept)  23.93 (1.54)   22.42 (1.78)   -1275.5 (1174.08)
hp           -0.02 (0)      -0.02 (0.01)   -0.02 (0.01)
trq          NA             0.01 (0.01)    0.01 (0.01)
year         NA             NA             0.64 (0.58)
NA           NA             NA             NA
AIC          222.89         222.17         222.85
N            46             46             46
R2           0.42           0.44           0.44

rich-iannone commented 5 years ago

Yeah! This has got to be possible presently. I'll experiment with this today. I'm also wondering whether it'd be a good idea to support a collection of standardized table types like this within gt.

andrewheiss commented 5 years ago

I experimented a little with it yesterday, but got stuck :).

The tricky part about this is that regression models are all different and it's difficult to support every contingency. stargazer hard codes the logic for every possible model, but that takes time and it doesn't support things made by Stan, for instance. huxtable is amazing because it doesn't hard code anything and instead relies on broom. If broom::tidy() and broom::glance() can work with a model, huxreg() can show it (which also allows developers of modeling packages to create their own S3 tidy.* and glance.* methods to get support for huxreg()).

Because huxtable has robust support for selectively pulling model details out and displaying them in a table, I tried different ways of feeding its output to gt() so that huxreg() could keep doing all the heavy lifting and gt() could handle the display, but I couldn't find a good way to do it.
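
The broom-based mechanism described above can be sketched roughly as follows. This is only an illustration, not how huxtable or any modeling package actually implements it: the `toy_mean` class and the `fit_toy_mean()` constructor are hypothetical names invented here, and the example only shows that defining `tidy.*` and `glance.*` S3 methods is enough for `broom::tidy()` and `broom::glance()` to dispatch on a new model class.

```r
library(broom)   # re-exports the tidy() and glance() generics
library(tibble)

# Hypothetical toy "model": the mean of a numeric vector and its standard error.
fit_toy_mean <- function(x) {
  structure(
    list(estimate = mean(x), std.error = sd(x) / sqrt(length(x)), n = length(x)),
    class = "toy_mean"
  )
}

# S3 method: one row per term, using broom's usual column names.
tidy.toy_mean <- function(x, ...) {
  statistic <- x$estimate / x$std.error
  tibble(
    term      = "(Intercept)",
    estimate  = x$estimate,
    std.error = x$std.error,
    statistic = statistic,
    p.value   = 2 * pt(-abs(statistic), df = x$n - 1)
  )
}

# S3 method: a one-row summary of the whole model.
glance.toy_mean <- function(x, ...) {
  tibble(nobs = x$n)
}

fit <- fit_toy_mean(mtcars$mpg)
tidy(fit)    # dispatches to tidy.toy_mean()
glance(fit)  # dispatches to glance.toy_mean()
```

Whether a table function can then display such a model still depends on which statistics it asks for; this toy `glance()` method only returns `nobs`.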

JosiahParry commented 5 years ago

@andrewheiss your workaround is extremely helpful.

andrewheiss commented 5 years ago

Whoa, @vincentarelbundock has created a gt-based solution for this! https://github.com/vincentarelbundock/gtsummary

vincentarelbundock commented 5 years ago

@andrewheiss How did you find out about this?!?

Yeah, I just worked on it over the weekend. It's not quite feature-complete yet, but almost. And it works quite nicely on my machine.

I'd be really grateful if anyone could take a look and tell me how it works on their end. Any suggestion will be most welcome.

If there's interest, I plan to invest some actual effort into this thing. The way the table just pops out nicely in RStudio's Viewer was a real eye-opener for me :)

andrewheiss commented 5 years ago

Ha, the benefits of the GitHub activity stream thing. It said you'd made a new repo :)

I'll check it out and throw some different models at it.

Thanks for this!

rich-iannone commented 5 years ago

@vincentarelbundock

Just wanted to say that I'm pretty impressed with gtsummary. Thanks for getting that going! I also found out about it from the GitHub Activity view. Can't wait to see where it goes!

vincentarelbundock commented 5 years ago

@rich-iannone Thanks. It was easy to put together, especially given that you are doing all the heavy lifting.

I'm probably going to have a lot of questions (sorry in advance!). For instance, I can't quite figure out the best way to suppress labels without getting double horizontal rules between grouped tables (grouping is the mechanism I use to separate coefficients from goodness-of-fit statistics).

rich-iannone commented 5 years ago

@vincentarelbundock Questions are definitely okay, no need to apologize for them!

Right now, I don't have a solution for suppressing (or even transforming) the row group labels. Could you file an issue and provide a code example (with screenshot)? Then I could see what I could do about that.

vincentarelbundock commented 5 years ago

https://github.com/rstudio/gt/issues/140

rich-iannone commented 3 years ago

Going to close this since {modelsummary} and {gtsummary} both have this handled!
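
For anyone landing here later, a minimal sketch of the {modelsummary} route, reusing the three models from the original post. It assumes that both gt and modelsummary are installed and that `output = "gt"` returns a gt table object that can be customized further with gt functions; the column labels "(1)" to "(3)" are just names chosen for this example.

```r
library(gt)
library(modelsummary)

# Same models as in the original post, fitted on gt's bundled gtcars data.
model1 <- lm(mpg_c ~ hp, data = gtcars)
model2 <- lm(mpg_c ~ hp + trq, data = gtcars)
model3 <- lm(mpg_c ~ hp + trq + year, data = gtcars)

# List names become the column headers of the table.
models <- list("(1)" = model1, "(2)" = model2, "(3)" = model3)

# Ask modelsummary for a gt object so gt functions can be applied afterwards.
tbl <- modelsummary(models, output = "gt")
tbl <- tab_header(tbl, title = "Side-by-side regression table")
tbl
```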