Add arbitrary statistics or details to statistics section?

andrewheiss commented 6 years ago

stargazer can add extra rows to the bottom of its regression tables to carry model-level information, like whether a model includes fixed effects terms that are omitted from the table. This is especially helpful when running models that include country- or year-level factor variables. Instead of including all those coefficients in the table, it’s easy to just omit them and add a row indicating, for each column, whether they were included.

library(tidyverse)
library(huxtable)
library(stargazer)

data(diamonds, package = 'ggplot2')

lm1 <- lm(price ~ carat, diamonds)
lm2 <- lm(price ~ depth, diamonds)
lm3 <- lm(price ~ carat + depth + cut, diamonds)

# Ignore the "cut" variables and just indicate whether they were included
stargazer(lm1, lm2, lm3,
          type = "text", omit = "cut", 
          intercept.bottom = FALSE,
          add.lines = list(c("Cut fixed effects",
                             c("No", "No", "Yes"))),
          keep.stat = c("n"))
#> 
#> =========================================================
#>                             Dependent variable:          
#>                   ---------------------------------------
#>                                    price                 
#>                        (1)          (2)          (3)     
#> ---------------------------------------------------------
#> Constant          -2,256.361*** 5,763.668***   434.754   
#>                     (13.055)     (740.556)    (301.924)  
#>                                                          
#> carat             7,756.426***               7,873.249***
#>                     (14.067)                   (13.967)  
#>                                                          
#> depth                            -29.650**    -50.418*** 
#>                                   (11.990)     (4.848)   
#>                                                          
#> ---------------------------------------------------------
#> Cut fixed effects      No            No          Yes     
#> Observations         53,940        53,940       53,940   
#> =========================================================
#> Note:                         *p<0.1; **p<0.05; ***p<0.01

There isn’t a way to do this with huxreg (at least I haven’t found one yet). It’s easy enough to omit variables with omit_coefs, but there’s no way to add additional model-level details:

coefs_to_omit <- names(lm3$coefficients) %>% 
  .[contains("cut", vars = .)]

huxreg(lm1, lm2, lm3,
       omit_coefs = coefs_to_omit,
       statistics = c(N = "nobs"))

───────────────────────────────────────────────────────────────
                     (1)             (2)             (3)       
              ─────────────────────────────────────────────────
  (Intercept)   -2256.361 ***    5763.668 ***    -648.907 *    
                  (13.055)       (740.556)       (312.841)     
  carat          7756.426 ***                    7873.249 ***  
                  (14.067)                        (13.967)     
  depth                           -29.650 *       -50.418 ***  
                                  (11.990)         (4.848)     
              ─────────────────────────────────────────────────
  N             53940           53940           53940          
───────────────────────────────────────────────────────────────
  *** p < 0.001; ** p < 0.01; * p < 0.05.                      

Column names: names, model1, model2, model3

For now, I’ve been including that information in a note, but that’s unwieldy:

huxreg(lm1, lm2, lm3,
       omit_coefs = coefs_to_omit,
       statistics = c(N = "nobs"),
       note = "Model 3 includes fixed effects for cut type. {stars}")

─────────────────────────────────────────────────────────────────────────────────────────────────────
                                    (1)                      (2)                      (3)            
                         ────────────────────────────────────────────────────────────────────────────
  (Intercept)                       -2256.361 ***             5763.668 ***             -648.907 *    
                                      (13.055)                (740.556)                (312.841)     
  carat                              7756.426 ***                                      7873.249 ***  
                                      (14.067)                                          (13.967)     
  depth                                                        -29.650 *                -50.418 ***  
                                                               (11.990)                  (4.848)     
                         ────────────────────────────────────────────────────────────────────────────
  N                                 53940                    53940                    53940          
─────────────────────────────────────────────────────────────────────────────────────────────────────
  Model 3 includes fixed effects for cut type.  *** p < 0.001; ** p < 0.01; * p < 0.05               

Column names: names, model1, model2, model3

I don't know the best way to do this, though. Perhaps a new argument like add_statistics or add_lines? Or have users create their own glance.*() methods that include extra details like that (kind of like how I add m here)?

andrewheiss commented 6 years ago

I found a workaround with insert_row(), but it's kind of clunky, especially when adding multiple rows:

coefs_to_omit <- names(lm3$coefficients) %>% 
  .[contains("cut", vars = .)]

reg_output <- huxreg(lm1, lm2, lm3,
                     omit_coefs = coefs_to_omit,
                     statistics = c(N = "nobs"))

insert_row(reg_output, 
           c("Cut fixed effects", "No", "No", "Yes"), 
           after = nrow(reg_output) - 2,
           copy_cell_props = FALSE)

─────────────────────────────────────────────────────────────────────
                           (1)             (2)             (3)       
                    ─────────────────────────────────────────────────
  (Intercept)         -2256.361 ***    5763.668 ***    -648.907 *    
                        (13.055)       (740.556)       (312.841)     
  carat                7756.426 ***                    7873.249 ***  
                        (14.067)                        (13.967)     
  depth                                 -29.650 *       -50.418 ***  
                                        (11.990)         (4.848)     
                    ─────────────────────────────────────────────────
  Cut fixed effects   No              No              Yes            
  N                   53940           53940           53940          
─────────────────────────────────────────────────────────────────────
  *** p < 0.001; ** p < 0.01; * p < 0.05.                            

Column names: names, model1, model2, model3

This can be written as a single pipeline too:

huxreg(lm1, lm2, lm3,
       omit_coefs = coefs_to_omit,
       statistics = c(N = "nobs")) %>% 
  insert_row(c("Cut fixed effects", "No", "No", "Yes"), 
             after = nrow(.) - 2,
             copy_cell_props = FALSE)

hughjonesd commented 6 years ago

For one row, insert_row seems fine. For multiple rows, why not just rbind it with another huxtable? Overall, I don't like the idea of adding arbitrary table manipulation into the huxreg function. The idea is to create a huxtable that you can then alter as you wish.
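A minimal sketch of the rbind approach for several rows, reusing the models from earlier in the thread. The second row ("Clarity fixed effects") and the variable names `extra`, `top`, `bottom`, and `combined` are made up for illustration. The extra huxtable reuses the column names that huxreg printed above (names, model1, model2, model3), and the split point assumes the layout shown there, with the N row and the note as the last two rows.

```r
library(huxtable)

data(diamonds, package = "ggplot2")

lm1 <- lm(price ~ carat, diamonds)
lm2 <- lm(price ~ depth, diamonds)
lm3 <- lm(price ~ carat + depth + cut, diamonds)

# Base R alternative to the contains() call used earlier in the thread
coefs_to_omit <- grep("cut", names(coef(lm3)), value = TRUE)

reg_output <- huxreg(lm1, lm2, lm3,
                     omit_coefs = coefs_to_omit,
                     statistics = c(N = "nobs"))

# Extra model-level rows; the second row is purely illustrative
extra <- hux(
  names  = c("Cut fixed effects", "Clarity fixed effects"),
  model1 = c("No", "No"),
  model2 = c("No", "No"),
  model3 = c("Yes", "No"),
  add_colnames = FALSE
)

# Split the table just above the N row, then stack the three pieces
n <- nrow(reg_output)
top    <- reg_output[1:(n - 2), ]
bottom <- reg_output[(n - 1):n, ]

combined <- rbind(top, extra, bottom, copy_cell_props = FALSE)
combined
```

As with the insert_row() call above, copy_cell_props = FALSE keeps the new rows from inheriting formatting from the row before them; borders and alignment can then be set on `combined` like on any other huxtable.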

andrewheiss commented 6 years ago

Ooh, I like the rbind idea for multiple rows. Thanks!

hughjonesd commented 6 years ago

You may find add_rows and add_columns helpful.
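For reference, here is a sketch of how add_rows might be used for the same table. It assumes add_rows() takes the same after and copy_cell_props arguments as the insert_row() call earlier in the thread; the second row and the names `extra` and `with_rows` are hypothetical.

```r
library(huxtable)

data(diamonds, package = "ggplot2")

lm1 <- lm(price ~ carat, diamonds)
lm2 <- lm(price ~ depth, diamonds)
lm3 <- lm(price ~ carat + depth + cut, diamonds)

coefs_to_omit <- grep("cut", names(coef(lm3)), value = TRUE)

reg_output <- huxreg(lm1, lm2, lm3,
                     omit_coefs = coefs_to_omit,
                     statistics = c(N = "nobs"))

# Rows to add, built as a huxtable with matching column names
extra <- hux(
  names  = c("Cut fixed effects", "Clarity fixed effects"),
  model1 = c("No", "No"),
  model2 = c("No", "No"),
  model3 = c("Yes", "No"),
  add_colnames = FALSE
)

# Insert both rows at once, just above the N row (assumes N and the
# note are the last two rows, as in the output shown in the thread)
with_rows <- add_rows(reg_output, extra,
                      after = nrow(reg_output) - 2,
                      copy_cell_props = FALSE)
with_rows
```

This avoids splitting the table by hand: the after argument picks the insertion point, so one call handles any number of extra rows.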

hughjonesd / huxtable

Add arbitrary statistics or details to statistics section? #71