kosukeimai / mediation

R package mediation
61 stars 30 forks source link

tidy output from summary #7

Open benwhalley opened 6 years ago

benwhalley commented 6 years ago

At present, summary(mediation(...)) produces output like this:

Causal Mediation Analysis 

Quasi-Bayesian Confidence Intervals

               Estimate 95% CI Lower 95% CI Upper p-value    
ACME             0.1477       0.0237         0.28   0.018 *  
ADE              0.2998       0.1166         0.50  <2e-16 ***
Total Effect     0.4475       0.2293         0.66  <2e-16 ***
Prop. Mediated   0.3295       0.0741         0.60   0.018 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Sample Size Used: 200 

Simulations: 1000 

It would be nice if this could be supplemented by a tidy data frame, such that this would work:

as_data_frame(summary(mediation(...)))

To produce a data frame with these columns:

(these chosen to match those typically produced by broom::tidy)

ngreifer commented 6 years ago

I would like this too, though I would settle for a "coefficients" object like summary.glm() produces, e.g., summary(mediate(...))$coefficients

ruthappel commented 2 years ago

Hi! I agree that this would be useful. For the purposes of a recent analysis, I extracted the values and put them into a dataframe and then produced a LaTeX table using the gt package. I'm sharing the code here in case it is useful for someone with a similar request. Note that the function to create the dataframe is based on the print.summary.mediate function that is part of the package (see https://github.com/kosukeimai/mediation/blob/master/R/mediate.R). Depending on the type of model you are looking for, it might have to be adapted. The packages mediation, tidyverse, and gt need to be loaded for the code below.

# Function: create_mediation_df
# Turns the mediation output into a dataframe
# Based on the mediation package's print.summary.mediate function
# In case several summaries are to be combined, optionally takes a variable name 
# and model name as input to distinguish these when rendering a table
# Parameters: model_object: mediation output object
#             variable_name: optional parameter if the outcome variable name should be added to the df
#             model_name: optional parameter if the model name be added to the df
# Returns: dataframe with mediation summary information, optionally with variable and model names
create_mediation_df <- function(model_object, variable_name = NULL, model_name = NULL) {
  x <- summary(model_object)

  clp <- 100 * x$conf.level
  smat <- c(x$d1, x$d1.ci, x$d1.p)
  smat <- rbind(smat, c(x$z0, x$z0.ci, x$z0.p))
  smat <- rbind(smat, c(x$tau.coef, x$tau.ci, x$tau.p))
  smat <- rbind(smat, c(x$n0, x$n0.ci, x$n0.p))
  rownames(smat) <- c("ACME", "ADE", "Total Effect", "Proportion Mediated")

  colnames(smat) <- c("Estimate", paste(clp, "% CI Lower", sep=""),
  paste(clp, "% CI Upper", sep = ""), "p-value")

  data <- as.data.frame(printCoefmat(smat, tst.ind = NULL)) %>% 
    rownames_to_column(var = "Measure") %>% 
    add_row(Measure = "N Observations", Estimate = x$nobs) %>% 
    add_row(Measure = "N Simulations", Estimate = x$sims) %>% 
    # add variable and model names if provided
    mutate(Variable = variable_name, 
                 Model = model_name)

  return(data)
}

Then, to print this as a table (e.g. in LaTeX format), I used the gt package:

# create dataframes with mediation summary information
df.med_1 <- create_mediation_df(med.out_1_baseline, "Outcome 1", "Baseline Model")
df.med_2 <- create_mediation_df(med.out_1_controlled, "Outcome 1", "Model With Controls")
df.med_3 <- create_mediation_df(med.out_2_baseline, "Outcome 2", "Baseline Model")
df.med_4 <- create_mediation_df(med.out_2_controlled, "Outcome 2", "Model With Controls")

# bind dataframes together
data <- rbind(df.med_1, df.med_2, df.med_3, df.med_4)

# create output table (in LaTeX format in this example)
table.mediation <- data %>% 
  # add stars for p values
  mutate(" " = dplyr::case_when(`p-value` < 0.001 ~ "***",
                                `p-value` < 0.01 ~ "**",
                                `p-value` < 0.05 ~ "*",
                                TRUE ~ "")) %>% 
  # format p values
  mutate(`p-value` = format.pval(`p-value`, eps = 0.001, digits = 3),
         `p-value` = if_else(`p-value` == "NA", NA_character_, `p-value`)) %>% 
  # group data to automatically and create subheaders
  group_by(Variable, Model) %>% 
  # turn into gt object
  gt(caption = "My Caption") %>% 
  # format NAs that were introduced by coercion to dataframe as blanks
  fmt_missing(columns = everything(), missing_text = "")  %>%
  # format numeric columns to feature 3 decimal places
  fmt_number(columns = c("Estimate", "95% CI Lower", "95% CI Upper"),
             rows = Measure %in% c("ACME", "ADE", "Total Effect", "Proportion Mediated"),
             decimals = 3) %>% 
  # add border above number of observations
  tab_style(
    style = cell_borders(side = c("top"),  color = "black", weight = px(1)),
    locations = cells_body(columns = everything(), 
                           rows = Measure == "N Observations")) %>% 
  # change formatting of group header text
  tab_style(style = list(cell_text(style = "oblique", align = "center")),
            locations = cells_row_groups()) %>% 
  # change formatting of labels
  tab_style(style = list(cell_text(weight = "bold", align = "center")),
            locations = cells_column_labels()) %>% 
  # right algin p-value
  tab_style(style = list(cell_text(align = "right")),
            locations = cells_body(columns = `p-value`)) %>%   
  # left align stars
  tab_style(style = list(cell_text(align = "left")),
            locations = cells_body(columns = ` `)) %>% 
  # add p value note at the bottom
  tab_source_note(source_note = md("_*** p < 0.001, ** p < 0.01, * p < 0.05_")) %>% 
  # adapt font and border formatting
  tab_options(
    row_group.border.top.color = "black",
    row_group.border.bottom.color = "black",
    column_labels.border.top.color = "black",
    column_labels.border.bottom.color = "black",
    table_body.border.top.width = 1,
    table_body.border.bottom.width = 1,
    table_body.hlines.color = "white",
    table.font.size = 12) %>% 
  opt_table_font(font = "times") %>% 
  # optional: turn into LaTeX code
  as_latex() %>%
  as.character() %>%
  cat()