vincentarelbundock commented 3 months ago

Professor Frank Wolak (in cc) at Stanford and I have recently published a working paper (https://www.nber.org/papers/w32124, also attached) where we propose a procedure to standardise how many digits to report in parameter estimates, which (we strongly believe) makes more sense than the prevalent heuristic of always printing two or three decimal digits by default.

One of the most frequent feedback we get when presenting this work in seminars is that it would be really great to have such a feature implemented in a package like modelsummary.

Do you think it might be feasible in future updates of the package to include formula #8 of our paper as a display option for coefficient estimates?

Concretely, I believe it would represent a very minor code update in the fmt_decimal function to create a new format that uses a different rounding rule for coefficient estimates. You will find below a short piece of code that implements this rounding rule for a given coefficient estimate "x" and its associated standard error estimate "se".

round_coef <- function(x, se, alpha = 0.1){

Used convention: report first non-significant digit

lsd <- ceiling((log(se)+log(qnorm(p = (1-alpha)))/log(10))-1
if (lsd <= 0){
  return(format(round(x, digits = -lsd), nsmall = -lsd))
}
# lsd > 0 requires to switch to scientific notations
else{
  order_x <- floor(log(abs(x))/log(10))
  if (order_x >= lsd){
    return(formatC(x, format = "e", digits = order_x-lsd))
  }
  else{
    return(0)
  }
}

}

vincentarelbundock commented 2 months ago

#' Rounding with number of digits determined by an equivalence test
#'
#' This function implements the suggestions of Astier & Wolak for the number of decimal digits to keep for coefficient estimates. The other statistics are rounded by `fmt_significant()`.
#' @param conf_level Confidence level to use for the equivalence test (1 - alpha).
#' @inheritParams fmt_significant
#' @references
#' Astier, Nicolas, and Frank A. Wolak. Credible Numbers: A Procedure for Reporting Statistical Precision in Parameter Estimates. No. w32124. National Bureau of Economic Research, 2024.
#'
#' @export
fmt_equivalence <- function(conf_level = 0.95, digits = 3, pdigits = NULL, ...) {

    alpha <- 1 - conf_level

    fun_sig <- fmt_significant(digits = digits, pdigits = pdigits)

    out <- function(x) {
        if (!isTRUE(checkmate::check_data_frame(x))) {
            msg <- "`fmt_equivalence()` only accepts data frames, not vectors."
            insight::format_error(msg)
        }
        if (!all(c("estimate", "std.error") %in% colnames(x))) {
            msg <- "The data frame to format must include an `estimate` and a `std.error` column."
            insight::format_error(msg)
        }

        x_out <- fun_sig(x)

        # Rank of last reported digit (first non-significant digit)
        x$fnsd <- ceiling((log(x$std.error) + log(qnorm(p = (1 - alpha)))) / log(10)) - 1

        # 1/ Format estimates (code provided by Nicolas Astier)
        estimate_string <- as.character(x$estimate)

        # Point estimates for which no scientific notation is needed
        if (sum(x$fnsd <= 0) > 0) {
            estimate_string[x$fnsd <= 0] <- unlist(
                lapply(
                    X = 1:sum(x$fnsd <= 0),
                    FUN = function(z) {
                        format(round(x$estimate[x$fnsd <= 0][z],
                            digits = -x$fnsd[x$fnsd <= 0][z]),
                            nsmall = -x$fnsd[x$fnsd <= 0][z])
                    })
            )
        }

        # Point estimates for which scientific notation is needed
        if (sum(x$fnsd > 0) > 0) {
            estimate_string[x$fnsd > 0] <- unlist(
                lapply(
                    X = 1:sum(x$fnsd > 0),
                    FUN = function(z) {
                        order_x <- floor(log(abs(x$estimate[x$fnsd > 0][z])) / log(10))
                        if (order_x >= x$fnsd[x$fnsd > 0][z]) {
                            return(formatC(x$estimate[x$fnsd > 0][x], format = "e", digits = order_x - x$fnsd[x$fnsd > 0][z]))
                        } else {
                            return(0)
                        }
                    })
            )
        }

        # Update dataframe
        x_out$estimate <- estimate_string

        return(x_out)
    }

    class(out) <- c("fmt_factory", class(out))
    return(out)
}

library(modelsummary)
mod <- lm(mpg ~ hp, mtcars)

# Default equivalence-based formatting
modelsummary(mod, fmt = fmt_equivalence())

# alpha = 0.2
modelsummary(mod, fmt = fmt_equivalence(conf_level = .8))

# default equivalence, but with alternative significant digits for other statistics
modelsummary(mod, fmt = fmt_equivalence(digits = 5))

vincentarelbundock commented 2 months ago

https://github.com/vincentarelbundock/modelsummary/commit/de44f0ab5da29ca9308cd768d3c1db5e40785789

vincentarelbundock / modelsummary

`fmt_equivalence` #784

Used convention: report first non-significant digit