vincentarelbundock / marginaleffects

R package to compute and plot predictions, slopes, marginal means, and comparisons (contrasts, risk ratios, odds, etc.) for over 100 classes of statistical and ML models. Conduct linear and non-linear hypothesis tests, or equivalence tests. Calculate uncertainty estimates using the delta method, bootstrapping, or simulation-based inference
https://marginaleffects.com

Degrees of freedom: Automatic + t stat #1251

Closed vincentarelbundock closed 5 days ago

vincentarelbundock commented 6 days ago

Currently, marginaleffects reports $z$ statistics and computes $p$ values using pnorm(). Should we try to extract the residual degrees of freedom automatically and report $t$ statistics instead?

We could do this by writing a sanitize_df() function and calling it at the top of our main functions.

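# `df` is the user-supplied value; NULL means "try to detect it from the model".
sanitize_df <- function(model, df = NULL) {
  if (is.null(df)) {
    # If the residual df cannot be extracted, keep NULL (i.e., z statistics).
    df <- tryCatch(insight::get_df(model, type = "residual"), error = function(e) NULL)
  }
  return(df)
}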

This function would default to NULL in case of error, which reverts us to $z$.
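
To make the fallback concrete, here is a minimal base-R sketch of how a p-value calculation might consume the result. `get_p_value()` and its arguments are hypothetical names used only for illustration; this is not how marginaleffects is actually implemented.

```r
# Hypothetical helper: two-sided p value from an estimate and its standard error.
# `df = NULL` (or Inf) falls back to the normal distribution, i.e. a z test.
get_p_value <- function(estimate, std_error, df = NULL) {
  statistic <- estimate / std_error
  if (is.null(df) || is.infinite(df)) {
    2 * pnorm(-abs(statistic))        # z statistic
  } else {
    2 * pt(-abs(statistic), df = df)  # t statistic with `df` degrees of freedom
  }
}

p_z <- get_p_value(1.2, 0.5)           # no df available: z
p_t <- get_p_value(1.2, 0.5, df = 30)  # df found: t, slightly more conservative
```

With a finite df the t distribution has heavier tails, so `p_t` is larger than `p_z` for the same statistic.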

This was suggested by @mattansb

vincentarelbundock commented 6 days ago

@ngreifer related to our discussion https://github.com/vincentarelbundock/marginaleffects/issues/1242

Do you see any downside of this approach?

mattansb commented 5 days ago

Would also be nice to have native support for Kenward-Roger / Satterthwaite dfs for mixed models (Some nice code to get those in emmeans:::emm_basis.merMod()) ;)

vincentarelbundock commented 5 days ago

@mattansb K-R and S are already supported. See the vcov argument in all functions and the documentation here: https://marginaleffects.com/r/man/predictions.html#arguments

mattansb commented 5 days ago

Ohhhh I missed that!

mattansb commented 5 days ago

> Do you see any downside of this approach?

Downside: this gives df for glms (which don't really "need" them?)

mod <- glm(am ~ hp, binomial, mtcars)

insight::get_df(mod, type = "residual")
#> [1] 30
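
A rough base-R illustration of what this would change in practice, assuming the residual df were plugged straight into a t distribution for this model. The variable names are illustrative only.

```r
mod <- glm(am ~ hp, family = binomial, data = mtcars)
coefs <- summary(mod)$coefficients

# summary.glm() reports a z statistic for binomial models
z <- coefs["hp", "z value"]
p_z <- coefs["hp", "Pr(>|z|)"]

# Same statistic, but referred to a t distribution with the residual df
p_t <- 2 * pt(-abs(z), df = df.residual(mod))

c(z = p_z, t = p_t)
```

The t-based p value is larger than the z-based one, so automatic df extraction would silently change inference for GLMs.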

ngreifer commented 5 days ago

I agree with @mattansb that this is too broad an approach because it gives GLMs a finite df, which would imply a t-test. I'm not an expert on this, but I think only linear models can be tested with t-statistics. In clarify, I have the following function, which is used to decide whether to use a t-distribution or a z-distribution:

https://github.com/ngreifer/clarify/blob/a4e59ea5a9460ae11d5968a6981e92a2de6d23fd/R/get_model_components.R#L26-L42

# Get the model degrees of freedom
## Assesses whether the model is linear and fit with OLS; if not,
## returns Inf. Linear models fit with MLE get Inf.
get_df <- function(fit) {

  if (!insight::is_model_supported(fit)) {
    return(Inf)
  }

  statistic <- insight::find_statistic(fit)

  if (identical(statistic, "chi-squared statistic")) {
    return(Inf)
  }

  insight::get_df(fit, type = "wald", statistic = statistic)
}

I don't remember where I found that code, but I think it does the job. Feel free to use it or something like it.
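
For readers who want to see the decision rule without the insight dependency, here is a much cruder base-R sketch of the same idea: finite df only for plain `lm` fits, `Inf` for everything else. `wald_df()` is a hypothetical name and is not part of clarify, insight, or marginaleffects. The function above covers far more model classes because it delegates the classification to insight.

```r
# Hypothetical helper: residual df for plain linear models, Inf otherwise.
# Note that glm objects also inherit from "lm", so they are excluded explicitly.
wald_df <- function(fit) {
  is_plain_lm <- inherits(fit, "lm") && !inherits(fit, "glm")
  if (is_plain_lm) df.residual(fit) else Inf
}

fit_lm  <- lm(mpg ~ hp, data = mtcars)
fit_glm <- glm(am ~ hp, family = binomial, data = mtcars)

wald_df(fit_lm)   # 30: t statistics
wald_df(fit_glm)  # Inf: z statistics
```

Returning `Inf` rather than `NULL` means the result can be passed directly to `pt()`, which coincides with `pnorm()` at infinite df.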

vincentarelbundock commented 5 days ago

Yeah, that all makes sense. Thanks both for the input!

mattansb commented 5 days ago

Noah's function could be a great default here as well IMO