vincentarelbundock / marginaleffects

R package to compute and plot predictions, slopes, marginal means, and comparisons (contrasts, risk ratios, odds, etc.) for over 100 classes of statistical and ML models. Conduct linear and non-linear hypothesis tests, or equivalence tests. Calculate uncertainty estimates using the delta method, bootstrapping, or simulation-based inference
https://marginaleffects.com
Other
453 stars 47 forks source link

Hypothesis factory #1038

Closed vincentarelbundock closed 6 months ago

vincentarelbundock commented 7 months ago

@mattansb I’m thinking about a user-friendly interface to create more complex aggregation and contrast functions. I'll show you a prototype soon.

In the meantime, I am having trouble figuring out how emmeans computes the eff contrasts. Do you know what is going on here? Note that I use a Poisson model to avoid all the fancy backtransforming done in log models.

library(emmeans)
mod <- glm(gear ~ factor(cyl) + hp, family = poisson, data = mtcars)

Close but not quite:

emm <- emmeans(mod, ~ cyl + hp, at = list(hp = c(100, 110)))
contrast(emm, method = "eff")
#  contrast          estimate    SE  df z.ratio p.value
#  cyl4 hp100 effect   0.2513 0.183 Inf   1.370  0.2560
#  cyl6 hp100 effect   0.0699 0.149 Inf   0.468  0.6400
#  cyl8 hp100 effect  -0.3674 0.237 Inf  -1.547  0.2560
#  cyl4 hp110 effect   0.2821 0.202 Inf   1.396  0.2560
#  cyl6 hp110 effect   0.1007 0.157 Inf   0.643  0.6246
#  cyl8 hp110 effect  -0.3366 0.217 Inf  -1.552  0.2560
# 
# Results are given on the log (not the response) scale. 
# P value adjustment: fdr method for 6 tests

log(summary(emm)$emmean) - log(mean(summary(emm)$emmean))
# [1]  0.18885980  0.05621480 -0.36220751  0.20970595  0.07998298 -0.32630975

Close but not quite:

emm <- emmeans(mod, ~ cyl + hp, at = list(hp = c(100, 110)))
contrast(emm, method = "eff") |> regrid()
#  contrast          ratio    SE  df z.ratio p.value
#  cyl4 hp100 effect 1.286 0.236 Inf   5.451  <.0001
#  cyl6 hp100 effect 1.072 0.160 Inf   6.689  <.0001
#  cyl8 hp100 effect 0.693 0.164 Inf   4.212  <.0001
#  cyl4 hp110 effect 1.326 0.268 Inf   4.951  <.0001
#  cyl6 hp110 effect 1.106 0.173 Inf   6.383  <.0001
#  cyl8 hp110 effect 0.714 0.155 Inf   4.612  <.0001
# 
# P value adjustment: fdr method for 6 tests

summary(emm)$emmean / mean(summary(emm)$emmean)
# [1] 1.2078716 1.0578249 0.6961379 1.2333153 1.0832686 0.7215816
mattansb commented 7 months ago

eff is the deviation contrast - the difference between each mean and the grand mean. It is the same as del.eff but scaled by 1/k.

You can see these functions here:

emmeans:::eff.emmc
#> function (levs, exclude = integer(0), include, wts = rep(1, length(levs)), 
#>     ...) 
#> {
#>     exclude = .get.excl(levs, exclude, include)
#>     if ((length(exclude) > 0) && (length(wts) == length(levs) - 
#>         length(exclude))) {
#>         tmp = rep(0, length(levs))
#>         tmp[-exclude] = wts
#>         wts = tmp
#>     }
#>     if (length(wts) != length(levs)) 
#>         stop("length of 'wts' must equal the number of levels", 
#>             " or the number of included levels")
#>     wts[exclude] = 0
#>     M = data.frame(row.names = levs)
#>     wts = wts/sum(wts)
#>     for (i in setdiff(seq_along(levs), exclude)) {
#>         con = -wts
#>         con[i] = 1 + con[i]
#>         nm = paste(levs[i], "effect")
#>         M[[nm]] = con
#>     }
#>     attr(M, "desc") = "differences from grand mean"
#>     attr(M, "adjust") = "fdr"
#>     if (length(exclude) > 0) 
#>         attr(M, "famSize") = length(levs) - length(exclude)
#>     M
#> }
#> <bytecode: 0x00000267ba555018>
#> <environment: namespace:emmeans>
library(emmeans)
mod <- glm(gear ~ factor(cyl) + hp, family = poisson, data = mtcars)

emm <- emmeans(mod, ~ cyl + hp, at = list(hp = c(100, 110)))
contrast(emm, method = "eff")
#>  contrast          estimate    SE  df z.ratio p.value
#>  cyl4 hp100 effect   0.2513 0.183 Inf   1.370  0.2560
#>  cyl6 hp100 effect   0.0699 0.149 Inf   0.468  0.6400
#>  cyl8 hp100 effect  -0.3674 0.237 Inf  -1.547  0.2560
#>  cyl4 hp110 effect   0.2821 0.202 Inf   1.396  0.2560
#>  cyl6 hp110 effect   0.1007 0.157 Inf   0.643  0.6246
#>  cyl8 hp110 effect  -0.3366 0.217 Inf  -1.552  0.2560
#> 
#> Results are given on the log (not the response) scale. 
#> P value adjustment: fdr method for 6 tests

emm_s <- summary(emm)
emm_s$emmean - mean(emm_s$emmean)
#> [1]  0.25131304  0.06990925 -0.36736385  0.28207408  0.10067029 -0.33660281

contrast(emm, method = "eff") |> regrid()
#>  contrast          ratio    SE  df z.ratio p.value
#>  cyl4 hp100 effect 1.286 0.236 Inf   5.451  <.0001
#>  cyl6 hp100 effect 1.072 0.160 Inf   6.689  <.0001
#>  cyl8 hp100 effect 0.693 0.164 Inf   4.212  <.0001
#>  cyl4 hp110 effect 1.326 0.268 Inf   4.951  <.0001
#>  cyl6 hp110 effect 1.106 0.173 Inf   6.383  <.0001
#>  cyl8 hp110 effect 0.714 0.155 Inf   4.612  <.0001
#> 
#> P value adjustment: fdr method for 6 tests

exp(emm_s$emmean - mean(emm_s$emmean))
#> [1] 1.2857125 1.0724109 0.6925576 1.3258769 1.1059119 0.7141925

Created on 2024-03-04 with reprex v2.1.0

vincentarelbundock commented 6 months ago

hypotesis_factory was a cute idea technically, but it would be too "conceptual" for most users and bad interface. Closing in favor or hypothesis_by and hypothesis

mattansb commented 6 months ago

Thanks @vincentarelbundock !

Where can I read about hypothesis_by()?

vincentarelbundock commented 6 months ago

Oh no, it's not implemented yet. Just cleaning up issues.

I've made like 5 failed attempts, and I find it quite difficult. But I'll have something to show for it at some point.