drs81 opened 1 month ago
Thanks for the suggestion. I agree that this would be an interesting addition.
For transparency, though, I must admit that I have never read very seriously in that field, so I am unlikely to implement this myself. If someone would like to work on this feature, I would be happy to review and advise on implementation.
I would be interested in having a go at this over the next few weeks, particularly using the `multcomp` package. (I have seen at least two SO questions looking for this behaviour, and it's something I have wanted in my work as well.)
As a workaround in the meantime, I am reasonably sure that `multcomp::glht(<prediction-object>)` "just works", assuming that there is no back-transformation (e.g. with the `invlink(link)` or `transform` options).
However, it would be really nice to have this functionality built into `marginaleffects` so that hypotheses with back-transformations also just work.
In a simple case, the signature of the `glht` function is
glht(model, linfct,
alternative = c("two.sided", "less", "greater"),
rhs = 0, ...)
Here, `model` can be any object with a `coef()` and `vcov()` method, `linfct` is an optional matrix specifying the linear hypothesis, `alternative` identifies the alternative hypothesis, and `rhs` identifies the right-hand side of the null hypothesis `linfct %*% coef(model) == rhs`. If `linfct` is not given, it defaults to the identity matrix and the null hypothesis becomes `coef(model) == rhs`, which is the case I am paying most attention to here.
In the case of the `marginaleffects` objects (predictions, comparisons, etc.), the `coef()` method returns the output of some (possibly non-linear) function (I'll call it $\mathbf{c} : \mathbb{R}^p \to \mathbb{R}^k$) of the underlying model parameters $\hat{\boldsymbol{\theta}}$, and `vcov()` returns the estimated covariance matrix for those estimates (i.e. `J %*% V %*% t(J)`), where `J` is the Jacobian of $\mathbf{c}$ and `V` is a sensible estimator of the covariance matrix of $\hat{\boldsymbol{\theta}}$ -- marginaleffects optionally lets this be a sandwich estimator with a single argument, which is really cool.
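To make that concrete, here is a small sketch of the `coef()`/`vcov()` behaviour described above (a toy model of my own choosing, not from the thread; the `J %*% V %*% t(J)` structure is internal, but the reported standard errors should be the square roots of the diagonal of `vcov()`):

```r
library(marginaleffects)

# Toy model on the link scale, so no back-transformation is involved
mod <- glm(am ~ hp + wt, data = mtcars, family = binomial())
p <- predictions(mod, newdata = datagrid(hp = c(100, 200)), type = "link")

theta <- coef(p)  # the k (possibly non-linearly) transformed estimates
Sigma <- vcov(p)  # their k x k delta-method covariance, i.e. J %*% V %*% t(J)

# The std.error column should match the diagonal of the covariance matrix
all.equal(unname(sqrt(diag(Sigma))), p$std.error)
```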
The resulting `glht` object (using the default hypothesis) can be used to return global $F$- or $\chi^2$-tests (from `summary()`), adjusted or unadjusted p-values for individual hypotheses (also from `summary()`), or adjusted or unadjusted confidence intervals for each element of `coef(model)` (using `confint()`).
See the example below, taken from section 6.3 of Hothorn et al. (2024), Simultaneous Inference in General Parametric Models, the multcomp package vignette: https://cran.r-project.org/package=multcomp/vignettes/generalsiminf.pdf
library(multcomp)
library(marginaleffects)
data("alzheimer", package = "coin")
y <- factor(alzheimer$disease == "Alzheimer",
labels = c("other", "Alzheimer"))
gmod <- glm(y ~ smoking * gender, data = alzheimer,
family = binomial())
# summary(gmod)
# marginaleffects is helpful for getting our linfct matrix K here
fitdata <- datagrid(gender = unique, smoking = unique, model = gmod)
K <- model.matrix(delete.response(terms(gmod)), data = fitdata)
rownames(K) <- with(fitdata, interaction(smoking, gender, sep = ':'))
gmod_ci <- confint(glht(gmod, linfct = K))
gmod_ci$confint <- apply(gmod_ci$confint, 2, binomial()$linkinv)
plot(gmod_ci, xlab = "Probability of Developing Alzheimer (multcomp)",
xlim = c(0, 1))
pred <- predictions(
gmod, datagrid(gender = unique, smoking = unique),
type = 'link' # We will need to back-transform later
)
pred
#>
#> gender smoking Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> Female None -0.394 0.136 -2.908 0.00364 8.1 -0.660 -0.129
#> Female <10 -0.357 0.493 -0.724 0.46921 1.1 -1.323 0.609
#> Female 10-20 -1.006 0.302 -3.332 < 0.001 10.2 -1.597 -0.414
#> Female >20 0.154 0.321 0.480 0.63129 0.7 -0.475 0.784
#> Male None -0.316 0.222 -1.421 0.15531 2.7 -0.751 0.120
#> Male <10 0.981 0.677 1.449 0.14740 2.8 -0.346 2.308
#> Male 10-20 -0.956 0.304 -3.145 0.00166 9.2 -1.551 -0.360
#> Male >20 -2.037 0.434 -4.693 < 0.001 18.5 -2.888 -1.186
#>
#> Columns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, gender, smoking
#> Type: link
pred_to_glht <- glht(pred)
pred_ci <- confint(pred_to_glht)
pred_ci$confint <- apply(pred_ci$confint, 2, binomial()$linkinv)
plot(pred_ci, xlab = "Probability of Developing Alzheimer (marginaleffects + multcomp)",
xlim = c(0, 1))
# Check -------------------------------------------------------------------
all.equal(gmod_ci$confint, pred_ci$confint, check.attributes = FALSE)
#> [1] TRUE
Created on 2024-08-02 with reprex v2.1.1
From a perusal of the source, it looks like the p-values and confidence intervals are generally calculated in `get_ci()` -- though it appears that some CIs and p-values might be calculated earlier (and I haven't worked out where yet).

My vague thought bubble of an implementation idea would be to update the `get_ci()` function to get the estimates and the covariance matrix (standard errors alone are not enough) from `x`, and change (say) the `p_adjust` argument to the equivalent `test` and `calpha` arguments from `multcomp::summary.glht` and `multcomp::confint.glht`. Is there anything else I'd need to be aware of before treading that path?
Thanks for the comments and proposal. I really appreciate that you took so much time to look at the code and understand the package.
I agree that this would be a cool feature to add.
I'm off to bed and haven't looked at this seriously, but at first glance what you write makes a lot of sense, and the things you describe about the package seem correct.
I believe that some of the conditions that would flip the `p_overwrite` and `ci_overwrite` flags to `TRUE` may never actually occur in practice. They may be leftover from previous infrastructure. I'll try to clean those up eventually.

A couple of situations where they may still be important are: (1) when using `p_adjust`, because we don't want to display unadjusted CIs alongside adjusted p-values, and (2) bootstrapping with `inferences()`.
In terms of user interface, I'd like us to think about the pros and cons of different options before committing.
One strong preference I have is to limit the number of arguments in the core functions `predictions()`, `comparisons()`, and `slopes()`. I imagine that `multcomp::glht` accepts quite a few arguments to tweak behavior, such as alpha level, multiple comparison correction method, 1- vs 2-sided tests, etc. I do not want to add all of those arguments to the core functions.
I'm not sure what the full set of alternatives is, but here are two options:

1) Use (a potentially renamed) `p_adjust` to adjust both p-values and CIs. This would not be very flexible, but it would probably be simple to implement and easy for users to understand.

2) Move `p_adjust` out of the core functions and into `hypotheses()`. As you can see in the docs, that function already has a couple of arguments like `joint` for joint hypothesis tests. But frankly, the implementation for that is a bit sloppy, and the interface is terrible. There's lots of room for improvement here, and `hypotheses()` could be a nice single point of entry for these kinds of tests, with all the goodies users would expect.
If we go with option 2, this issue could be renamed: "Make `hypotheses()` powerful"
I have to admit, I'm not very familiar with `hypotheses()`. I just had a play around (following on from my examples above) and it seems to mostly align with `glht` for a $\chi^2$ test. Interestingly, `glht` didn't calculate an $F$ statistic with this example -- I think it has something against GLMs for that.
hyp <- hypotheses(pred, joint = TRUE, joint_test = 'chisq')
hyp$p.value
#> [1] 2.416523e-09
glht_test <- summary(pred_to_glht, test = Chisqtest())
glht_test$test$pvalue
#> [,1]
#> [1,] 2.416523e-09
all.equal(hyp$p.value, drop(glht_test$test$pvalue))
#> [1] TRUE
Aside: it didn't feel immediately clear to me from the documentation that I could just pass a `predictions` object into `hypotheses()`, or what the default `hypothesis = NULL` means.
That being said, `hypotheses()` does seem like a sensible place to give a lot more flexibility for confidence intervals and p-values. So option 2 isn't bad, and as you point out, it makes an excellent pun for the issue tracker. Regarding my use cases, though, I would tend to prefer some "simple" functionality in `predictions()` and `comparisons()`, and for that I would prefer option (1).
Bearing in mind the desire to keep the additional argument count of the core functions low or zero, I think the main options I would want in `p_adjust` are:

- Univariate/unadjusted p-values and CIs. This corresponds to `p_adjust = NULL` in marginaleffects, `test = univariate()` in `multcomp::summary.glht`, and `calpha = univariate_calpha()` in `multcomp::confint.glht()`.
- Single-step adjusted p-values and CIs. This is the requested feature for marginaleffects; it corresponds to `test = adjusted(type = 'single-step')` and `calpha = adjusted_calpha()` in multcomp's `summary` and `confint`, respectively. This could maybe be selected with `p_adjust = 'single-step'`.

`multcomp` has some other p-adjustment methods, which I frankly don't know enough about (... and therefore don't use).
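A sketch of how those two options map onto multcomp calls (this reuses the `gmod` and `K` objects from the reprex earlier in the thread, so it is illustrative rather than a proposed interface):

```r
library(multcomp)

# `gmod` (the glm) and `K` (the linfct matrix) come from the earlier reprex
g <- glht(gmod, linfct = K)

# Univariate / unadjusted, i.e. what p_adjust = NULL would mean:
summary(g, test = univariate())
confint(g, calpha = univariate_calpha())

# Single-step adjustment, i.e. the proposed p_adjust = 'single-step':
summary(g, test = adjusted(type = "single-step"))
confint(g, calpha = adjusted_calpha())
```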
I can easily imagine how to implement a Bonferroni correction for both p-values and confidence intervals -- adjust the level to $1 - \alpha/k$ for both, and then use the univariate methods. On the other hand, the Holm method breaks straight away because it requires ordering the p-values first. A collection of "Holm confidence intervals" would have different (univariate) levels based on the p-value of each estimate (so they would now also depend on the RHS of the hypothesis), and this seems either circular or arbitrary to me. I'm also not familiar enough with the Hochberg, Hommel, BH or BY adjustments to comment on them.
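For concreteness, a minimal sketch of that Bonferroni idea (a hypothetical standalone helper, not marginaleffects or multcomp code; it assumes normal-theory intervals and a null value of zero):

```r
# Bonferroni adjustment applied to both p-values and confidence intervals
# (hypothetical helper for illustration only)
bonferroni_adjust <- function(estimate, se, alpha = 0.05) {
  k <- length(estimate)
  z <- qnorm(1 - alpha / (2 * k))      # univariate critical value at level 1 - alpha/k
  p <- 2 * pnorm(-abs(estimate / se))  # unadjusted two-sided p-values
  data.frame(
    estimate   = estimate,
    p.adjusted = pmin(1, k * p),       # same as p.adjust(p, "bonferroni")
    conf.low   = estimate - z * se,
    conf.high  = estimate + z * se
  )
}
```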
So, maybe we allow a third option for the new `p_adjust`: `p_adjust = 'bonferroni'`, which reports p-values multiplied by the number of hypotheses $k$, and univariate confidence intervals at the $(1 - \alpha/k)$ level.

Regarding the naming of `p_adjust`: it might be good to softly/gradually rename it to something like `power_adjust`, to indicate that it works on both confidence intervals and p-values?
Would it make sense to onboard `multcomp` as an optional dependency? That library is battle-tested, and it would save us from having to re-implement everything from scratch. And since the dependency would be optional, people who don't care about multiple comparisons wouldn't have to install it.
In that vision, the `hypotheses()` function becomes a front-end to `multcomp`, with some sugar to make common tests easier to execute, and returning objects of `marginaleffects` class (with pretty printing, etc.).
If that's a good option, we could proceed in two steps: (1) improve `hypotheses()` to support all the options we want; (2) decide if some of these options should live in the core functions.

It feels hard to make a call on (2) when we haven't done an exhaustive tour of (1). It might be that a piping workflow works fine and that we don't need any additional arguments in the core functions. Or it might be that we do want some arguments there, in which case it is trivial to call `hypotheses()` at the very end of the function definition for `predictions()`.
Strengthening `hypotheses()` in line with step (1) seems like a good first step to me. I also like the idea of keeping `multcomp` as an optional dependency -- though that would mean we don't get to piggyback off it for univariate CIs and p-values (which, admittedly, this package already has code for).

I will have a look through `hypotheses()` to try and get a sense of how it works, and might start playing around with some code.
Sounds great, thanks!
FYI, I'm taking a week off coding and open source, so I might not respond very quickly in the near future.
But I'll be back and active soon.
OK, I played with this a bit tonight. I had never actually worked with the `multcomp` package. Unsurprisingly (given the authors and the history), it is awesome.

I'm almost thinking that we should deprecate the `p_adjust` argument in `marginaleffects`, and simply document how to use `multcomp` for this purpose. It is so seamless and easy that I'm not sure what building support directly into `marginaleffects` really brings to the table.
Consider:
library(multcomp)
library(marginaleffects)
mod <- lm(mpg ~ factor(carb) - 1, data = mtcars)
p <- avg_predictions(mod, by = "carb")
# simultaneous intervals
p |>
glht() |>
confint()
#
# Simultaneous Confidence Intervals
#
# Fit: NULL
#
# Quantile = 2.6313
# 95% family-wise confidence level
#
#
# Linear Hypotheses:
# Estimate lwr upr
# b1 == 0 15.7900 11.7088 19.8712
# b2 == 0 25.3429 20.4649 30.2208
# b3 == 0 22.4000 18.3188 26.4812
# b4 == 0 16.3000 8.8488 23.7512
# b5 == 0 19.7000 6.7941 32.6059
# b6 == 0 15.0000 2.0941 27.9059
# Bonferroni correction for p values
p |>
glht() |>
summary(test = adjusted("bonferroni"))
#
# Simultaneous Tests for General Linear Hypotheses
#
# Linear Hypotheses:
# Estimate Std. Error z value Pr(>|z|)
# b1 == 0 15.790 1.551 10.180 < 2e-16 ***
# b2 == 0 25.343 1.854 13.670 < 2e-16 ***
# b3 == 0 22.400 1.551 14.442 < 2e-16 ***
# b4 == 0 16.300 2.832 5.756 5.17e-08 ***
# b5 == 0 19.700 4.905 4.016 0.000354 ***
# b6 == 0 15.000 4.905 3.058 0.013359 *
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# (Adjusted p values reported -- bonferroni method)
# Holm correction for p values
p |>
glht() |>
summary(test = adjusted("holm"))
#
# Simultaneous Tests for General Linear Hypotheses
#
# Linear Hypotheses:
# Estimate Std. Error z value Pr(>|z|)
# b1 == 0 15.790 1.551 10.180 < 2e-16 ***
# b2 == 0 25.343 1.854 13.670 < 2e-16 ***
# b3 == 0 22.400 1.551 14.442 < 2e-16 ***
# b4 == 0 16.300 2.832 5.756 2.58e-08 ***
# b5 == 0 19.700 4.905 4.016 0.000118 ***
# b6 == 0 15.000 4.905 3.058 0.002227 **
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# (Adjusted p values reported -- holm method)
multcomp really is a very nice package!
The two features I would like integrated with marginaleffects as a user are:

1. Back-transformation, as with the `invlink(link)` option. This is a bit of a manual process with multcomp.
2. Informative hypothesis labels: `b1 == 0` is a little opaque.

Deprecating `p_adjust` seems sensible to me, and documenting the `multcomp` capability would be a sensible next step while working out whether either of the above is worth implementing in e.g. `hypotheses()`.
PS: I'm on my phone and can't write code right now, I'm afraid -- preparing to move country, and the computer is in pieces.
It would be great if support could be added for computing simultaneous confidence intervals, similar to the routines available in emmeans, rms, etc.