Closed mattansb closed 6 years ago
This is an interesting suggestion, but I will have to consider whether I want to implement another type
option. The current ones -- "response" and "link" -- are generic and make sense for a variety of models, whereas "rr" is very specific to cases where the logit link is used. If I do this, I probably will think of a different name for it, as I want to keep type = "r"
as an unambiguous match for type = "response"
.
Also, it is currently possible to obtain estimates of probability ratios:
> g <- sample(c(0,1),size = 1000, replace = T)
> outcome <- round(1/(1+exp(-g + rnorm(1000))))
> fit <- glm(outcome ~ factor(g), family = binomial())
>
> ( emm = emmeans(fit, "g", type = "response") )
g prob SE df asymp.LCL asymp.UCL
0 0.4711730 0.02225681 Inf 0.4278794 0.5149046
1 0.8591549 0.01560372 Inf 0.8257146 0.8870567
Confidence level used: 0.95
Intervals are back-transformed from the logit scale
So far, comparable to what you show (we had different random-number seeds). The next step uses the regrid()
function to make it look like the log link was used:
> pairs(regrid(emm, transform = "log"), type = "response")
contrast ratio SE df z.ratio p.value
0 / 1 0.5484144 0.02775424 Inf -11.87 <.0001
Tests are performed on the log scale
This use of regrid()
allows ratio comparisons of anything for which log transforms are sensible.
There is yet a quicker way:
> emmeans(fit, pairwise ~ g, transform = "log", type = "response")
$`emmeans`
g response SE df asymp.LCL asymp.UCL
0 0.4711730 0.02225681 Inf 0.4295089 0.5168787
1 0.8591549 0.01560372 Inf 0.8291101 0.8902885
Confidence level used: 0.95
Intervals are back-transformed from the log scale
$contrasts
contrast ratio SE df z.ratio p.value
0 / 1 0.5484144 0.02775424 Inf -11.87 <.0001
Tests are performed on the log scale
However, in a more complicated example, it won't work quite the same, because the transform
spec alters the reference grid itself, before any marginal averaging is performed.
However, in a more complicated example, it won't work quite the same, because the transform spec alters the reference grid itself, before any marginal averaging is performed.
This in reference to the use of refgrid
in general, and not just the Addendum, correct?
(I played around with the method, and it indeed didn't work in a more complicated situation - like interaction contrasts.)
This seems to do the trick:
> g <- sample(c(0,1),size = 1000, replace = T)
> outcome <- round(1/(1+exp(-g + rnorm(1000))))
> g <- ifelse(g==1,round(sample(1:2,size = sum(g==1),replace = T)),g)
> k <- sample(c(0,1),size = 1000, replace = T)
> fit <- glm(outcome ~ factor(g) * factor(k), family = binomial())
>
> em1 <- emmeans(fit,~g*k)
>
> contrast_RR <- function(rg,...) {
+ probs <- test(rg,type = "response")$prob
+ res <- contrast(rg,...)
+ c_weights <- coef(res)
+ c_weights <- c_weights[,-seq_len(ncol(c_weights) - nrow(summary(res)))]
+ c_weights <- as.matrix(c_weights)
+
+
+ res <- summary(res)
+ res$estimate <- t(exp(log(probs) %*% c_weights))
+ colnames(res)[colnames(res)=='estimate'] <- 'RR'
+ return(res)
+ }
>
> contrast_RR(em1,interaction = c('pairwise','pairwise'))
g_pairwise k_pairwise RR SE df z.ratio p.value
0 - 1 0 - 1 0.9325756 0.4168997 Inf -1.210 0.2263
0 - 2 0 - 1 1.0490940 0.3924615 Inf 0.684 0.4942
1 - 2 0 - 1 1.1249426 0.5160107 Inf 1.497 0.1343
Correction: this seems to fail with more complex interaction contrasts (e.g., poly
)...
Well, contrast_RR()
may be useful for your own purposes, but is not usable in emmeans because it does not return the right class of object (contrast()
returns an emmGrid
object), and because the standard errors are incorrect.
The main point here is that the current summary.emmGrid
method does not make a special case with the log transformation when we have an interaction contrast, like it does with other contrasts. I will investigate how feasible/desirable it is to make that extension.
After pondering this for some time, I think I'm ready to try to resolve this...
The main thing to understand is that type
is really an argument for summary.emmGrid()
, and it exists in emmeans()
merely as a courtesy. If type
is supplied in emmeans()
, it doesn't change any of the computations; it just sets an attribute for how we want the results to be summarized later. It seems to make sense because the default print
method for emmGrid
objects prints its summary.
As I pointed out earlier, it is possible to obtain estimates of ratios of proportions by calling regrid(... transform = "log")
before contrasts are computed. This is also true of interaction contrasts, but things have to be done in the right sequence.
I'll use your example, but make it exactly reproducible by setting an explicit random-number seed:
> set.seed(12345)
>
> g <- sample(c(0,1),size = 1000, replace = T)
> outcome <- round(1/(1+exp(-g + rnorm(1000))))
> g <- ifelse(g==1,round(sample(1:2,size = sum(g==1),replace = T)),g)
> k <- sample(c(0,1),size = 1000, replace = T)
> fit <- glm(outcome ~ factor(g) * factor(k), family = binomial())
>
> em1 <- emmeans(fit,~g*k)
Now, re-grid em1
to the log scale:
> em1log <- regrid(em1, "log")
Finally, obtain the interaction contrasts and summarize the results with type = "response"
:
> summary(contrast(em1log, interaction = "pairwise"), type = "response")
g_pairwise k_pairwise ratio SE df z.ratio p.value
0 / 1 0 / 1 0.9240704 0.09598017 Inf -0.760 0.4471
0 / 2 0 / 1 0.9659759 0.10389263 Inf -0.322 0.7476
1 / 2 0 / 1 1.0453488 0.07764034 Inf 0.597 0.5504
Note that I gave the type
specification in summary()
, not in contrast()
. For comparison, I copied-in your contrast_RR
function and ran it on em1
:
> contrast_RR(em1, interaction="pairwise")
g_pairwise k_pairwise RR SE df z.ratio p.value
0 - 1 0 - 1 0.9240704 0.4182265 Inf -1.588 0.1122
0 - 2 0 - 1 0.9659759 0.3764171 Inf -0.673 0.5011
1 - 2 0 - 1 1.0453488 0.4983492 Inf 0.825 0.4095
The estimates are identical, but the labeling is different, and the SEs and P values are wildly different (mine are correct, or at least defensible as being results of the delta method).
So, I think I have established that the results you want are obtainable using existing capabilities of emmeans, and that the key to doing so is to understand clearly that type
refers to how an emmGrid
object is summarized, not how it is constructed.
The remaining question is whether I should add a provision whereby the user can specify in their call to emmeans()
that contrasts constructed later be expressed as ratios. In doing so, I would have to set the summarization attribute in a way that it specifies how some subsequent object produced by contrast()
should be summarized. It is possible to do this through some devious coding, but to me this goes beyond simply providing a convenience for how the object itself should be summarized. I'll consider further, but right now I am not ready to add that feature -- primarily because documenting it would be very awkward and potentially confusing.
PS -- It also does not fail with other contrasts; e.g.,
> summary(contrast(em1log, interaction = "poly"), type = "response")
g_poly k_poly ratio SE df z.ratio p.value
linear linear 0.9659759 0.1038926 Inf -0.322 0.7476
quadratic linear 1.1312437 0.1640962 Inf 0.850 0.3953
Tests are performed on the log scale
Thanks Russell - this is definitely a better solution than my hacked together one!
Should be noted (for my own future reference) that here (as in all contrasts) the scale of the weights will produce different estimates of the RR (but same inferences):
(continuing your example):
> zero_vs_all.emmc <- function(x) {
+ data.frame(good = c(-2,1,1)/2, # gives the correct RR
+ bad = c(-2,1,1)) # gives an inflated RR (RR^2 in this case)
+ }
> em2 <- emmeans(fit, ~g)
NOTE: Results may be misleading due to involvement in interactions
> em2log <- regrid(em2, transform = "log")
> summary(contrast(em2log, "zero_vs_all"), type = "response")
contrast ratio SE df z.ratio p.value
good 1.687315 0.08353851 Inf 10.566 <.0001
bad 2.847031 0.28191153 Inf 10.566 <.0001
Results are averaged over the levels of: k
Tests are performed on the log scale
Gonna close this now; can always be reopened.
@rvlenth Thanks for your explanations on how to compute risk ratios for contrasts from logistic regression models. Does the same logic also apply to compute risk ratios for contrasts from ordinal logistic regression models estimated with MASS:polr()
?
So, if this (pseudo) codes reports odds ratios:
polr_model %>%
emmeans::emmeans(specs = the_formula, mode = "linear.predictor") %>%
emmeans::contrast(interaction = "pairwise") %>%
summary(type = "response", infer = TRUE, adjust = "tukey")
Will the following report risk ratios?
polr_model %>%
emmeans::emmeans(specs = the_formula, mode = "linear.predictor") %>%
emmeans::regrid("log") %>%
emmeans::contrast(interaction = "pairwise") %>%
summary(type = "response", infer = TRUE, adjust = "tukey")
It depends on what risks you want to compare. With mode = "linear.predictor"
, I'm not sure without looking it up. I would suggest the same but with, say, mode = "cum.prob"
.
BTW, the Tukey adjustment generally is not appropriate for interaction contrasts of more than one factor, and if you ask for it, it will be replaced with something else -- I think Sidak. I'm thinking you want method = "pairwise"
(for which it is unnecessary to specify adjust = "tukey"
because that is the default), not interaction = "pairwise"
.
Thanks so much for getting back to me, @rvlenth! Sorry for spamming this issue further, but just so I get this straight:
(a) for a logistic regression model with one categorical predictor variable with 3 levels (A, B, and C), when I use this (pseudo) code,
logit_model %>%
emmeans::emmeans(specs = the_formula) %>%
emmeans::regrid("log") %>%
emmeans::contrast(method = "pairwise") %>%
summary(type = "response", infer = TRUE)
the output's first row
contrast ratio ...
A / B 1.23 ...
A / C 1.87 ...
B / C 1.51 ...
gives the risk ratio = P(outcome variable = 1 | predictor variable = A) / P(outcome variable = 1 | predictor variable = B)
(b) for an ordinal logistic regression model with one categorical predictor variable with 3 levels (A, B, and C), when I use this (pseudo) code
polr_model %>%
emmeans::emmeans(specs = the_formula, mode = "cum.prob") %>%
emmeans::regrid("log") %>%
emmeans::contrast(method = "pairwise") %>%
summary(type = "response", infer = TRUE)
the output's first row gives the risk ratio = P(outcome variable <= j | predictor variable = A) / P(outcome variable <= j | predictor variable = B) with j = 1, ..., J-1 answer alternatives in the outcome variable
(c) when using mode = "exc.prob"
instead of mode = "cum.prob"
for the ordinal logistic regression model, the output's first row gives the risk ratio = P(outcome variable >= j | predictor variable = A) / P(outcome variable >= j | predictor variable = B).
Is this correct?
Close. For exc.prob
, you need to change ">=" to ">".
I'm going to eat some of my words... Compare:
> require(ordinal)
> m1 <- clm(rating ~ temp + contact, data = wine)
> # start with mode = "cum.prob":
> m1 %>%
+ emmeans::emmeans(specs = ~temp|cut, mode = "cum.prob") %>%
+ emmeans::regrid("log") %>%
+ emmeans::contrast(method = "pairwise") %>%
+ summary(type = "response", infer = TRUE, by=NULL)
contrast cut ratio SE df asymp.LCL asymp.UCL z.ratio p.value
cold / warm 1|2 10.21 5.1559 Inf 3.80 27.47 4.601 <.0001
cold / warm 2|3 4.31 1.5400 Inf 2.14 8.68 4.082 <.0001
cold / warm 3|4 1.70 0.2271 Inf 1.31 2.21 3.953 0.0001
cold / warm 4|5 1.19 0.0781 Inf 1.05 1.35 2.643 0.0082
Results are averaged over the levels of: contact
Confidence level used: 0.95
Intervals are back-transformed from the log scale
Tests are performed on the log scale
> # Start with mode = "linear.predictor"
> m1 %>%
+ emmeans::emmeans(specs = ~temp|cut, mode = "lin") %>%
+ emmeans::regrid("log") %>%
+ emmeans::contrast(method = "pairwise") %>%
+ summary(type = "response", infer = TRUE, by=NULL)
contrast cut ratio SE df asymp.LCL asymp.UCL z.ratio p.value
cold / warm 1|2 11.01 5.7013 Inf 3.99 30.38 4.630 <.0001
cold / warm 2|3 5.27 2.0730 Inf 2.44 11.39 4.226 <.0001
cold / warm 3|4 1.70 0.2590 Inf 1.27 2.30 3.511 0.0004
cold / warm 4|5 1.16 0.0713 Inf 1.03 1.31 2.400 0.0164
Results are averaged over the levels of: contact
Confidence level used: 0.95
Intervals are back-transformed from the log scale
Tests are performed on the log scale
Why are these different??? The reason is because of averaging over contact
. With mode = "cum.prob"
, we generated the reference grid and then put it through regrid()
to convert it to cumulative probabilities. So when we averaged over contact
, we were averaging the cumulative probabilities. With mode = "linear predictor"
, we did not regrid until later, so when we averaged over contact
, we did so on the linear-predictor scale. That, in my opinion, is the better choice, because averaging is a linear operation so it is best done with the linear predictor. Make sense?
Yes, that makes sense. Thank you so much for your help, @rvlenth!
I think it would be very useful if there was an option to have
emmeans
return the RR (risk ratio) for binomial models. RR = Pr(success | a) / Pr(success | b) or: RR = (1+exp(-ORb)) / (1+exp(-ORa))For predicted values (no contrast) this should return the same value as
type = 'response'
(prob
), and for contrasts the RR as stated above. It should look something like this:Thanks (again and again) for this amazing package!