rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/
364 stars 32 forks source link

How to display p-values adjusted across multiple groups of comparisons with pwpp()? #472

Closed qdread closed 7 months ago

qdread commented 8 months ago

Hi Russ, I appreciate your great documentation on comparisons, in particular your answer on this stackexchange post about how to define the family for multiple comparisons correction differently than the groups within which the pairwise comparisons are done.

My question is how to interact this with a plot generated with pwpp(). I have two interacting factors and want to do pairwise comparisons of factor A levels within each level of factor B. But I want to use FDR correction across all comparisons done in this way. For example if there are 3 levels of factor A and 100 levels of factor B, that's 3 pairwise comparisons within each of 100 groups for a total of 300 comparisons.

I have no problem doing this to display in a table, as in this example:

library(emmeans)
data("Alfalfa", package = "nlme")

badmodel <- lm(Yield ~ Variety * Date, data = Alfalfa) # nlme data

emms <- emmeans(badmodel, ~ Variety | Date)

# These two things return the same result which is good
contrast(emms, method = 'pairwise') |> rbind(adjust = 'fdr')
contrast(emms, method = 'pairwise') |> summary(by = NULL, adjust = 'fdr')

However with a pwpp() plot it only does the FDR adjustment within each Date group which is correcting for fewer comparisons than I want it to. If I switch pwpp() to by = NULL this is also not correct because it now does every possible comparison of combinations of Variety and Date, instead of just the Variety comparisons within each Date.

pwpp(emms, method = 'pairwise', adjust = 'fdr') # Adjusts for wrong number of comparisons
pwpp(emms, method = 'pairwise', adjust = 'fdr', by = NULL) + facet_wrap(~ Date) # Does the wrong comparisons

Thanks in advance for your help!

rvlenth commented 8 months ago

I think as it is currently designed, pwpp() and pwpm cannot display a subset of all the comparisons it is capable of showing.

I suppose it is technically possible to do this, but I'm having trouble thinking of what kind of argument to provide. Thinking out loud, maybe it's possible to provide an object that already has the comparisons you want, such as

contrast(emms, method = 'pairwise') |> rbind(adjust = 'fdr') -> what.we.want

and maybe then have something like method = "existing" which requires a coef argument (or attribute), like

> coef(pairs(emms))
   Variety Date c.1 c.2 c.3 c.4 c.5 c.6 c.7 c.8 c.9 c.10 c.11 c.12
1  Cossack None   1   1   0   0   0   0   0   0   0    0    0    0
2    Ladak None  -1   0   1   0   0   0   0   0   0    0    0    0
3   Ranger None   0  -1  -1   0   0   0   0   0   0    0    0    0
4  Cossack   S1   0   0   0   1   1   0   0   0   0    0    0    0
5    Ladak   S1   0   0   0  -1   0   1   0   0   0    0    0    0
6   Ranger   S1   0   0   0   0  -1  -1   0   0   0    0    0    0
7  Cossack  S20   0   0   0   0   0   0   1   1   0    0    0    0
8    Ladak  S20   0   0   0   0   0   0  -1   0   1    0    0    0
9   Ranger  S20   0   0   0   0   0   0   0  -1  -1    0    0    0
10 Cossack   O7   0   0   0   0   0   0   0   0   0    1    1    0
11   Ladak   O7   0   0   0   0   0   0   0   0   0   -1    0    1
12  Ranger   O7   0   0   0   0   0   0   0   0   0    0   -1   -1

But this is pretty shaky because we can't get the coefs directly:

> coef(what.we.want)
No contrast coefficients are available
NULL

The coefs are nulled-out when we do the rbind()

Let me ponder this for a while.

rvlenth commented 8 months ago

I figured out how to do what you want. Note that the result of`coefs() above is a data frame, so let

coefs = coefs(pairs(emms))

if we exclude the first two columns, we have a named list of contrast coefficients. Thus, just use them as the contrast method (we have to null-out the by grouping first):

pwpp(emms, coefs[-(1:2)], by = NULL, adjust = "fdr")

image

rvlenth commented 8 months ago

I have a less-hacky way to do this. Note that we can get the combined groups by using the options argument to contrast().

contrast(emms, "pairwise", options = list(by = NULL, adjust = "fdr"))

So, one would hope that

pwpp(emms, "pairwise", options = list(by = NULL, adjust = "fdr"))

would work right. However, it doesn't, because by is an actual argument to pwpp() and is used in its code. However, (and luckily), the first thing it does is gets the contrasts. So I added a line of code to change by to whatever it is after that call. And now it works! You may wait till the next update to CRAN, or wait until I push it up to GitHub and then install it from here.

qdread commented 8 months ago

Hi Russ this is excellent. I would be curious if there is a way to get the pwpp() plot with facets corresponding to the by groups. Both of the following give less than optimal results, using the example data from before.

pwpp(emms, "pairwise", options = list(by = NULL, adjust = "fdr")) + facet_wrap(~ Date)
pwpp(emms, "pairwise", options = list(by = NULL, adjust = "fdr")) + facet_wrap(~ Date, scales = 'free_y')
rvlenth commented 8 months ago

Well, I see that the free_y has some issues with rendering, but the first one is pretty much exactly what I expected. What to you is less than optimal? Just the fact that all the labels occur in each panel?

I believe that the rendering issues in the second one are due to the fact that the line segments are rendered in two parts with a color change halfway between, and the midpoints of those segments don't get included in the grouping as far as scaling is concerned. Offhand, I am not sure that can be fixed, just because in general, the midpoints can be associated with more than one level of a given factor.

qdread commented 8 months ago

Yes, it was just that the extra labels occur in each panel but it's not a huge issue. Thanks so much for working on it!

rvlenth commented 7 months ago

The issue with scales = 'free_y' messing things up has to do with the fact that the colors of the comparison bars change color at their midpoints. The idea is that the color at each end is the same as the color of the other mean, supposedly making it easier to identify which mean is being compared. That creates a need for a new grouping factor for use by the ggplot code. I don't see a way that I can disentangle those groupings. so I think I'm stuck with leaving things as they are.