rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/
358 stars 31 forks source link

Multivariate contrasts? #281

Closed mattansb closed 2 years ago

mattansb commented 3 years ago

Hi Russell,

I was wondering if there was a way to obtain multivariate contrasts from a multivariate model? Or can I only get separate univariate contrasts for each response variate?

rvlenth commented 3 years ago

Yes. The multivariate levels are just presented as the levels of a factor named 'rep.meas' (you can rename that via the 'mult.name' argument. See help for 'ref.grid'

mattansb commented 3 years ago
data("mtcars")

I’m trying to do this in emmeans:

# Make dummy coding
mm <- model.matrix( ~ factor(cyl), data = mtcars)
head(mm)
#>                   (Intercept) factor(cyl)6 factor(cyl)8
#> Mazda RX4                   1            1            0
#> Mazda RX4 Wag               1            1            0
#> Datsun 710                  1            0            0
#> Hornet 4 Drive              1            1            0
#> Hornet Sportabout           1            0            1
#> Valiant                     1            1            0

m2 <- lm(cbind(mpg, hp) ~ mm[, 2] + mm[, 3], data = mtcars)
car::Anova(m2) # <<<<
#> 
#> Type II MANOVA Tests: Pillai test statistic
#>         Df test stat approx F num Df den Df    Pr(>F)    
#> mm[, 2]  1   0.41552    9.953      2     28 0.0005429 ***
#> mm[, 3]  1   0.79922   55.729      2     28  1.73e-10 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This 👆 tests the multivariate distance between two means.

Trying with emmeans:

m <- lm(cbind(mpg, hp) ~ factor(cyl), data = mtcars)

library(emmeans)

This give univariate contrasts collapsed (averaged) across both responses. Not what I want.

emmeans(m, trt.vs.ctrl1 ~ cyl)[[2]]
#>  contrast estimate   SE df t.ratio p.value
#>  6 - 4        16.4 8.98 29 1.823   0.1428 
#>  8 - 4        57.5 7.48 29 7.687   <.0001 
#> 
#> Results are averaged over the levels of: rep.meas 
#> P value adjustment: dunnettx method for 2 tests

This seems closer? But gives different results compared to car (different F's and dfs):

emmeans(m, trt.vs.ctrl1 ~ cyl | rep.meas)[[2]] |> 
  joint_tests(by = "contrast")
#> contrast = 6 - 4:
#>  model term df1 df2 F.ratio p.value
#>  rep.meas     1  29   6.114 0.0195 
#> 
#> contrast = 8 - 4:
#>  model term df1 df2 F.ratio p.value
#>  rep.meas     1  29  77.473 <.0001

Created on 2021-06-28 by the reprex package (v2.0.0)

rvlenth commented 3 years ago

I misunderstood the original question. No, the emmeans package does not provide Mahalanobis distance comparisons.

mattansb commented 3 years ago

Thanks for clarifying!

mattansb commented 3 years ago

Just in case anyone finds this tread going down the same rabbit hole I did...

Following this solution by Yves Rosseel, I've managed the following solution:

mlh <- function(fit, specs, em_args = list(), c_args = list(), wide_EM = TRUE) {
  #' @param fit a mlm model
  #' @param specs A formula specifying the predictors over which EMMs are
  #'   desired.
  #' @param em_args Passed to emmeans::emmeans
  #' @param c_args Passed to emmeans::contrast
  #' @param wide_EM Should the EMs be returned in the wide format?
  stopifnot("specs must be a formula" = inherits(specs, "formula"),
            "object must be of class \"manova\" or \"maov\"" = inherits(fit, "mlm"),
            requireNamespace("emmeans"),
            requireNamespace("tidyr"),
            requireNamespace("insight"))

  ## EMMEANS ----
  em_args <- c(list(object = fit, 
                    specs = specs,
                    data = insight::get_data(fit)), 
               em_args)
  EM <- do.call(emmeans::emmeans, em_args)

  ## EMMEANS for plotting ----
  em_args$specs <- update(em_args$specs, ~ . + rep.meas)
  EM_out <- do.call(emmeans::emmeans, em_args)
  if (isTRUE(wide_EM)) {
    EM_out <- as.data.frame(EM_out)
    EM_out <- tidyr::pivot_wider(
      EM_out, 
      names_from = "rep.meas",
      values_from = -seq_len(which(colnames(EM_out) == "rep.meas"))
    )
    EM_out <- as.data.frame(EM_out)
  }

  if (length(c_args) == 0L) return(EM_out)

  ## Contrasts ----
  X <- as.matrix(model.matrix(fit))
  B <- as.matrix(fit$coef)
  M <- diag(ncol(B))
  rss.qr <- qr(crossprod(fit$residuals %*% M))

  # Get L
  c_args <- c(list(object = EM), c_args)
  EM <- do.call(emmeans::contrast, c_args)
  LL <- EM@linfct[, seq_len(nrow(B)),drop = FALSE]

  # https://stat.ethz.ch/pipermail/r-help/2004-June/052051.html

  # For each L
  RES <- lapply(seq_len(nrow(LL)), function (i) {
    L <- rbind(LL[i,]) # for each contrast

    H <- t(M) %*% t(L%*%B)  %*%
      as.matrix(solve(L%*%solve(t(X)%*%X)%*%t(L)))  %*%
      (L%*%B)%*%M

    eig <- Re(eigen(qr.coef(rss.qr, H), symmetric = FALSE)$values)

    q <- nrow(L); df.res <- fit$df.residual

    test <- prod(1 / (1 + eig))
    p <- length(eig)
    tmp1 <- df.res - 0.5 * (p - q + 1)
    tmp2 <- (p * q - 2) / 4
    tmp3 <- p ^ 2 + q ^ 2 - 5
    tmp3 <-  if (tmp3 > 0) sqrt(((p * q) ^ 2 - 4) / tmp3) else 1

    wilks   <- test
    F.ratio <- ((test ^ (-1 / tmp3) - 1) * (tmp1 * tmp3 - 2 * tmp2)) / p / q
    df1     <- p * q
    df2     <- tmp1 * tmp3 - 2 * tmp2
    p.value <- pf(F.ratio, df1, df2, lower.tail = FALSE)
    data.frame(wilks, F.ratio, df1, df2, p.value)  
  })
  RES <- do.call("rbind", RES)

  list(EM = EM_out,
       C = cbind(EM@grid,RES))
}

m <- lm(cbind(mpg, hp) ~ factor(cyl), data = mtcars)

mlh(m, ~ cyl, c_args = list(method = "trt.vs.ctrl1"))
#> Loading required namespace: emmeans
#> Loading required namespace: tidyr
#> Loading required namespace: insight
#> $EM
#>   cyl emmean_mpg emmean_hp    SE_mpg    SE_hp df_mpg df_hp lower.CL_mpg
#> 1   4   26.66364  82.63636 0.9718008 11.43283     29    29     24.67608
#> 2   6   19.74286 122.28571 1.2182168 14.33181     29    29     17.25132
#> 3   8   15.10000 209.21429 0.8614094 10.13412     29    29     13.33822
#>   lower.CL_hp upper.CL_mpg upper.CL_hp
#> 1    59.25361     28.65119    106.0191
#> 2    92.97388     22.23439    151.5975
#> 3   188.48769     16.86178    229.9409
#> 
#> $C
#>   contrast     wilks   F.ratio df1 df2      p.value
#> 1    6 - 4 0.5844773  9.953025   2  28 5.429344e-04
#> 2    8 - 4 0.2007777 55.728861   2  28 1.729882e-10

Created on 2021-06-29 by the reprex package (v2.0.0)

rvlenth commented 3 years ago

Seems like you don't have to go all the way back to the residuals of the model to get the error quantities, because all the needed covariances are available via vcov().

> RG = ref_grid(m)
> CON = contrast(RG, "trt.vs.ctrl1", simple = "cyl")
> summary(CON, by = NULL)
 contrast rep.meas estimate    SE df t.ratio p.value
 6 - 4    mpg         -6.92  1.56 29 -4.441  0.0004 
 8 - 6    mpg         -4.64  1.49 29 -3.112  0.0149 
 6 - 4    hp          39.65 18.33 29  2.163  0.1250 
 8 - 6    hp          86.93 17.55 29  4.952  0.0001 

> con13 = CON[c(1,3)]  # Results for contrast '6 - 4'
> t(predict(con13)) %*% solve(vcov(con13), predict(con13)) / 2 
         [,1]
[1,] 10.30849

> con24 = CON[c(2,4)]  # Results for contrast '8 - 4'
> t(predict(con24)) %*% solve(vcov(con24), predict(con24)) / 2 
         [,1]
[1,] 57.71918

The above are Wald F statistics, which differ somewhat from what you obtained, but are in the same ballpark.

Addendum

Just a simple d.f. adjustment makes the results match:

> t(predict(con13)) %*% solve(vcov(con13), predict(con13)) / 2 * (28 / 29)
         [,1]
[1,] 9.953025
> t(predict(con24)) %*% solve(vcov(con24), predict(con24)) / 2 * (28 / 29)
         [,1]
[1,] 55.72886
rvlenth commented 3 years ago

It seems that this does the trick:

mvcontrast = function(object, method, mult.name, by = object@misc$by.vars, name = "contrast", ...) {
    by = union(by, mult.name)
    con = contrast(object, method = method, by = by, name = name)
    by = union(setdiff(by, mult.name), name)
    con = contrast(con, "identity", by = by, name = "iden.") # just re-orders it
    ese = emmeans:::.est.se.df(con)
    est = ese$est
    df = ese$df
    V = vcov(con)
    rows = emmeans:::.find.by.rows(con@grid, by)
    result = lapply(rows, function(r) {
        df1 = length(r)
        rawdf = mean(df[r])
        df2 = rawdf - df1 + 1
        F = t(est[r]) %*% solve(V[r, r], est[r]) / df1 * (df2 / rawdf)
        data.frame(df1 = df1, df2 = df2, F.ratio = F)
    })
    result = cbind(con@grid[sapply(rows, function(r) r[1]), ], do.call(rbind, result))
    result[["iden."]] = NULL
    result$p.value = with(result, pf(F.ratio, df1, df2, lower.tail = FALSE))
    class(result) = c("summary_emm", "data.frame")
    attr(result, "estName") = "F.ratio"
    attr(result, "by.vars") = if(length(by) == 1) NULL else setdiff(by, name)
    result
}

Test run:

> EMM
rep.meas = mpg:
 cyl emmean     SE df lower.CL upper.CL
   4   26.7  0.972 29     24.7     28.7
   6   19.7  1.218 29     17.3     22.2
   8   15.1  0.861 29     13.3     16.9

rep.meas = hp:
 cyl emmean     SE df lower.CL upper.CL
   4   82.6 11.433 29     59.3    106.0
   6  122.3 14.332 29     93.0    151.6
   8  209.2 10.134 29    188.5    229.9

Confidence level used: 0.95 

> mvcontrast(EMM, "pairwise", mult.name = "rep.meas")
 contrast df1 df2  F.ratio p.value
 4 - 6      2  28  9.95303 0.0005 
 4 - 8      2  28 55.72886 <.0001 
 6 - 8      2  28 13.37688 0.0001 

BTW, for awhile it looked like the df should be more complicated than I had, but in fact, for an individual contrast, we have q = 1 and p equals the dimension of the multivariate response, so in fact the formulas in your code simplify to df1 = p and df2 = df.residual - p + 1. This is confirmed by noting that we really have a Hotelling's T^2 statistic, and we can find this relation to the F distribution, including these d.f., in his 1931 Annals paper. When we have q > 1, there are more complications, and that's when Wilks, Pillai, Roy, Hotelling-Lawley start giving different approximations.

rvlenth commented 3 years ago

FWIW I have added a slightly enhanced version of this mvcontrast() function to the emmeans package.

mattansb commented 3 years ago

That's great!

Playing around with it, is doesn't seem to support interaction contrasts?

m <- lm(cbind(mpg, hp) ~ factor(cyl) * am, data = mtcars)

emMV1 <- emmeans::emmeans(m, ~ cyl | rep.meas)
emmeans::mvcontrast(emMV1, method = "trt.vs.ctrl1")
#>  contrast df1 df2  F.ratio p.value
#>  6 - 4      2  25  6.99842 0.0039 
#>  8 - 4      2  25 59.26152 <.0001

emMV2 <- emmeans::emmeans(m, ~ cyl + am | rep.meas)
emmeans::mvcontrast(emMV2, interaction = list(cyl = "pairwise", am = "pairwise"))
#> Error in contrast(object, method = method, by = by, name = name): argument "method" is missing, with no default

Created on 2021-07-01 by the reprex package (v2.0.0)

rvlenth commented 3 years ago

Good point. I think that can be added. I also need to provide for adjusted P values. I think the only viable possibilities are sidak and those in p.adjust.methods.

BTW, based on the derivation from Hotelling's T^2 as opposed to the MLH, it is clear that you can specify anything for mult.name, doesn't need to have originally been a multivariate response.

Russ

mattansb commented 3 years ago

BTW, based on the derivation from Hotelling's T^2 as opposed to the MLH, it is clear that you can specify anything for mult.name, doesn't need to have originally been a multivariate response.

Yes, I saw and understood it was possible, but I'm not sure I understand what use that could have other than a multivariate response?

rvlenth commented 3 years ago

For example, in the MOats dataset, compare nitro levels WITHOUT averaging over Variety, just averaging over Block. We have three Variety means with each nitro level, how do those triples compare?

rvlenth commented 3 years ago

Guess I should re-open this issue for awhile...

I had to work harder than I'd thought to get interaction contrasts to work, because of the way I depended on the name argument (which is now gone altogether). But I think I have things in order now, including adding the P-value adjustment which defaults to Sidak:

> mvcontrast(emMV2, interaction = list(cyl = "pairwise", am = "pairwise"))
 cyl_pairwise am_pairwise df1 df2  F.ratio p.value
 4 - 6        0 - 1         2  25 0.703701 0.8782 
 4 - 8        0 - 1         2  25 6.331042 0.0178 
 6 - 8        0 - 1         2  25 4.407533 0.0672 

P value adjustment: sidak 

> mvcontrast(emMV2, interaction = list(cyl = "pairwise", am = "pairwise"), adjust = "none")
 cyl_pairwise am_pairwise df1 df2  F.ratio p.value
 4 - 6        0 - 1         2  25 0.703701 0.5043 
 4 - 8        0 - 1         2  25 6.331042 0.0060 
 6 - 8        0 - 1         2  25 4.407533 0.0229 
rvlenth commented 3 years ago

Example of multivariate comparisons of non-multivariate response...

> example(mvcontrast)
mvcntr> MOats.lm <- lm(yield ~ Variety + Block, data = MOats)
mvcntr> MOats.emm <- emmeans(MOats.lm, ~ Variety | rep.meas)
    . . .

> emmip(MOats.emm, rep.meas ~ Variety)

image

I want to compare the four curves in this plot --- that is a comparison of the levels of rep.meas (amount of nitrogen) without averaging over Variety.

> mvcontrast(MOats.emm, "consec", by = NULL, mult.name = "Variety")
 contrast  df1 df2  F.ratio p.value
 0.2 - 0     3   8 5.732918 0.0634 
 0.4 - 0.2   3   8 1.991854 0.4761 
 0.6 - 0.4   3   8 0.901595 0.8607 

P value adjustment: sidak 
mattansb commented 3 years ago

This is really interesting. So if I understand correctly…

library(emmeans)

MOats.lm <- lm(yield ~ Variety + Block, data = MOats)

MOats.emm <- emmeans(MOats.lm, ~ Variety | rep.meas)

emmip(MOats.emm, rep.meas ~ Variety)

So there are different types of questions we can ask about the differences between each line.

1. Is these a mean difference between the lines:

emmeans(MOats.emm, ~ rep.meas) |>
  contrast("consec")
#>  contrast  estimate   SE df t.ratio p.value
#>  0.2 - 0      19.50 4.22 10 4.620   0.0028 
#>  0.4 - 0.2    15.33 5.92 10 2.591   0.0605 
#>  0.6 - 0.4     9.17 5.02 10 1.826   0.2038 
#> 
#> Results are averaged over the levels of: Block, Variety 
#> P value adjustment: mvt method for 3 tests

2. Is there a difference between the differences in the lines?

contrast(MOats.emm, interaction = list(rep.meas = "consec", Variety = "pairwise"),
         by = NULL) |>
  joint_tests(by = "rep.meas_consec")
#> rep.meas_consec = 0.2 - 0:
#>  model term       df1 df2 F.ratio p.value
#>  Variety_pairwise   2  10   0.077 0.9265 
#> 
#> rep.meas_consec = 0.4 - 0.2:
#>  model term       df1 df2 F.ratio p.value
#>  Variety_pairwise   2  10   0.377 0.6953 
#> 
#> rep.meas_consec = 0.6 - 0.4:
#>  model term       df1 df2 F.ratio p.value
#>  Variety_pairwise   2  10   0.023 0.9772

3. Is there a sig distance between the lines?

mvcontrast(MOats.emm, "consec", by = NULL, mult.name = "Variety")
#>  contrast  df1 df2  F.ratio p.value
#>  0.2 - 0     3   8 5.732918 0.0216 
#>  0.4 - 0.2   3   8 1.991854 0.1939 
#>  0.6 - 0.4   3   8 0.901595 0.4817

Created on 2021-07-02 by the reprex package (v2.0.0)

rvlenth commented 3 years ago

Note the joint_tests() function constructs contrasts for each factor. In this example, differences of Variety differences are still differences of Variety levels, so it doesn't really change anything; but I would do the second one as

> contrast(MOats.emm, "consec", by = "Variety") |> joint_tests(by = "contrast")
contrast = 0.2 - 0:
 model term df1 df2 F.ratio p.value
 Variety      2  10   0.077 0.9265 

contrast = 0.4 - 0.2:
 model term df1 df2 F.ratio p.value
 Variety      2  10   0.377 0.6953 

contrast = 0.6 - 0.4:
 model term df1 df2 F.ratio p.value
 Variety      2  10   0.023 0.9772
rvlenth commented 3 years ago

Then there's this:

> contrast(MOats.emm, "consec", interaction = "consec", by = NULL) |> 
+     mvcontrast("identity", mult = "Variety_consec", adjust = "none")
 contrast    df1 df2   F.ratio p.value
 (0.2 - 0)     2   9 0.0692308 0.9336 
 (0.4 - 0.2)   2   9 0.3391960 0.7211 
 (0.6 - 0.4)   2   9 0.0208349 0.9794 

Almost the same F ratios and P values (in this example), but different d.f. Trying to understand the distinction...

mattansb commented 3 years ago

The best I can do is we are answering the same question in both methods with different lenses (I would argue that the mvcontrast lens makes little sense?)

C <- contrast(MOats.emm, "consec", interaction = "consec", by = NULL)

emmip(C, rep.meas_consec ~ Variety_consec)

Is each line (both contrasts) different from the y = 0 line?

Since each “point” is an interaction contrasts, in the joint_test() we're asking about both interaction contrasts stratified by the difference between consecutive levels of rep.meas.

joint_tests(C, by = "rep.meas_consec")
#> rep.meas_consec = 0.2 - 0:
#>  model term     df1 df2 F.ratio p.value
#>  Variety_consec   1  10   0.153 0.7041 
#> 
#> rep.meas_consec = 0.4 - 0.2:
#>  model term     df1 df2 F.ratio p.value
#>  Variety_consec   1  10   0.635 0.4441 
#> 
#> rep.meas_consec = 0.6 - 0.4:
#>  model term     df1 df2 F.ratio p.value
#>  Variety_consec   1  10   0.005 0.9452

In a MV framework, we’re asking if each point in this plot is significantly different from (0,0):

summary(C) |>
  tidyr::pivot_wider(names_from = "Variety_consec", values_from = estimate:p.value) |> 
  ggplot2::ggplot(ggplot2::aes(`estimate_Marvellous - Golden Rain`, `estimate_Victory - Marvellous`,
                               color = rep.meas_consec)) + 
  ggplot2::geom_point() + 
  ggplot2::geom_point(x = 0, y = 0, size = 4, color = "black", shape = 3)

mvcontrast(C, "identity", mult = "Variety_consec", adjust = "none")
#>  contrast    df1 df2   F.ratio p.value
#>  (0.2 - 0)     2   9 0.0692308 0.9336 
#>  (0.4 - 0.2)   2   9 0.3391960 0.7211 
#>  (0.6 - 0.4)   2   9 0.0208349 0.9794

This hurts my brain........

mattansb commented 3 years ago

Would it be possible to add the Mahalanobis distance of the contrast to the output as the estimate? Or do you think that's overkill?

rvlenth commented 3 years ago

Well, comparing these two, the mvcontrast() test makes more sense to me because it looks at the 2 d.f. in Variety effects.

Your joint tests appoach only looks at the slopes of those three lines: (Variety 3 - Variety 2) - (Variety 2 - Variety 1) = (Variety 3 - Varirety 1), (a 1 d.f. comparison), which isn't obvious at all looking at the plot since it is already in terms of contrasts. Note that this could have been done as

> pairs(C, by = "rep.meas_consec")
rep.meas_consec = 0.2 - 0:
 contrast                                            estimate   SE df t.ratio p.value
 (Marvellous - Golden Rain) - (Victory - Marvellous)      7.0 17.9 10  0.391  0.7041 

rep.meas_consec = 0.4 - 0.2:
 contrast                                            estimate   SE df t.ratio p.value
 (Marvellous - Golden Rain) - (Victory - Marvellous)    -20.0 25.1 10 -0.797  0.4441 

rep.meas_consec = 0.6 - 0.4:
 contrast                                            estimate   SE df t.ratio p.value
 (Marvellous - Golden Rain) - (Victory - Marvellous)      1.5 21.3 10  0.070  0.9452 

Results are averaged over the levels of: Block 
rvlenth commented 3 years ago

Mahal. distance could be added. It's just the square root of a multiple of the F ratio. Are those typically reported in papers?

rvlenth commented 3 years ago

How's this?

> example(mvcontrast)
mvcntr> MOats.lm <- lm(yield ~ Variety + Block, data = MOats)
mvcntr> MOats.emm <- emmeans(MOats.lm, ~ Variety | rep.meas)

mvcntr> mvcontrast(MOats.emm, "consec", show.ests = TRUE)
$estimates
    ...

$tests
 contrast                  Mahal.d df1 df2   F.ratio p.value
 Marvellous - Golden Rain 1.755526   4   7 0.5393277 0.9174 
 Victory - Marvellous     3.029957   4   7 1.6066123 0.4726 

P value adjustment: sidak 

mvcntr> # Test each mean against a specified null vector
mvcntr> mvcontrast(MOats.emm, "identity", name = "Variety", 
mvcntr+            null = c(80, 100, 120, 140), adjust = "none")
 Variety      Mahal.d df1 df2  F.ratio p.value
 Golden Rain 3.162364   4   7 1.750096 0.2430 
 Marvellous  5.160188   4   7 4.659820 0.0377 
 Victory     3.198767   4   7 1.790619 0.2352 
mattansb commented 3 years ago

Mahal. distance could be added. [...] Are those typically reported in papers?

I honestly have no idea... (: (But if so, I guess SE and CI would be in order :P)


Just to make sure, I can report these contrasts as Hotelling's T^2, right?

library(emmeans)

MOats.lm <- lm(yield ~ Variety + Block, data = MOats)

MOats.emm <- emmeans(MOats.lm, ~ Variety | rep.meas)

mvcontrast(MOats.emm, "consec") |>
  transform(T2 = F.ratio * (df1 * (df2 + df1 - 1)) / (df2),
            F.ratio = NULL,
            df2 = df2 + df1 - 1)
#>                   contrast Mahal.dist df1 df2   p.value       T2
#> 1 Marvellous - Golden Rain   1.755526   4  10 0.9173667 3.081872
#> 5     Victory - Marvellous   3.029957   4  10 0.4725502 9.180642

Created on 2021-07-04 by the reprex package (v2.0.0)

mattansb commented 3 years ago

Oh sorry, I meant the distance, not Mahal. distance. But I suspect this can be dropped anyway - I seen neither are really reported (and can be extracted from show.ests = TRUE, I think.)

rvlenth commented 3 years ago

Note that your T2 values are just Mahal.dist^2. So maybe I should just report T2 rather than Mahal.dist.

I don't want to mess with Euclidean distance...

mattansb commented 3 years ago

Fair enough (:

rvlenth commented 3 years ago

OK, the new update makes this a lot more robust. Previously, it would go belly-up if any of the factors involved do not interact -- because, if you think about it, that reduces the rank of the covariance matrix. We now accommodate reduced-rank situations. There's also some new vignette material (actually in the one on interactions, not the one on contrasts though it is linked there). One of the vignette examples illustrates the reduced-rank situation and demonstrates that in the extreme, it yields the equivalent of the univariate analysis produced by contrast().

The main reason I'm not adding Euclidean distance, BTW, is that "someone" will then want me to add SEs or CIs for them, and I don't want to think about that...

mattansb commented 3 years ago

"someone" hehe, fair enough!

As always, this is awesome! Thank you so so much! I really needed this for a project, and couldn't believe that it didn't exist (not that I could find, other than that old post of the R list). Thanks!

mattansb commented 3 years ago

In the vignette you write:

While all comparisons are "significant," the T.square values indicate that small and medium cars are much closer together, and larger ones are more separate -- as can be visualized in the earlier plot.

It is my understanding that T2 in a univariate case is equivalent to a Student's t - which shouldn't be used as a measure of effect size. Is this not true for T2?

rvlenth commented 3 years ago

Point taken. It kind of us in this case, as the design is balanced. But I'll try to write that more carefully (or just take it out).

Russ

mattansb commented 3 years ago

The output from mvcontrast() is an emm_summary object, which cannot be passed to joint_test(). Would that be something you might consider adding - the ability to joint test some mv-contrasts?

rvlenth commented 3 years ago

It's actually a summary_emm object. But joint_tests() would not be able to handle it in any case, because joint_tests() creates contrasts of its input object, and that requires it to be an emmGrid or something that can be converted to one.

It also is not what I think you want; but rather I think you want > 1 d.f. multivariate tests of all those multivariate contrasts together. I'm pretty sure that cannot be done using the set of T^2 results, because we don't have the covariance information between those. I am also not sure how to do that, other than to go back to an adaptatio0n of your mlh() function that allows for a set of contrasts.

rvlenth commented 3 years ago

BTW, version 1.6.2 (actually 1.6.2-1 due to a needed minor tweak) just hit CRAN, and irt incorporates the mvcontrast() function.

mattansb commented 3 years ago

It also is not what I think you want; but rather I think you want > 1 d.f. multivariate tests of all those multivariate contrasts together. I'm pretty sure that cannot be done using the set of T^2 results, because we don't have the covariance information between those.

Yes to both (looking as the mvcontrast() I can see that this would not be possible without some major overhauling... Nobody wants that).

As far as I am concerned, this issue can be closed.

Thanks again! Have a great summer!

rvlenth commented 3 years ago

Thanks. If the mv joint test can be done using the info in an emmGrid object, it should be possible to implement it via an additional argument mvcontrast(..., joint = TRUE) (defaults to FALSE) similar to in test.emmGrid() . I'll keep that in the back of my mind in case I have any sudden insights (or more pertinent, sudden gains in knowledge!)

I'll keep this open for a while more, just to nag myself...

rvlenth commented 3 years ago

Hmmmm. Well, this issue is still open, so I just looked at it again. It goes on and on! I still don't feel like I am ready to wade into doing joint tests of multivariate contrasts ... but at least I thought about it for a few seconds.

rvlenth commented 2 years ago

I've decided to close this issue. I'm realizing that I'm likely never to do this. The addition I did make, which is valuable I think, provides post hoc Hotelling $T^2$ tests, e.g. pairwise comparisons of multivariate means.

Adding corresponding joint tests is fairly peripheral to the central goal of the emmeans package, which is to provide estimates of descriptive quantities of estimates and effects from a model. Joint tests are not descriptive of the nature of the effects, and are most useful in the context of model selection. And multivariate joint tests open up a whole new Pandora's box of which statistic to use (Pillai's trace, etc.) While these things are likely tractable, it would involve the addition of a considerable amount of code that would seldom be useful.