Closed mattansb closed 4 years ago
Alright, the difference is due to lavaan
accounting for uncertainty in estimating the mean of Petal.Length
, whereas emmeans
just "plugs" the mean as a "pick a point", and so does not account for uncertainty.
I think then than this is issue is basically resolved. Do you think semTools
is a good fit? If not I can approach the maintainer of emmeans
or wrap this in something standalone.
Nice work! Good idea, augmenting VCOV to include fixed parameters.
Alright, the difference is due to
lavaan
accounting for uncertainty in estimating the mean ofPetal.Length
, whereasemmeans
just "plugs" the mean as a "pick a point", and so does not account for uncertainty.
Yes, by defining the mean in the syntax, you may have noticed lavaan's warning:
switching to fixed.x = FALSE
which makes it inconsistent with the approach assumed by emmeans
(i.e., fixed.x = TRUE
).
Do you think
semTools
is a good fit? If not I can approach the maintainer ofemmeans
or wrap this in something standalone.
Perhaps ask @rvlenth about including it in emmeans
, which might be the simplest solution. But it could make sense to have a lavaan.emmeans
package devoted to linking the 2 packages (like lavaan.survey
links lavaan
with survey
).
But I would hesitate to make this public before more extensive testing in a variety of scenarios. For example, your code calls lavPredict()
to generate factor scores, relying the default method for the fitted model. Not sure different methods would affect results, since factor scores seem only to serve the purpose of constructing a fake lm()
fit to automatically generate the list format with terms, etc., expected by emmeans
. But missing data could affect how many rows are returned (depending on the method and whether it can return factor scores with incomplete data). Returning to some original concerns in the lavaan thread: Can your code already accommodate multiple outcomes to treat as repeated measures (e.g., as a vector passed to lavaan.DV=
)? Does it break in multigroup (or multilevel) models? Beyond merely "working", can the grouping variable be treated as an exogenous predictor? What happens (and what should happen) when a lavaan model is fitted with conditional.x=TRUE
(i.e., exogenous predictors are partialed out of the endogenous variables, so the model is fitted to a residual covariance matrix)? What happens when the model includes ordered
outcomes, for which intercepts are fixed to zero by default in favor of estimating thresholds instead?
You don't necessarily need to provide functionality for all of these scenarios, but it would probably be wise to check for them and provide informative error messages saying what is not (yet) available. And I don't want to sound discouraging. This is a great contribution, and I would love to see it implemented!
Answering some of your questions:
As for these:
conditional.x=TRUE
?If you could provide some working examples, I can play around with them.
I think a standalone pkg might be much, as it only has 2 user-facing function. @rvlenth, what do you think about adding these to emmeans
? If not, I think semTools
would be the best fit.
You don't need to create a lavaan.emmeans package. Just include the recover_data
and emm_basis
methods in your package code (not exported), then add a package-loading provision to register them (typically in the file zzz.R
):
.onLoad = function(libname, pkgname) {
if (requireNamespace("emmeans", quietly = TRUE))
emmeans::.emm_register("lavaan", pkgname)
}
This is better than having me incorporate lavaan support in emmeans, because you know the fine details of what's needed to support those objects, and can change the methods as needed when capabilities are changed.
Sounds like the best idea is to include it in semTools (for lavaan.mi
objects at least, and for lavaan
if Yves doesn't want to include it). But the big issues should be resolved first:
lavInspect(fit, "ngroups")
)lavInspect(fit, "nlevels")
and return an error message when > 1L
. It would be too difficult to try reproducing Level-2 "data".ordered
variables with lavNames(fit, "ov.ord")
, and just return an error if lavaan.DV
is one of those variables. Or simply verify that lavaan.DV %in% lavNames(fit, "ov.num")
Regarding the additional scenarios, here is an example of 2 latent factors that are repeated measures, where scalar invariance constraints allow the Time-2 intercept to be estimated. That is the mean difference (conditional on the moderator ind60 == 0
, which is its mean) because the Time-1 intercept is fixed to zero. I create user-defined parameters to estimate mean differences across time conditional on 1 SD above/below the mean of ind60
(its SD is fixed to 1):
model <- '
ind60 =~ x1 + x2 + x3
# metric invariance
dem60 =~ y1 + a*y2 + b*y3 + c*y4
dem65 =~ y5 + a*y6 + b*y7 + c*y8
# scalar invariance
y1 + y5 ~ d*1
y2 + y6 ~ e*1
y3 + y7 ~ f*1
y4 + y8 ~ g*1
# regressions (slopes differ: interaction with time)
dem60 ~ b1*ind60
dem65 ~ b2*ind60 + NA*1 + Mean.Diff*1
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
# conditional mean differences (besides mean(ind60) == 0)
low := (-1*b2 + Mean.Diff) - (-1*b1) # 1 SD below M
high := (b2 + Mean.Diff) - b1 # 1 SD above M
'
fit <- sem(model, data = PoliticalDemocracy)
summary(fit)
Perhaps if lavaan.DV=
accepted a vector with length > 1, this might not be difficult to do in emmeans. Each outcome has its own set of coefficients, just like a mlm
object.
To treat the group=
variable as a predictor, recognize that each group gets its own coefficients for the same formula per outcome. An equivalent way to specify a "multigroup" approach in OLS regression is to omit the intercept so a dummy is included for all groups (i.e., no reference group needed). This means each group gets its own intercept, and its own slope if you omit the lower-order effect and only include the interaction between group and covariate. Compare the coefficients in these examples:
model <- 'x1 ~ c(int1, int2)*1 + c(b1, b2)*ageyr
diff_11 := (int2 + b2*11) - (int1 + b1*11)
diff_13 := (int2 + b2*13) - (int1 + b1*13)
diff_15 := (int2 + b2*15) - (int1 + b1*15)
'
summary(fit <- sem(model, group = "school", data = HolzingerSwineford1939))
summary(foo <- lm(x1 ~ -1 + school + school:ageyr, data = HolzingerSwineford1939))
rbind(pairs(emmeans(foo, specs = ~ school | ageyr,
at = list(ageyr = c(11, 13, 15)), nesting = NULL)))
Note that you have to turn off the auto-detection of nested factors in emmeans: vignette("messy-data", "emmeans")
.
On further consideration, I don't think conditional.x = TRUE
would cause an issue, but that is easily verified by simply setting it TRUE in your examples to verify results remain the same. In your first example, an error is returned by lavaan because apparently an interaction term is not recognized (it is looking for a variable named Sepal.Width:Petal.Length
). But in my multigroup example above, the user-defined mean differences do not change when I set conditional.x = TRUE
, so hopefully that is a good sign.
Alright, I'll make a PR that I can work on and start hacking at there issues... Will keep you posted.
TODO:
conditional.x = TRUE
?If you are going to pass a formula to lavaan.DV=
, why not get rid of the argument an just require a left-hand side of the formula passed to specs=
? Whereas a multivariate outcome in lm()
is automatically detected by the lefthand side being cbind(DV1, DV2, ...) ~ ...
in the model object, the operation specs = cbind(DV1, DV2, ...) ~ ...
could simply give you a way to select outcomes among the many endogenous variables in a lavaan model to be treated as repeated measures (via the existing rep.meas=
argument in emmeans.
Not sure it would mess anything up, but you could essentially invent your own operators in a formula object to indicate when multiple dummies indicate levels of a single factor. For example, {x, y, z} below are 3 levels of a factor, grouped by a dummy()
operator that needs not ever be passed to lm()
or any other model-fitting function.
terms( ~ w + dummy(x, y, z))
It would simply be a way to check for such a case and construct the fakeData
as needed. This post describes an easy method to create a factor from multiple dummies in a model matrix:
https://stat.ethz.ch/pipermail/r-help/2006-October/115706.html
Hope this helps. But no rush with this checklist. I appreciate your effort on this contribution, which I think will be very useful as an alternative to specifying Wald tests that would require a complex to specify with lavaan syntax, yet are automated in emmeans. It could even serve as an alternative to specifying a complex latent growth-curve or difference-score models, instead specifying a saturated mean structure among observed repeated measures (or a longitudinal CFA with scalar invariance constraints) and letting emmeans do all the work to specify polynomial or pairwise contrasts. Seems like there is a fruitful tutorial paper in there, once the functionality is in place.
specs =
already uses the LHS of the formula for contrast functions (.emmc
) names, so can't do that...
But passing a full formula to lavaan.DV
might also be the solution for multi-var DVs?
emmeans(semFit, ~ A,
lavaan.DV = cbind(Lat1, Lat2) ~ A + B + C)
(Just playing around with this idea for now... I'd have to see how hard it would be to implement it.)
I'll add that I am a fairly basic user of lavaan
-- or SEM, generally (but I would say I'm an advanced emmeans
user) -- so for the more advance stuff, you'll probably need to give me a hand.
But, as no one is in a rush... I'll keep you updated with any advancements 😀
Support for multi-var DVs:
library(lavaan)
library(semTools)
library(emmeans)
#### Multi-Variate DV ####
model <- '
ind60 =~ x1 + x2 + x3
# metric invariance
dem60 =~ y1 + a*y2 + b*y3 + c*y4
dem65 =~ y5 + a*y6 + b*y7 + c*y8
# scalar invariance
y1 + y5 ~ d*1
y2 + y6 ~ e*1
y3 + y7 ~ f*1
y4 + y8 ~ g*1
# regressions (slopes differ: interaction with time)
dem60 ~ b1*ind60
dem65 ~ b2*ind60 + NA*1 + Mean.Diff*1
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
# conditional mean differences (besides mean(ind60) == 0)
low := (-1*b2 + Mean.Diff) - (-1*b1) # 1 SD below M
high := (b2 + Mean.Diff) - b1 # 1 SD above M
'
semFit <- sem(model, data = PoliticalDemocracy)
## Compare contrasts
# From emmeans
emmeans(semFit, pairwise~ rep.meas|ind60, lavaan.DV = c("dem60","dem65"),
at = list(ind60 = c(-1,1)))[[2]]
#> Warning: Not all parameters are included in the VCOV matrix.
#> Perhaps some are fixed with a modifier, or the mean structure is missing.
#> Fixing VCOV for these parameters at 0.
#> ind60 = -1:
#> contrast estimate SE df z.ratio p.value
#> dem60 - dem65 0.8074 0.256 Inf 3.156 0.0016
#>
#> ind60 = 1:
#> contrast estimate SE df z.ratio p.value
#> dem60 - dem65 0.0174 0.251 Inf 0.070 0.9446
# From lavaan
parameterEstimates(semFit, output = "pretty")[49:50, ]
#>
#> Defined Parameters:
#> Estimate Std.Err z-value P(>|z|) ci.lower ci.upper
#> low -0.807 0.256 -3.156 0.002 -1.309 -0.306
#> high -0.017 0.251 -0.070 0.945 -0.509 0.474
Created on 2020-02-22 by the reprex package (v0.3.0)
Support for multi group models
library(lavaan)
library(semTools)
library(emmeans)
model <- 'x1 ~ c(int1, int2)*1 + c(b1, b2)*ageyr
diff_11 := (int2 + b2*11) - (int1 + b1*11)
diff_13 := (int2 + b2*13) - (int1 + b1*13)
diff_15 := (int2 + b2*15) - (int1 + b1*15)
'
semFit <- sem(model, group = "school", data = HolzingerSwineford1939)
## Compare contrasts
# From lavaan
parameterEstimates(semFit, output = "pretty")
#>
#> ....
#>
#> Defined Parameters:
#> Estimate Std.Err z-value P(>|z|) ci.lower ci.upper
#> diff_11 -0.160 0.295 -0.543 0.587 -0.738 0.418
#> diff_13 -0.047 0.139 -0.338 0.735 -0.318 0.225
#> diff_15 0.067 0.304 0.219 0.827 -0.530 0.663
# From emmeans (note `nesting = NULL`)
emmeans(semFit, pairwise ~ school | ageyr, lavaan.DV = "x1",
at = list(ageyr = c(11, 13, 15)), nesting = NULL)[[2]]
#> Warning: For multi-group models, don't forget to set 'nesting = NULL'.
#> See `?lavaan2emmeans` for more info.
#> ageyr = 11:
#> contrast estimate SE df z.ratio p.value
#> Grant-White - Pasteur -0.1603 0.295 Inf -0.543 0.5868
#>
#> ageyr = 13:
#> contrast estimate SE df z.ratio p.value
#> Grant-White - Pasteur -0.0468 0.139 Inf -0.338 0.7353
#>
#> ageyr = 15:
#> contrast estimate SE df z.ratio p.value
#> Grant-White - Pasteur 0.0666 0.304 Inf 0.219 0.8270
Created on 2020-02-23 by the reprex package (v0.3.0)
I think that resolves all the major issues; as for reconstructing factors, I haven't been able to find a way to do so automatically: it seems that users would need to supply a formula + original data (with original factors) / the terms from a user-build model-matrix, and even then it wouldn't been straight forward.
Seeing as how emmeans
detects [0,1] variables as two level binaries (and not continuous vars), it is relatively straight forward to compute contrasts as-is, with custom contrasts. For example:
library(lavaan)
library(semTools)
library(emmeans)
warpbreaks <- cbind(warpbreaks,
model.matrix(~ wool + tension, data = warpbreaks))
model <- "
# Split for convenience
breaks ~ 1
breaks ~ woolB
breaks ~ tensionM + tensionH
breaks ~ woolB:tensionM + woolB:tensionH
"
semFit <- sem(model, warpbreaks)
## Compare contrasts
# From lm -> emmeans
lmFit <- lm(breaks ~ wool * tension, data = warpbreaks)
lmEM <- emmeans(lmFit, ~ tension + wool)
contrast(lmEM, method = data.frame(L_all = c(-1, .05, 0.5),
M_H = c(0, 1, -1)), by = "wool")
#> wool = A:
#> contrast estimate SE df t.ratio p.value
#> L_all -31.078 4.08 48 -7.615 <.0001
#> M_H -0.556 5.16 48 -0.108 0.9147
#>
#> wool = B:
#> contrast estimate SE df t.ratio p.value
#> L_all -17.394 4.08 48 -4.262 0.0001
#> M_H 10.000 5.16 48 1.939 0.0584
# From lavaan -> emmeans
lavEM <- emmeans(semFit, ~ tensionM + tensionH + woolB,
lavaan.DV = "breaks")
contrast(lavEM,
method = list(
"L_all|A" = c(c(-1, .05, 0.5, 0), rep(0, 4)),
"M_H |A" = c(c(0, 1, -1, 0), rep(0, 4)),
"L_all|B" = c(rep(0, 4), c(-1, .05, 0.5, 0)),
"M_H |B" = c(rep(0, 4), c(0, 1, -1, 0))
))
#> contrast estimate SE df z.ratio p.value
#> L_all|A -31.078 3.85 Inf -8.077 <.0001
#> M_H |A -0.556 4.86 Inf -0.114 0.9090
#> L_all|B -17.394 3.85 Inf -4.521 <.0001
#> M_H |B 10.000 4.86 Inf 2.057 0.0397
Created on 2020-02-23 by the reprex package (v0.3.0)
Let me know if anything else is missing / any thoughts.
merged #70
@TDJorgensen Following our discussion here, I have made some progress with the
emmeans
integration oflavaan
. It seems that I've almost got it working completely!Examples
(the core functions are bellow)
Example 1
This example of moderation analsys, shows how these functions deal with:
nearly identical to:
Example 2
This example shows how these functions deal with:
same estimates - but different SE? @TDJorgensen Any idea why?
Functions