Closed mattansb closed 1 year ago
Hey, I'm still figuring out if this is the best route to take here (see discussion here https://github.com/rvlenth/emmeans/issues/342). I have a holiday coming up, so I'll probably only get back to this later this month / early May.
Either way, I will add something more "actionable" in the warning (what ever it may be).
Thanks @TDJorgensen !
Okay, now I've caught up on the conversation with rvlenth. I agree with his last point that it is purely a practical issue for the automatic values emmeans
chooses for factors (all observed levels) and covariates (the mean). In that case, perhaps the simplest solution is to extract lavInspect(fit, "data")
, extract the columns that are relevant (all included in terms and the lavaan.DV=
), then apply a function to each column that replaces any NA
with the mean(..., na.rm=TRUE)
.
I forget whether the only factor
variable recognized in lavaan2emmeans
is the group=
variable in a multigroup model. Is there a way to specify that a set of dummy codes or other contrasts are really representing a single grouping variable? If not, then I don't see any complications that could arise, except that lavaan can also be fitted to summary statistics rather than raw data (in which case, at least missing data is not an issue, and perhaps simulating a single data set with MASS::mvrnorm()
would suffice).
Upon further inspection, I think your .emlav_recover_data()
function could be drastically simplified using new lavPredict()
arguments that automatically append original data to factor scores and combine multiple groups into a single data.frame
. See this example:
library(lavaan)
HS <- HolzingerSwineford1939
HS$x1[1:5] <- NA
foo <- sem('x1 ~ x2 + x3', data = HS, group = "school",
meanstructure = TRUE, missing = "FIML")
dat <- lavPredict(foo, assemble = TRUE, append.data = TRUE)
head(dat)
Note that it even works with (multigroup) path models without latent factors, because nothing needs to be appended to the fitted data.
I think the change below would solve the missing-data problem, too. Please let me know whenever you have a chance to get back to this.
.emlav_recover_data <- function(object){
dat <- lavaan::lavPredict(object, type = "lv",
assemble = TRUE,
append.data = TRUE)
## mean-impute any NAs
for (i in seq_along(dat)) {
## ignore non-numeric variables
if (!is.numeric(dat[,i])) next
## any NAs?
idx.na <- which(is.na(dat[,i]))
if (length(idx.na)) {
dat[idx.na, i] <- mean(dat[,i], na.rm = TRUE)
}
}
## convert to data.frame, if necessary (single group)
as.data.frame(dat)
}
Hi @mattansb, just checking in.
While working on a teaching example, I came across an odd error with multigroup models. Messing around narrowed it down to the case when sample sizes were equal. Adapting your help-page example to have 100 from each school reproduces the problem:
HS <- rbind(head(HolzingerSwineford1939, 100), tail(HolzingerSwineford1939, 100))
model <- 'x1 ~ c(int1, int2)*1 + c(b1, b2)*ageyr
diff_11 := (int2 + b2*11) - (int1 + b1*11)
diff_13 := (int2 + b2*13) - (int1 + b1*13)
diff_15 := (int2 + b2*15) - (int1 + b1*15)
'
semFit <- sem(model, group = "school", data = HS)
emmeans(semFit, pairwise ~ school | ageyr, lavaan.DV = "x1",
at = list(ageyr = c(11, 13, 15)), nesting = NULL)[[2]]
The problem is in .emlav_recover_data()
, when using unlist(group_labels)
. With equal sample sizes, the sapply()
call that creates group_labels
will return a matrix
rather than a list
, unless you add the argument simplify = FALSE
. That's a quick fix I can make to the main branch, but if you are going to try to close this issue soon anyway, perhaps you could just add it to your pull request. No pressure, just let me know.
Thanks!
Hi @TDJorgensen
Happy to hear you're using this emmeans functionality! I think it would be best for you to make that fix on main - I'm currently busy writing my dissertation, so this PR has been put on the back burner (along with about 1000 more things).
(I hope to get back to this and my other R-dev activities sometime next month.)
Okay, I copied your changes from the first commit on this PR, to make it robust to different ordering of names. Then I just used my suggested .emlav_recover_data()
change, which avoids the missing-data problems with reproducing the data, as well as the issue I posted about the other day.
d7d8b27
So I think this pull request can be closed, if you don't end up finding the same issues you had before.
This PR is meant to allow
emmeans
support to be robust to the term order in an interaction term inlavaan
.For example, now
emmeans
will give the same results for all 4 interaction regressions:y ~ a + b + a:b
y ~ a + b + b:a
y ~ b + a + a:b
y ~ b + a + b:a