yrosseel / lavaan

an R package for structural equation modeling and more
http://lavaan.org

lavPredictY() yields no rows for newdata when irrelevant exogenous variables are NA #295

Closed TDJorgensen closed 8 months ago

TDJorgensen commented 9 months ago

Here's something that probably is not intended behavior.

library(lavaan)

mod <- '
  x4 + x5 ~ x1 + x2 + x3
# x6 is also exogenous, but does not predict x4 or x5
  x7 ~ x1 + x2 + x3 + x4 + x5 + x6
'
## fit with(out) fixed.x
fit.fix <- sem(mod, data = HolzingerSwineford1939, fixed.x = TRUE)
fit.nox <- sem(mod, data = HolzingerSwineford1939, fixed.x = FALSE)

## arbitrary new data, using NA for variables that don't predict any "ynames"
ndatNA  <- expand.grid(x1 = 0:1, x2 = 0:1, x3 = 0:1, x4 = NA, x5 = NA, x6 = NA, x7 = NA)

## works fine without fixed.x
lavPredictY(fit.nox, xnames = c("x1","x2","x3"), ynames = c("x4","x5"), newdata = ndatNA)

## but fixed.x = TRUE removes all rows because is.na(x6)
lavPredictY(fit.fix, xnames = c("x1","x2","x3"), ynames = c("x4","x5"), newdata = ndatNA)

## setting just x6 to 0 instead of NA resolves the issue
ndatNA$x6 <- 0
lavPredictY(fit.fix, xnames = c("x1","x2","x3"), ynames = c("x4","x5"), newdata = ndatNA)

I think this is easily resolved on Line 60 by always setting missing = "ml.x" instead of merely "ml". I'll send a pull request.
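
For anyone stuck on a version without the fix, the x6 workaround above can be wrapped in a small helper. This is only a sketch in base R: fill_na_cols() is a hypothetical name (not part of lavaan), and the complete.cases() check merely illustrates listwise deletion over the exogenous columns; it is not taken from lavaan's internals.

```r
## Hypothetical helper (not part of lavaan): replace NA by a placeholder value
## in the named columns of newdata, leaving every other column untouched.
fill_na_cols <- function(newdata, cols, value = 0) {
  stopifnot(all(cols %in% names(newdata)))
  for (v in cols) {
    newdata[[v]][is.na(newdata[[v]])] <- value
  }
  newdata
}

## same arbitrary new data as in the report
ndatNA <- expand.grid(x1 = 0:1, x2 = 0:1, x3 = 0:1,
                      x4 = NA, x5 = NA, x6 = NA, x7 = NA)

## exogenous variables of the example model
exo <- c("x1", "x2", "x3", "x6")

## number of rows that survive listwise deletion over the exogenous columns
n_before <- sum(complete.cases(ndatNA[, exo]))

## fill only x6, the exogenous variable that does not predict x4 or x5
ndat0 <- fill_na_cols(ndatNA, cols = "x6")
n_after <- sum(complete.cases(ndat0[, exo]))
```

ndat0 can then be passed as newdata to lavPredictY() exactly as in the last call above. The placeholder 0 is the value used in the report; I have not checked whether other values behave identically.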

yrosseel commented 8 months ago

Indeed. Merged now.