simsem / semTools

Useful tools for structural equation modeling

Fit measures have wrong values when using multiple imputations #29

Closed aneumann-science closed 6 years ago

aneumann-science commented 6 years ago

Dear all,

I frequently use multiple imputation with semTools and often encounter highly implausible pooled fit measures, such as CFI values of 0 or negative NFI values. The fit indices for each individual imputed dataset are much higher, as is their average, which leads me to believe this is a bug. It is difficult to reproduce because pooling appears to work correctly for most datasets and models, and it is not clear to me which situations trigger the problem.

Fortunately, I can now reproduce the error with a public dataset, specifically the Rosenberg self-esteem scale from http://openpsychometrics.org/_rawdata/RSE.zip. In this particular case the bug does not appear with MLR, but does appear with WLSMV. I use the latest version, semTools 0.4-15.913, but the problem was also present in earlier releases. The code below reproduces the bug:

# Example code to demonstrate problems with pooling fit indices in semTools 0.4-15.913, lavaan 0.6-1.1189
# Code based on runMI example
# Data obtained from http://openpsychometrics.org/
library(semTools)
library(mice)

# Load Rosenberg self-esteem scale data from http://openpsychometrics.org/_rawdata/RSE.zip
# Only first 10000 participants for faster computation
rosenberg.data <- read.table("data.csv", header = TRUE)[1:10000, 1:10]

## impose 50% missing values
set.seed(123)
for (i in 1:10) rosenberg.data[sample(1:nrow(rosenberg.data), size = .5*nrow(rosenberg.data)), i] <- NA

# Impute with mice
imp <- mice(rosenberg.data)

# Create list of imputed datasets (mice() produces 5 imputations by default)
nImputations <- 5
impList <- lapply(seq_len(nImputations), function(i) complete(imp, action = i))

# CFA model with all items loading on one factor
rosenberg.model <- '
A =~ Q1 + Q2 + Q3 + Q4 + Q5 + Q6 + Q7 + Q8 + Q9 + Q10
'
# Fit CFA across all imputed datasets and pool results
# All items are treated as ordinal
rosenberg.fit <- cfa.mi(rosenberg.model, data = impList, estimator = "WLSMV", ordered = c("Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7", "Q8", "Q9", "Q10"))

# Obtain pooled fit indices
# Note the very low cfi.scaled of 0.052 and negative values for nnfi.scaled and tli.scaled
anova(rosenberg.fit, indices = TRUE)

### Compare pooled fit indices to average fit indices
# Loop over all imputed datasets, fit the model and extract fit indices
indices <- lapply(impList, function(data) {
  fit <- cfa(rosenberg.model, data = data, estimator = "WLSMV", ordered = c("Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7", "Q8", "Q9", "Q10"))
  fitMeasures(fit)
})

# Average fit indices over all imputed data sets
# Fit indices across all measures are much higher, e.g. cfi.scaled is 0.948
indices <- do.call(rbind, indices)
round(apply(indices, 2, mean), 3)

TDJorgensen commented 6 years ago

This has come up before with pooling scaled/shifted statistics from DWLS (WLS"MV"). You can read this thread for my initial thoughts:

https://groups.google.com/d/msg/lavaan/WM2Ynmatsmk/q_5S9PXgAwAJ

The pooling behavior is not entirely predictable, but it might be a bug (hopefully!). So thank you for providing a data set and reproducible example so I can actually investigate it this time :-) I'll look into it soon, and get back to you here.

TDJorgensen commented 6 years ago

Still no evidence of a bug, just the odd behavior of pooling test statistics.

I decided to post my very long reply on the lavaan forum, where others can also benefit from this discussion. So please read my response there, and if you wish to continue the discussion, we can do so there (since this does not appear to be a software issue).

https://groups.google.com/d/msg/lavaan/WM2Ynmatsmk/kAKH8yZ8AwAJ

Thanks again for the example!
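
As a rough illustration of how pooling test statistics can behave oddly without any software bug, here is a minimal sketch. It assumes the pooled chi-square is formed with the D2 rule of Li, Meng, Raghunathan, and Rubin (1991); whether `anova()` uses exactly this rule for WLSMV is an assumption here, not something stated in this thread. The helper names `pool_d2` and `cfi` are hypothetical, and the chi-square values are invented for illustration rather than taken from the Rosenberg data.

```r
# Pool m chi-square statistics with the D2 rule (Li et al., 1991).
# chisq: per-imputation statistics; df: their common degrees of freedom.
pool_d2 <- function(chisq, df) {
  m <- length(chisq)
  # Relative increase in variance, from the spread of the square-root statistics
  r <- (1 + 1 / m) * var(sqrt(chisq))
  d2 <- (mean(chisq) / df - (m + 1) / (m - 1) * r) / (1 + r)
  # Rescale to the chi-square metric; the floor at 0 is an illustrative choice
  max(d2, 0) * df
}

# Standard CFI from model (m) and baseline (b) chi-squares
cfi <- function(chisq_m, df_m, chisq_b, df_b) {
  num <- max(chisq_m - df_m, 0)
  den <- max(chisq_m - df_m, chisq_b - df_b, 0)
  if (den == 0) 1 else 1 - num / den
}

# Hypothetical statistics from 5 imputations (NOT from the Rosenberg data)
chisq_model    <- c(900, 950, 1000, 1050, 1100)
chisq_baseline <- c(15000, 18000, 21000, 24000, 27000)
df_model    <- 35  # one-factor model with 10 indicators
df_baseline <- 45

# Average of the per-imputation CFIs
mean_cfi <- mean(mapply(cfi, chisq_model, df_model, chisq_baseline, df_baseline))

# CFI computed from the pooled model and baseline statistics
pooled_cfi <- cfi(pool_d2(chisq_model, df_model), df_model,
                  pool_d2(chisq_baseline, df_baseline), df_baseline)

round(c(mean_cfi = mean_cfi, pooled_cfi = pooled_cfi), 3)
```

With these made-up numbers, the average CFI is about 0.95, while the CFI built from pooled statistics is 0. The baseline chi-squares vary a lot across imputations, so the variance correction in D2 wipes out the pooled baseline statistic, and the CFI denominator is then driven by the model's own misfit. This may or may not be the mechanism behind the values reported above.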