simsem / semTools

Useful tools for structural equation modeling

cfa.mi(): robust fit indices in summary with MLM estimator? #131

Closed soelderer closed 8 months ago

soelderer commented 9 months ago

Hi! This is not exactly a bug report, just a couple of questions.

I used mice to impute a dataset with ordinal items, on which I want to conduct several CFAs. Now I want to compare the fit of two CFAs with respect to several fit indices.

This is my code, roughly:

library(mice)      # for multiple imputation
library(semTools)  # provides cfa.mi()

imp <- mice(data, method = "polr", seed = 12345)
cfa1_fit <- cfa.mi(model = cfa1_model, data = imp, estimator = "MLM")
cfa3_fit <- cfa.mi(model = cfa3_model, data = imp, estimator = "MLM")

summary(cfa1_fit, fit.measures = TRUE, test = "D2", pool.robust = TRUE)
summary(cfa3_fit, fit.measures = TRUE, test = "D2", pool.robust = TRUE)

I have noticed that the summaries for the individual imputed datasets (obtained with complete(imp)) list robust versions of the fit indices, e.g.:

  Comparative Fit Index (CFI)                    0.942       0.893
  Tucker-Lewis Index (TLI)                       0.932       0.875

  Robust Comparative Fit Index (CFI)                         0.713
  Robust Tucker-Lewis Index (TLI)                            0.663
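For reference, this is roughly how I obtained the single-dataset summaries above (just a sketch; cfa3_model and imp are the same objects as in the code before):

library(mice)
library(lavaan)

dat1 <- complete(imp, action = 1)   # extract the first imputed dataset
fit1 <- cfa(model = cfa3_model, data = dat1, estimator = "MLM")
summary(fit1, fit.measures = TRUE)  # lists robust CFI / TLI / RMSEA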

However, the summaries of the lavaan.mi objects look like this:

> summary(cfa3_fit, fit.measures = TRUE, test = "D2", pool.robust = TRUE)
lavaan.mi object based on 5 imputed data sets. 
See class?lavaan.mi help page for available methods. 

Convergence information:
The model converged on 5 imputed data sets 

Rubin's (1987) rules were used to pool point and SE estimates across 5 imputed data sets, and to calculate degrees of freedom for each parameter's t test and CI.

Model Test User Model:

  Test statistic                               271.217     273.252
  Degrees of freedom                               116         116
  P-value                                        0.000       0.000

Model Test Baseline Model:

  Test statistic                              2892.418    1692.194
  Degrees of freedom                               136         136
  P-value                                        0.000       0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.944       0.899
  Tucker-Lewis Index (TLI)                       0.934       0.882

Root Mean Square Error of Approximation:

  RMSEA                                          0.084       0.084
  Confidence interval - lower                    0.071       0.071
  Confidence interval - upper                    0.097       0.097
  P-value H_0: RMSEA <= 0.05                     0.000       0.000

Standardized Root Mean Square Residual:

  SRMR                                           0.111       0.111

Parameter Estimates:

  Standard errors                           Robust.sem
  Information                                 Expected
  Information saturated (h1) model        Unstructured

First, I don't quite understand the output, because the two columns are not labelled. Do they represent "standard" and "scaled"? Second, there are no robust variants of CFI, TLI, and RMSEA. I guess these are not implemented yet? If not, is there a quick way to pool them "by hand"? Unfortunately, I have no experience with multiple imputation so far.

Thanks a lot in advance! All the best, Paul

soelderer commented 9 months ago

I got it working without the test = "D2" and pool.robust = TRUE options. Sorry for the trouble; feel free to close the issue.

However, the pooled Robust Tucker-Lewis Index (TLI) is exactly 1.000, which seems odd to me, because the values for the individual imputed datasets are well below 1: 0.9423128, 0.9374963, 0.9320813, 0.9512308, 0.9365605.

Someone reported something similar with CFI here: https://groups.google.com/g/lavaan/c/WrwVm4KBBzA

Is this expected behavior?

TDJorgensen commented 8 months ago

the two columns are not labelled. Do they represent "standard" and "scaled"?

Yes, just like in the summary() for a lavaan object. I hadn't noticed the missing column labels. It turns out this happens because lavaan uses a separate internal print() method for the user model's test statistic, then prints the baseline-model test and all fit indices using the print() method for fitMeasures() output, which does not include column labels. For lavaan.mi objects, I only use the latter.

I am writing a new fitMeasures() method for lavaan.mi objects, which will resolve this problem.

TLI is exactly 1.000, which seems odd to me, as the values for the individual imputed datasets are well below 1

The TLI is calculated from the pooled chi-squared statistics (for the user and baseline models), not by averaging the per-imputation TLI values. Try that calculation yourself.
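For example, with the pooled test statistics from the summary you posted above (this is just the standard TLI/CFI arithmetic, nothing specific to lavaan.mi):

# pooled "standard" test statistics from the summary above
chisq  <- 271.217;  df  <- 116    # user model
chisqB <- 2892.418; dfB <- 136    # baseline model

tli <- ((chisqB / dfB) - (chisq / df)) / ((chisqB / dfB) - 1)  # Tucker-Lewis
cfi <- 1 - (chisq - df) / (chisqB - dfB)                       # comparative fit

round(c(TLI = tli, CFI = cfi), 3)   # 0.934 and 0.944, matching the summary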

Is this expected behavior?

No one really knows what to expect.

TDJorgensen commented 8 months ago

I am writing a new fitMeasures() method for lavaan.mi objects, which will resolve this problem.

But until then, this will add the column labels:

https://github.com/yrosseel/lavaan/pull/304