simsem / semTools

Useful tools for structural equation modeling
75 stars 36 forks source link

Reliability function with ordinal lavaan model #65

Closed hynekcigler closed 4 years ago

hynekcigler commented 4 years ago

Hi, thank you again for a great package! Anyway, I have problem with reliability function. If the lavaan model is categorical AND hierarchical, the reliability function provides an error with error message Error in r[cbind(1L:p, 1L:p)] <- 1 : subscript out of bounds. See an example:

library(psych)
library(lavaan)
library(semTools)

## Simulate data
dat <- sim.hierarchical(gload=NULL, fload=NULL, n = 500, raw = T, mu = NULL)
dat <- as.data.frame(dat$observed)
dat[dat<0] <- 0
dat[dat != 0] <- 1

## models
model1 <- " ## correlated
F1 =~ V1+V2+V3
F2 =~ V4+V5+V6
F3 =~ V7+V8+V9
"

model2 <- " ## hierarchical
F1 =~ V1+V2+V3
F2 =~ V4+V5+V6
F3 =~ V7+V8+V9
G =~ F1+F2+F3
"

model3 <- " ## S-1 bifactor
F1 =~ V1+V2+V3
F2 =~ V4+V5+V6
F3 =~ V7+V8+V9
G =~ V1+V2+V3+V4+V5+V6+V7+V8+V9
"

## correlated models
cfa1_int <- cfa(model1, dat)
reliability(cfa1_int)
cfa1_cat <- cfa(model1, dat, ordered=T)
reliability(cfa1_cat)

## hierarchical models
cfa2_int <- cfa(model2, dat)
reliability(cfa2_int)
reliabilityL2(cfa2_int, "G")
cfa2_cat <- cfa(model2, dat, ordered=T)
reliability(cfa2_cat) ## this does not work
reliabilityL2(cfa2_cat, "G") ## this works

## bifactor models
cfa3_int <- cfa(model3, dat, std.lv=T, orthogonal=T)
reliability(cfa3_int)
cfa3_cat <- cfa(model3, dat, ordered=T, std.lv=T, orthogonal=T) ## this works too
reliability(cfa3_cat)

I can still fit the correlated model, constrain correlation matrix given the hierarchical model and results have to be the same, but it is inconvenient. Am I missing something? Is the Green & Yang (2009) inappropriate for hierarchical model? If so, better warning message would be nice :)

And I have a question. Thus the reliabilityL2 function account for thresholds using Green & Yang (2009) approach too?

TDJorgensen commented 4 years ago

Thanks for the reproducible example.

Am I missing something? Is the Green & Yang (2009) inappropriate for hierarchical model?

No, the issue is just that reliability() currently (version 0.5-2.911) only checks once whether any indicators are categorical, so it still tries to use the Green & Yang (2009) routine even for the higher-order factor that has continuous (latent) indicators. Oddly, the higher-order factor reliability would be calculated as NaN in the continuous case anyway (because its loadings are in $beta, not $lambda), and is eventually dropped at the end of the function when it naively looks for factors with no loadings in $lambda. The current ?reliability help page also does not even mention how higher-order factors are handled. For factors having both observed and latent indicators, the current function would have returned the reliability for a composite formed only from the observed indicator(s) of the higher-order factor (because those are the only loadings it finds in $lambda). I think we can agree that is silly.

Does the reliabilityL2() function account for thresholds using Green & Yang (2009) approach too?

Yes.

I have updated the categorical flag to be dynamic, so that as reliability() loops over factors, it checks whether all indicators are continuous (observed or latent allowed) or categorical. As the reliability help page says, combinations are not yet allowed because Green & Yang only worked out polychorics, not polyserials. This extends to the possible scenario where a higher-order factor has a mixture of latent indicators and observed categorical indicators. I'm not sure reliabilityL2() currently considers mixtures of observed and latent indicators of higher-order factors, but the reliability help page now explicitly says higher-order factors are ignored (even if some indicators are observed), and a message is printed when they are detected in the model.

I also took the opportunity to make the "total" column optional, as well as conditional on there being more than one factor, so it doesn't return a redundant "total" column for a 1-factor model. If you want the total, you need to explicitly request it with the argument return.total = TRUE. Showing the "total" column by default confused users who only wanted reliability per scale.

devtools::install_github("simsem/semTools/semTools")
hynekcigler commented 4 years ago

Thank you very much, it works perfectly. And you are right, omega total sometimes mistified our students (and I also think reliability should be model-based), so it is a good idea to provide it only on specific request. Great work! :-)