Closed WillemSleegers closed 1 year ago
Good catch - dynamic
does indeed rely on ML. This is because dynamic
simulates multivariate normal data with non-missingness to estimate the cutoffs, so there's no need to use a MLR estimator (the results would be identical to ML). This is certainly a limitation of the method as it exists currently, but we felt that asking users to upload the skew and kurtosis for each of their items would be too burdensome.
We're currently working on a bootstrapping extension in which users can upload their raw data instead of their standardized loadings, which will better capture any non-normality, and we plan to allow users to choose MLR in that case. We're also working on extensions for categorical data (e.g., WLSMV estimator). Stay tuned!
Sorry for re-opening the issue but I'd like to ask a clarification question. I thought one of the use-cases for using the MLR estimator is to deal with non-normality, not missingness. I also expect different results because I get the robust variants of the fit indices (e.g., RMSEA.robust, CFI.robust), which I don't think are used in your simulation and so would produce different distributions to obtain the cutoffs from. So would it not be possible and/or better to have an argument to specify robust = TRUE
or something like that and then use the robust variants of the indices rather than the normal ones?
Correct, it's for non-normality. To calculate the cutoffs, we simulate the data using a package called simstandard
which simulates multivariate normal data from user-inputted standardized loadings using mvtnorm
. We then fit several models to it using lavaan
: one of which is correctly specified and several which are incorrectly specified, and then return DFI cutoff values that can distinguish between the two distributions of cutoffs (e.g., correctly specified vs misspecified).
It sounds like what you're asking for is for us to return the RMSEA.robust
and CFI.robust
from the fitMeasures
function, which are returned automatically when you use estimator=MLR
. Let's use an empirical example to see what happens when we simulate multivariate normal data and then estimate a solution using MLR
.
We'll use the standardized loadings from the HolzingerSwineford1939
dataset, and our model will be:
cfamod <- "F1 =~ x1 + x2 + x3
F2 =~ x4 + x5 + x6
F3 =~ x7 + x8 + x9"
We can run a cfa on this using cfa
from lavaan
and estimator=MLR
. We then use standardizedSolution
to get the standardized loadings, which gives us the following model statement:
standardizedmod <- "F1 =~ .772*x1 + .424*x2 + .581*x3
F2 =~ .852*x4 + .855*x5 + .838*x6
F3 =~ .570*x7 + .723*x8 + .665*x9"
We can then simulate the multivariate normal data using the simstandard
package with 500 replications, as we do in dynamic
. We begin by setting a seed for replicability.
set.seed(888)
dat <- sim_standardized(standardizedmod,n=301*500, estimator=MLR)
We put the data in list form by adding an id for each of the 500 datasets:
id <- rep(1:301,times=500)
dat_id <- cbind(id,dat)
dat_list <- dat_id %>% dplyr::group_by(id) %>% tidyr::nest() %>% as.list()
We then run 500 cfa models with estimator=MLR
so that we get back both types of fit indices:
fit_list <- purrr::map(dat_list$data, ~cfa(model=misspecifiedmod,data=., estimator="MLR"))
We extract the CFI
, CFI.robust
, RMSEA
, and RMSEA.robust
into a dataset.
fit_dat <- purrr::map_dfr(fit_list,~fitMeasures(., c("cfi","cfi.robust","rmsea","rmsea.robust")))
Running colMeans on fit_dat
shows us that the robust versions of each fit index are virtually identical to their counterpart:
colMeans(fit_dat)
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns="http://www.w3.org/TR/REC-html40">
cfi | cfi.robust | rmsea | rmsea.robust -- | -- | -- | -- 0.997602 | 0.997503 | 0.010278 | 0.010547
Thanks for this great package (and fantastic associated papers)!
I was wondering whether it is possible to add an option to specify the estimator. In my code I'm using the MLR estimator but I don't think
dynamic
is using that estimator under the hood and instead only relies on ML. I think it might be beneficial to add different estimators, particularly the more robust ones.