yrosseel / lavaan

an R package for structural equation modeling and more
http://lavaan.org

Multilevel model with ordinal data #225

Open mantas-t opened 2 years ago

mantas-t commented 2 years ago

Hi, I found that multilevel modeling is supported (https://lavaan.ugent.be/tutorial/multilevel.html), as are ordinal variables/regression (https://lavaan.ugent.be/tutorial/cat.html). However, multilevel ordinal regression does not seem to be supported.

Is there a workaround to get it working? If not, is there any chance this could be supported in the near future?

With standard two-level syntax and one (observable) ordinal variable I get the following error:

Error in th.start.idx[i]:th.end.idx[i] : NA/NaN argument
In addition: Warning messages:
1: In lav_options_set(opt) :
  lavaan WARNING: information will be set to “expected” for estimator = “DWLS”
2: In lav_samplestats_from_data(lavdata = lavdata, lavoptions = lavoptions, :
  lavaan ERROR: multilevel + categorical not supported yet.

TDJorgensen commented 2 years ago

MLSEM for categorical outcomes requires numerical integration, which is quite slow using R's built-in optimizers. Thus, it will be a while before this is possible in lavaan. But there are a couple other possibilities, potentially. Can you tell us more about your model and data?
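To illustrate why numerical integration is the bottleneck, here is a minimal sketch in base R. It assumes a two-level random-intercept probit model for a binary outcome, which is a deliberate simplification of the ordinal case. The function name `cluster_marginal_lik` and its arguments are hypothetical and not part of lavaan. Because the responses in a cluster are only independent conditional on the random intercept, the likelihood of each cluster is an integral over that intercept.

```r
# Marginal likelihood of ONE cluster under an assumed random-intercept
# probit model (binary y; illustration only, not lavaan code).
#   y    : 0/1 responses observed in the cluster
#   eta  : fixed part of the linear predictor, one value per response
#   sd_u : standard deviation of the cluster-level random intercept
cluster_marginal_lik <- function(y, eta, sd_u) {
  integrand <- function(u) {
    # integrate() passes a vector of quadrature points, so evaluate each one
    vapply(u, function(ui) {
      p <- pnorm(eta + ui)                       # P(y = 1 | u)
      prod(p^y * (1 - p)^(1 - y)) *              # conditional likelihood
        dnorm(ui, mean = 0, sd = sd_u)           # density of the random intercept
    }, numeric(1))
  }
  integrate(integrand, lower = -Inf, upper = Inf)$value
}

# One cluster with 15 repeated measures (made-up values).
set.seed(123)
y   <- rbinom(15, size = 1, prob = 0.3)
eta <- rep(-0.5, 15)
lik <- cluster_marginal_lik(y, eta, sd_u = 1)
```

The full log-likelihood is the sum of the logs of such integrals over all clusters, and it has to be re-evaluated at every iteration of the optimizer. With several random effects or latent variables per cluster the integral becomes multidimensional, which is where the cost grows quickly.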

mantas-t commented 2 years ago

Thanks for the comments and for your willingness to look into possibilities, even though this is not currently implemented and is not planned for lavaan in the near future. More context about my task is given below.

I have longitudinal/panel data with multiple observations over time of the same subjects, so each subject is a "cluster" in this case. Other clustering structures are present as well, e.g. some subjects belong to the same structural group (or cluster, in your terminology). The minimum I would like to do is to account somehow for the correlation within subjects (different observations of the same subject).

In my real task there are ~100k subjects with ~15-20 measurements per subject. There are ~20-30 indicators (x's) and ~5-7 factors I am considering. Most x's were originally ordinal or categorical but have already been "embedded" in some way and mapped to the reals (S -> R), so I think the x's could be treated as continuous in the worst case. Y is an endogenous (observable) variable of primary interest; it is ordinal with 7 levels (6 underlying thresholds). The levels of y are imbalanced: there are many more observations with y = l0 than with any other level. In the worst case, binning some levels together to form a binary variable might be an option, but keeping the original 7 levels would make the analysis much richer.
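For concreteness, the binning fallback could look like the following base R sketch. The data are simulated and the level labels (`l0` to `l6`) and proportions are made up for illustration; only the 7 levels, the 6 thresholds, and the imbalance toward `l0` come from the description above.

```r
# Hypothetical 7-level ordinal outcome with most of the mass on the lowest level.
set.seed(1)
y <- ordered(
  sample(0:6, size = 1000, replace = TRUE,
         prob = c(0.70, 0.10, 0.08, 0.05, 0.04, 0.02, 0.01)),
  levels = 0:6, labels = paste0("l", 0:6)
)

table(y)          # frequency per level, showing the imbalance
nlevels(y) - 1    # number of thresholds for 7 ordered categories

# Fallback: collapse to a binary outcome (l0 versus any higher level).
y_bin <- ordered(
  ifelse(as.character(y) == "l0", "l0", "l1plus"),
  levels = c("l0", "l1plus")
)
nlevels(y_bin) - 1  # a single threshold remains
```

Collapsing adjacent sparse levels (for example merging only the top two or three categories) is a middle ground that keeps more of the ordinal information than a full binary split.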

An illustration with a simple 2-level model is given below. I guess it could be something to start with if y were continuous; however, it is ordinal.

model <- '
  level: 1
    L1 =~ x1 + x2 + x3
    L2 =~ x4 + x5
    y ~ L1 + L2
  level: 2
    y ~ 0
    y ~~ y
'

isaactpetersen commented 2 years ago

I would be very interested in obtaining cluster-robust standard errors when using ordinal data (and WLSMV).

yrosseel commented 2 years ago

I am not sure if I will ever implement random effect models with ordinal data in lavaan. It would just be too slow (in R) to be practical. But Nick Rockwood is working on glavaan, which will have a backend in C++. The idea of glavaan is to support many outcome types (binary, ordinal, ...) similar to generalized linear models. Eventually (but please do not ask when) he may also add support for multilevel models.

As for cluster-robust standard errors for ordinal data + WLSMV: can Mplus do this? Is there any documentation about this?

isaactpetersen commented 2 years ago

Yes, I believe Mplus supports cluster-robust standard errors for ordinal data + WLSMV. I'm not sure about documentation, but I created an example with manifest variables (though it appears to work with latent variables, as well), as adapted from here. See the attached files for the Mplus data file, and the input and output files for 1) a regular ordinal model and 2) an ordinal model with cluster-robust standard errors. Mplus files.zip

yrosseel commented 2 years ago

Thanks. That was helpful. I have not found papers/literature about this yet, but I do have some ideas how this can be done: when we compute the 'cluster robust' standard errors, we must recompute the 'WLS.W' matrix (see lav_muthen1984) taking the clustering into account, in order to compute a cluster-robust 'Gamma' matrix, which is then used in lav_model_nvcov_robust_sem(). At least, that is my current hypothesis.
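The general idea behind a cluster-robust 'Gamma' can be sketched in base R as follows. This is the generic sandwich-style construction (sum the casewise contributions within each cluster before taking the cross-product), not lavaan's implementation. The function names `gamma_iid` and `gamma_cluster` are hypothetical, and the casewise contributions for thresholds and polychoric correlations (the part handled in lav_muthen1984) are not reproduced here; a simulated continuous variable stands in for them.

```r
# S: N x p matrix of casewise contributions to the sample statistics
# cluster: length-N vector of cluster ids

# Gamma assuming independent observations.
gamma_iid <- function(S) {
  S <- sweep(S, 2, colMeans(S))            # center each column
  crossprod(S) / nrow(S)
}

# Gamma allowing arbitrary dependence within clusters.
gamma_cluster <- function(S, cluster) {
  S <- sweep(S, 2, colMeans(S))            # center each column
  S_by_cluster <- rowsum(S, group = cluster)  # sum contributions per cluster
  crossprod(S_by_cluster) / nrow(S)
}

# Simulated example: 200 clusters of 10 observations sharing a random intercept.
set.seed(42)
cluster <- rep(seq_len(200), each = 10)
u <- rnorm(200)[cluster]                   # cluster-level effect
S <- matrix(u + rnorm(2000), ncol = 1)     # stand-in casewise contributions

g_iid     <- gamma_iid(S)
g_cluster <- gamma_cluster(S, cluster)
```

When every observation is its own cluster the two functions agree; with positive within-cluster correlation, as in the simulated example, the clustered version is larger, which is what inflates the resulting standard errors.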

sda030 commented 1 year ago

Is this of any help? https://www.statmodel.com/download/JSM2007000746.pdf

yrosseel commented 1 year ago

@sda030 Thanks. This document is about twolevel SEM with WLS. Unfortunately (as always with Mplus documents), they are a bit vague on the details. It might be feasible to implement this, but I think I will focus on the 'cluster-robust' version first.

sda030 commented 1 year ago

Yeah, I recognized it was on the model-based side rather than design-based, but hoped you could magically extract what you needed... These ones below are more on design-based inference, but even hazier on the equations and details I guess. (Sorry, I am not a statistician, just trying to help by collecting possibly relevant materials). https://www.statmodel.com/download/webnotes/mplusnote72.pdf http://www.statmodel.com/bmuthen/articles/Article_059.pdf
