tbates / umx

Making Structural Equation Modeling (SEM) in R quick & powerful
https://tbates.github.io/
44 stars 17 forks source link

manifest variables in umxDoC #169

Closed ukuvainik closed 2 years ago

ukuvainik commented 3 years ago

Hi Tim!

Thank you very much for coding up the DoC analyses in UMX! That will make our lives so easy!

When trying it out, we discovered that the umxDoC function assumes latent variables. Is there a way to use the function when trait1 and trait2 each have only one variable?

https://rdrr.io/github/tbates/umx/man/umxDoC.html

Thanks!

Uku

mcneale commented 3 years ago

I think the idea behind using latent variables is to put both traits on the same measurement error footing. When variables differ in their measurement error, results will be biased towards the more reliable measure causing the less reliable one. Yes, I think you could hack it with one variable (make the factor loading 1, the residual variance bits all zero, and free up the e's for the factor). That seems pretty easy with umx:

a2b   = umxModify(DoC, fixlist, free = FALSE, value=0, name = "a2b", autoRun=F);
a2b   = umxModify(a2b, c("FacLoad_r1c1","FacLoad_r2c2"), free = FALSE, value=1.0, name = "a2b", autoRun=F);
a2b   = umxModify(a2b, c("e_r1c1","e_r2c2"), free = TRUE, value=1.0, name = "a2b")

HTH - Mike

ukuvainik commented 3 years ago

Hi!

Thanks for the quick reply! umx is new for me so such guide is very helpful. What would be a way to insert reliability information? For instance .95 for trait1 and .80 for trait2.

Best, Uku

mcneale commented 3 years ago

Well, in the same a2b model you could fix the factor loadings sqrt(1-.95^2) and sqrt(1-.8^2) and the e residual variances (es) at sqrt(.05) and sqrt(.8). However, these reliabilities may not be known with great precision, in which case adding mxModel groups that have the short-term test-retest reliability data in them, estimating the reliabilities and equating the factor loadings and residual variances to the relevant quantites (calculated in mxAlgebra probably easiest) would be better statistically.

tbates commented 3 years ago

Hi @ukuvainik thanks for your interest! If your measures are scores derived from items, and you have the items available, then the best approach would be to add the items (or item packets) as indicators of the latent causes.

That way the problem of reliability posturing as causality won't invalidate your results.

leonkulisch commented 3 years ago

Hi all! Thank you all for your posts! They've been very helpful. I tried running the code suggested by @mcneale but got the error message "object 'fixlist' not found". I'm happy to get any advice on what I'm doing wrong (newbie in umx).

Thanks! Leo

# Example DoC with 3 indicators for latent variables
var1 = paste0("varA", 1:3)
var2 = paste0("varB", 1:3)
data(docData)
docData = umx_scale_wide_twin_data(c(var1, var2), docData, sep= "_T")
mzData  = subset(docData, zygosity %in% c("MZFF", "MZMM"))
dzData  = subset(docData, zygosity %in% c("DZFF", "DZMM"))
Chol = umxDoC(var1= var1, var2= var2, mzData= mzData, dzData= dzData, causal= FALSE)
DoC  = umxDoC(var1= var1, var2= var2, mzData= mzData, dzData= dzData, causal= TRUE)
a2b   = umxModify(DoC, "a2b", free = TRUE, name = "a2b"); summary(a2b)
b2a   = umxModify(DoC, "b2a", free = TRUE, name = "b2a"); summary(b2a)
Recip = umxModify(DoC, c("a2b", "b2a"), free = TRUE, name = "Recip"); summary(Recip) 
umxCompare(Chol, c(a2b, b2a, Recip)) 
# DoC with only 1 indicator each -> using suggested code by Mike
var1 = "varA1"
var2 = "varB1"
data(docData)
docData = umx_scale_wide_twin_data(c(var1, var2), docData, sep= "_T")
mzData  = subset(docData, zygosity %in% c("MZFF", "MZMM"))
dzData  = subset(docData, zygosity %in% c("DZFF", "DZMM"))
Chol = umxDoC(var1= var1, var2= var2, mzData= mzData, dzData= dzData, causal= FALSE)
DoC  = umxDoC(var1= var1, var2= var2, mzData= mzData, dzData= dzData, causal= TRUE)
a2b = umxModify(DoC, fixlist, free = FALSE, value=0, name = "a2b", autoRun=F); a2b = umxModify(a2b, c("FacLoad_r1c1","FacLoad_r2c2"), free = FALSE, value=1.0, name = "a2b", autoRun=F); a2b = umxModify(a2b, c("e_r1c1","e_r2c2"), free = TRUE, value=1.0, name = "a2b")
# ERROR: object 'fixlist' not found
# b2a = umxModify(DoC, fixlist, free = FALSE, value=0, name = "b2a", autoRun=F); b2a = umxModify(b2a, c("FacLoad_r1c1","FacLoad_r2c2"), free = FALSE, value=1.0, name = "b2a", autoRun=F); b2a = umxModify(b2a, c("e_r1c1","e_r2c2"), free = TRUE, value=1.0, name = "b2a")
# Recip = umxModify(DoC, c("a2b", "b2a"), free = TRUE, name = "Recip"); summary(Recip)
# umxCompare(Chol, c(a2b, b2a, Recip))
mcneale commented 3 years ago

Hi, my bad, I meant to include the code to figure out which parameters to fix. This I did by grabbing them from the summary and grabbing their names by position to avoid typing out all the "as_r1c1" "as_r2c2" "cs_r1c1" "cs_r2c2" "es_r1c1" "es_r2c2" stuff.

a2bsum <- summary(a2b); 
fixlist <- a2bsum$parameters$name[7:12]
leonkulisch commented 3 years ago

Thank you for the quick reply! I seem to get two issues: (1) If I use a2bsum$parameters$name[7:12] I get other parameters than you named ("FacLoad_r2c2" "as_r1c1" "as_r2c2" "cs_r1c1" "cs_r2c2" "es_r1c1"). (2) Doesn't matter which parameters I include in the funtions (by position 7:12 vs. by name "as_r1ct" until "es_r2c2"), I always get Beta = 0.

DoC <- umxDoC(var1= var1, var2= var2, mzData= mzData, dzData= dzData, causal= TRUE)
a2b <- umxModify(DoC, "a2b", free = TRUE, name = "a2b")
a2bsum <- summary(a2b); 
# fixlist <- a2bsum$parameters$name[7:12] # by position
fixlist <- c("as_r1c1", "as_r2c2", "cs_r1c1", "cs_r2c2", "es_r1c1", "es_r2c2") # by name
a2b <- umxModify(DoC, fixlist, free = FALSE, value=0, name = "a2b", autoRun=F); a2b = umxModify(a2b, c("FacLoad_r1c1","FacLoad_r2c2"), free = FALSE, value=1.0, name = "a2b", autoRun=F); a2b = umxModify(a2b, c("e_r1c1","e_r2c2"), free = TRUE, value=1.0, name = "a2b"); summary(a2b)
mcneale commented 3 years ago

I thought that might happen, depending on which modified a2b you use. Here's the a2b model specified for one variable, and the b2a and reciprocal models are left as exercises:

library(umx)
#
# ========================
# = Does Rain cause Mud? =
# ========================

# =======================================
# = 2. Define manifests for var 1 and 2 =
# =======================================
var1 = paste0("varA", 1)
var2 = paste0("varB", 1)

# ================
# = 1. Load Data =
# ================
data(docData)
docData = umx_scale_wide_twin_data(c(var1, var2), docData, sep= "_T")
mzData  = subset(docData, zygosity %in% c("MZFF", "MZMM"))
dzData  = subset(docData, zygosity %in% c("DZFF", "DZMM"))

# =======================================================
# = 2. Make the non-causal (Cholesky) and causal models =
# =======================================================
Chol = umxDoC(var1= var1, var2= var2, mzData= mzData, dzData= dzData, causal= FALSE)
DoC  = umxDoC(var1= var1, var2= var2, mzData= mzData, dzData= dzData, causal= TRUE, autoRun=F)
print(parnames <- names(coef(DoC)))
fixlist <- parnames[7:12]
# ================================================
# = Make the directional models by modifying DoC =
# ================================================
a2b   = umxModify(DoC, "a2b", free = TRUE, name = "a2b", autoRun=F)

# fix factor loadings to 1
a2b = umxModify(a2b, c("FacLoad_r1c1","FacLoad_r2c2"), free = FALSE, value=1.0, name = "a2b", autoRun=F)
# fix residuals as cs es to zero
a2b <- umxModify(a2b, fixlist, free = FALSE, value=0, name = "a2b", autoRun=F)
# free up the latent factor e's
a2b = umxModify(a2b, c("e_r1c1","e_r2c2"), free = TRUE, value=1.0, name = "a2b")
summary(a2b)

## Exercise: modify the b2a and Recip models in the same way
b2a   = umxModify(DoC, "b2a", free = TRUE, name = "b2a", autoRun=F)
Recip = umxModify(DoC, c("a2b", "b2a"), free = TRUE, name = "Recip", autoRun=F)

In the end, it's not as pretty as if there was automatic detection that the model is univariate, and internal modification of the model accordingly. That is an exercise for Prof Bates :).

tbates commented 3 years ago

A quick way to capture that list but also avoid typing is:

namez(parameters(DoC)$name, patt= "[ace]s", collapse="vector")

"c('as_r1c1', 'as_r2c2', 'as_r3c3', 'as_r4c4', 'as_r5c5', 'as_r6c6', 'cs_r1c1', 'cs_r2c2', 'cs_r3c3', 'cs_r4c4', 'cs_r5c5', 'cs_r6c6', 'es_r1c1', 'es_r2c2', 'es_r3c3', 'es_r4c4', 'es_r5c5', 'es_r6c6')"

leonkulisch commented 3 years ago

Thank you so much for the solution! I greatly appreciate it!

ukuvainik commented 3 years ago

Hi again!

Leonard got the code working, thanks for all the input!

I have further questions with regard to modelling reliability. Is it reasonable to aggregate all the items in a weighted manner?

Our goal is to analyse causality between behaviour and body mass index (BMI). Between personality and BMI, there are a lot of small effects, which we are underpowered to analyse one-by-one in a typical dataset of few thousand twins.

https://pubmed.ncbi.nlm.nih.gov/34247202/ https://osf.io/preprints/nutrixiv/q8ehr/

In paper above, we approached the problem by building a personality score, akin to a polygenic score and analysed causality between BMI and personality score for BMI. The score is built from out-of-sample weights. Granted, the latent variable modelling of such score would likely be poor. However, we had the opportunity for calculating the one-week stability of a score in a different dataset (~.80 vs self-reported BMI .95). We then did BMI vs BMIpersonality score DoC modelling and the results stayed the same whether we did or did not model the reliability effects.

We're now about to replicate the approach and would like to hear, if you see potential problems here

tbates commented 2 years ago

I'm confused by this: if you have a choice about aggregating the items and weighting them, then you would be MUCH better using umxDoC as designed, building a latent variable of each of behavior and BMI, and modelling causation in the same single step.

mcneale commented 2 years ago

I agree with Tim - do not do the weighted sum approach, it loses information about the relative accuracy of each person’s score. Also, if the results are the same whether you correct for unreliability or not, it suggests that the traits are uncorrelated, or the correction isn’t working as it should.

ukuvainik commented 2 years ago

Hi! Thanks for picking up the thread! The latent score approach is not feasible, as I am aggregating across 100 items and these items are mostly not correlating with each other. They are from independent parts of the NEO-PI questionnaire

On Mon, Jan 24, 2022 at 1:42 PM Michael Neale @.***> wrote:

I agree with Tim - do not do the weighted sum approach, it loses information about the relative accuracy of each person’s score. Also, if the results are the same whether you correct for unreliability or not, it suggests that the traits are uncorrelated, or the correction isn’t working as it should.

— Reply to this email directly, view it on GitHub https://github.com/tbates/umx/issues/169#issuecomment-1020010393, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABSO253N63WHSGF5XKMOF5LUXU3MBANCNFSM5E5NDPTA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were mentioned.Message ID: @.***>

tbates commented 2 years ago

If they aren't described well by a latent trait, then the DoC model will be misrepresenting what is happening, at best.

Aggregating NEO items across domains seems like the problem to address, not the solution.

But this is not really a GitHub issue for umx, but more a modelling question, or a question for journal reviewers, so I'll close it now.