Closed ukuvainik closed 2 years ago
I think the idea behind using latent variables is to put both traits on the same measurement error footing. When variables differ in their measurement error, results will be biased towards the more reliable measure causing the less reliable one. Yes, I think you could hack it with one variable (make the factor loading 1, the residual variance bits all zero, and free up the e's for the factor). That seems pretty easy with umx:
a2b = umxModify(DoC, fixlist, free = FALSE, value=0, name = "a2b", autoRun=F);
a2b = umxModify(a2b, c("FacLoad_r1c1","FacLoad_r2c2"), free = FALSE, value=1.0, name = "a2b", autoRun=F);
a2b = umxModify(a2b, c("e_r1c1","e_r2c2"), free = TRUE, value=1.0, name = "a2b")
HTH - Mike
Hi!
Thanks for the quick reply! umx is new for me so such guide is very helpful. What would be a way to insert reliability information? For instance .95 for trait1 and .80 for trait2.
Best, Uku
Well, in the same a2b model you could fix the factor loadings sqrt(1-.95^2) and sqrt(1-.8^2) and the e residual variances (es) at sqrt(.05) and sqrt(.8). However, these reliabilities may not be known with great precision, in which case adding mxModel groups that have the short-term test-retest reliability data in them, estimating the reliabilities and equating the factor loadings and residual variances to the relevant quantites (calculated in mxAlgebra probably easiest) would be better statistically.
Hi @ukuvainik thanks for your interest! If your measures are scores derived from items, and you have the items available, then the best approach would be to add the items (or item packets) as indicators of the latent causes.
That way the problem of reliability posturing as causality won't invalidate your results.
Hi all! Thank you all for your posts! They've been very helpful. I tried running the code suggested by @mcneale but got the error message "object 'fixlist' not found". I'm happy to get any advice on what I'm doing wrong (newbie in umx).
Thanks! Leo
# Example DoC with 3 indicators for latent variables
var1 = paste0("varA", 1:3)
var2 = paste0("varB", 1:3)
data(docData)
docData = umx_scale_wide_twin_data(c(var1, var2), docData, sep= "_T")
mzData = subset(docData, zygosity %in% c("MZFF", "MZMM"))
dzData = subset(docData, zygosity %in% c("DZFF", "DZMM"))
Chol = umxDoC(var1= var1, var2= var2, mzData= mzData, dzData= dzData, causal= FALSE)
DoC = umxDoC(var1= var1, var2= var2, mzData= mzData, dzData= dzData, causal= TRUE)
a2b = umxModify(DoC, "a2b", free = TRUE, name = "a2b"); summary(a2b)
b2a = umxModify(DoC, "b2a", free = TRUE, name = "b2a"); summary(b2a)
Recip = umxModify(DoC, c("a2b", "b2a"), free = TRUE, name = "Recip"); summary(Recip)
umxCompare(Chol, c(a2b, b2a, Recip))
# DoC with only 1 indicator each -> using suggested code by Mike
var1 = "varA1"
var2 = "varB1"
data(docData)
docData = umx_scale_wide_twin_data(c(var1, var2), docData, sep= "_T")
mzData = subset(docData, zygosity %in% c("MZFF", "MZMM"))
dzData = subset(docData, zygosity %in% c("DZFF", "DZMM"))
Chol = umxDoC(var1= var1, var2= var2, mzData= mzData, dzData= dzData, causal= FALSE)
DoC = umxDoC(var1= var1, var2= var2, mzData= mzData, dzData= dzData, causal= TRUE)
a2b = umxModify(DoC, fixlist, free = FALSE, value=0, name = "a2b", autoRun=F); a2b = umxModify(a2b, c("FacLoad_r1c1","FacLoad_r2c2"), free = FALSE, value=1.0, name = "a2b", autoRun=F); a2b = umxModify(a2b, c("e_r1c1","e_r2c2"), free = TRUE, value=1.0, name = "a2b")
# ERROR: object 'fixlist' not found
# b2a = umxModify(DoC, fixlist, free = FALSE, value=0, name = "b2a", autoRun=F); b2a = umxModify(b2a, c("FacLoad_r1c1","FacLoad_r2c2"), free = FALSE, value=1.0, name = "b2a", autoRun=F); b2a = umxModify(b2a, c("e_r1c1","e_r2c2"), free = TRUE, value=1.0, name = "b2a")
# Recip = umxModify(DoC, c("a2b", "b2a"), free = TRUE, name = "Recip"); summary(Recip)
# umxCompare(Chol, c(a2b, b2a, Recip))
Hi, my bad, I meant to include the code to figure out which parameters to fix. This I did by grabbing them from the summary and grabbing their names by position to avoid typing out all the "as_r1c1" "as_r2c2" "cs_r1c1" "cs_r2c2" "es_r1c1" "es_r2c2" stuff.
a2bsum <- summary(a2b);
fixlist <- a2bsum$parameters$name[7:12]
Thank you for the quick reply! I seem to get two issues: (1) If I use a2bsum$parameters$name[7:12]
I get other parameters than you named ("FacLoad_r2c2" "as_r1c1" "as_r2c2" "cs_r1c1" "cs_r2c2" "es_r1c1"). (2) Doesn't matter which parameters I include in the funtions (by position 7:12 vs. by name "as_r1ct" until "es_r2c2"), I always get Beta = 0.
DoC <- umxDoC(var1= var1, var2= var2, mzData= mzData, dzData= dzData, causal= TRUE)
a2b <- umxModify(DoC, "a2b", free = TRUE, name = "a2b")
a2bsum <- summary(a2b);
# fixlist <- a2bsum$parameters$name[7:12] # by position
fixlist <- c("as_r1c1", "as_r2c2", "cs_r1c1", "cs_r2c2", "es_r1c1", "es_r2c2") # by name
a2b <- umxModify(DoC, fixlist, free = FALSE, value=0, name = "a2b", autoRun=F); a2b = umxModify(a2b, c("FacLoad_r1c1","FacLoad_r2c2"), free = FALSE, value=1.0, name = "a2b", autoRun=F); a2b = umxModify(a2b, c("e_r1c1","e_r2c2"), free = TRUE, value=1.0, name = "a2b"); summary(a2b)
I thought that might happen, depending on which modified a2b you use. Here's the a2b model specified for one variable, and the b2a and reciprocal models are left as exercises:
library(umx)
#
# ========================
# = Does Rain cause Mud? =
# ========================
# =======================================
# = 2. Define manifests for var 1 and 2 =
# =======================================
var1 = paste0("varA", 1)
var2 = paste0("varB", 1)
# ================
# = 1. Load Data =
# ================
data(docData)
docData = umx_scale_wide_twin_data(c(var1, var2), docData, sep= "_T")
mzData = subset(docData, zygosity %in% c("MZFF", "MZMM"))
dzData = subset(docData, zygosity %in% c("DZFF", "DZMM"))
# =======================================================
# = 2. Make the non-causal (Cholesky) and causal models =
# =======================================================
Chol = umxDoC(var1= var1, var2= var2, mzData= mzData, dzData= dzData, causal= FALSE)
DoC = umxDoC(var1= var1, var2= var2, mzData= mzData, dzData= dzData, causal= TRUE, autoRun=F)
print(parnames <- names(coef(DoC)))
fixlist <- parnames[7:12]
# ================================================
# = Make the directional models by modifying DoC =
# ================================================
a2b = umxModify(DoC, "a2b", free = TRUE, name = "a2b", autoRun=F)
# fix factor loadings to 1
a2b = umxModify(a2b, c("FacLoad_r1c1","FacLoad_r2c2"), free = FALSE, value=1.0, name = "a2b", autoRun=F)
# fix residuals as cs es to zero
a2b <- umxModify(a2b, fixlist, free = FALSE, value=0, name = "a2b", autoRun=F)
# free up the latent factor e's
a2b = umxModify(a2b, c("e_r1c1","e_r2c2"), free = TRUE, value=1.0, name = "a2b")
summary(a2b)
## Exercise: modify the b2a and Recip models in the same way
b2a = umxModify(DoC, "b2a", free = TRUE, name = "b2a", autoRun=F)
Recip = umxModify(DoC, c("a2b", "b2a"), free = TRUE, name = "Recip", autoRun=F)
In the end, it's not as pretty as if there was automatic detection that the model is univariate, and internal modification of the model accordingly. That is an exercise for Prof Bates :).
A quick way to capture that list but also avoid typing is:
namez(parameters(DoC)$name, patt= "[ace]s", collapse="vector")
"c('as_r1c1', 'as_r2c2', 'as_r3c3', 'as_r4c4', 'as_r5c5', 'as_r6c6', 'cs_r1c1', 'cs_r2c2', 'cs_r3c3', 'cs_r4c4', 'cs_r5c5', 'cs_r6c6', 'es_r1c1', 'es_r2c2', 'es_r3c3', 'es_r4c4', 'es_r5c5', 'es_r6c6')"
Thank you so much for the solution! I greatly appreciate it!
Hi again!
Leonard got the code working, thanks for all the input!
I have further questions with regard to modelling reliability. Is it reasonable to aggregate all the items in a weighted manner?
Our goal is to analyse causality between behaviour and body mass index (BMI). Between personality and BMI, there are a lot of small effects, which we are underpowered to analyse one-by-one in a typical dataset of few thousand twins.
https://pubmed.ncbi.nlm.nih.gov/34247202/ https://osf.io/preprints/nutrixiv/q8ehr/
In paper above, we approached the problem by building a personality score, akin to a polygenic score and analysed causality between BMI and personality score for BMI. The score is built from out-of-sample weights. Granted, the latent variable modelling of such score would likely be poor. However, we had the opportunity for calculating the one-week stability of a score in a different dataset (~.80 vs self-reported BMI .95). We then did BMI vs BMIpersonality score DoC modelling and the results stayed the same whether we did or did not model the reliability effects.
We're now about to replicate the approach and would like to hear, if you see potential problems here
I'm confused by this: if you have a choice about aggregating the items and weighting them, then you would be MUCH better using umxDoC
as designed, building a latent variable of each of behavior and BMI, and modelling causation in the same single step.
I agree with Tim - do not do the weighted sum approach, it loses information about the relative accuracy of each person’s score. Also, if the results are the same whether you correct for unreliability or not, it suggests that the traits are uncorrelated, or the correction isn’t working as it should.
Hi! Thanks for picking up the thread! The latent score approach is not feasible, as I am aggregating across 100 items and these items are mostly not correlating with each other. They are from independent parts of the NEO-PI questionnaire
On Mon, Jan 24, 2022 at 1:42 PM Michael Neale @.***> wrote:
I agree with Tim - do not do the weighted sum approach, it loses information about the relative accuracy of each person’s score. Also, if the results are the same whether you correct for unreliability or not, it suggests that the traits are uncorrelated, or the correction isn’t working as it should.
— Reply to this email directly, view it on GitHub https://github.com/tbates/umx/issues/169#issuecomment-1020010393, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABSO253N63WHSGF5XKMOF5LUXU3MBANCNFSM5E5NDPTA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
You are receiving this because you were mentioned.Message ID: @.***>
If they aren't described well by a latent trait, then the DoC model will be misrepresenting what is happening, at best.
Aggregating NEO items across domains seems like the problem to address, not the solution.
But this is not really a GitHub issue for umx, but more a modelling question, or a question for journal reviewers, so I'll close it now.
Hi Tim!
Thank you very much for coding up the DoC analyses in UMX! That will make our lives so easy!
When trying it out, we discovered that the umxDoC function assumes latent variables. Is there a way to use the function when trait1 and trait2 each have only one variable?
https://rdrr.io/github/tbates/umx/man/umxDoC.html
Thanks!
Uku