tbates / umx

Making Structural Equation Modeling (SEM) in R quick & powerful
https://tbates.github.io/

umxSummary for the brilliant umxTwinMaker #223

Closed lf-araujo closed 1 year ago

lf-araujo commented 1 year ago

Find a way to cleverly detect the cross-twin correlations and report those, without repeating the parameters for each twin:

MWE:


     data(twinData)
     tmp = umx_make_twin_data_nice(data=twinData, sep="", zygosity="zygosity", numbering=1:2)
     tmp = umx_scale_wide_twin_data(varsToScale= c("wt", "ht"), sep= "_T", data= tmp)
     mzData = subset(tmp, zygosity %in%  c("MZFF", "MZMM"))
     dzData = subset(tmp, zygosity %in%  c("DZFF", "DZMM"))

     # ==========================
     # = Make an ACE twin model =
     # ==========================
     # 1. Define paths for *one* person:
     paths = c(
        umxPath(v1m0 = c("a1", 'c1', "e1")),
        umxPath(means = c("wt")),
        umxPath(c("a1", 'c1', "e1"), to = "wt", values=.2)
     )
     # 2. Make a twin model from the paths for one person
     m1 = umxTwinMaker("test", paths, mzData = mzData, dzData= dzData)

     # 3. Summarise the fitted model
     umxSummary(m1)

This results in duplicates that are really annoying to chop off in complex models:

Table: Parameter loadings for model 'test'

|    | name             | Estimate | SE   | type            |
|:---|:-----------------|---------:|:-----|:----------------|
| 8  | a1_T1_MZr_a1_T2  | 1.00     | 0    | Factor Cov      |
| 10 | c1_T1_MZr_c1_T2  | 1.00     | 0    | Factor Cov      |
| 24 | a1_T1_DZr_a1_T2  | 0.50     | 0    | Factor Cov      |
| 26 | c1_T1_DZr_c1_T2  | 1.00     | 0    | Factor Cov      |
| 1  | a1_to_wt         | 0.77     | 0.02 | Factor loading  |
| 2  | c1_to_wt         | 0.46     | 0.04 | Factor loading  |
| 3  | e1_to_wt         | 0.38     | 0.01 | Factor loading  |
| 4  | a1_to_wt         | 0.77     | 0.02 | Factor loading  |
| 5  | c1_to_wt         | 0.46     | 0.04 | Factor loading  |
| 6  | e1_to_wt         | 0.38     | 0.01 | Factor loading  |
| 17 | a1_to_wt         | 0.77     | 0.02 | Factor loading  |
| 18 | c1_to_wt         | 0.46     | 0.04 | Factor loading  |
| 19 | e1_to_wt         | 0.38     | 0.01 | Factor loading  |
| 20 | a1_to_wt         | 0.77     | 0.02 | Factor loading  |
| 21 | c1_to_wt         | 0.46     | 0.04 | Factor loading  |
| 22 | e1_to_wt         | 0.38     | 0.01 | Factor loading  |
| 7  | a1_T1_with_a1_T1 | 1.00     | 0    | Factor Variance |
| 9  | c1_T1_with_c1_T1 | 1.00     | 0    | Factor Variance |
| 11 | e1_T1_with_e1_T1 | 1.00     | 0    | Factor Variance |
| 12 | a1_T2_with_a1_T2 | 1.00     | 0    | Factor Variance |
| 13 | c1_T2_with_c1_T2 | 1.00     | 0    | Factor Variance |
| 14 | e1_T2_with_e1_T2 | 1.00     | 0    | Factor Variance |
| 23 | a1_T1_with_a1_T1 | 1.00     | 0    | Factor Variance |
| 25 | c1_T1_with_c1_T1 | 1.00     | 0    | Factor Variance |
| 27 | e1_T1_with_e1_T1 | 1.00     | 0    | Factor Variance |
| 28 | a1_T2_with_a1_T2 | 1.00     | 0    | Factor Variance |
| 29 | c1_T2_with_c1_T2 | 1.00     | 0    | Factor Variance |
| 30 | e1_T2_with_e1_T2 | 1.00     | 0    | Factor Variance |
| 15 | one_to_wt        | -0.07    | 0.02 | Mean            |
| 16 | one_to_wt        | -0.07    | 0.02 | Mean            |
| 31 | one_to_wt        | -0.07    | 0.02 | Mean            |
| 32 | one_to_wt        | -0.07    | 0.02 | Mean            |

Model Fit: χ²(6) = 8.64, p = 0.195; CFI = 0.999; TLI = 1; RMSEA = 0.012

tbates commented 1 year ago

What's a regex to filter parameter names? (We could require people to use some naming pattern to gain this functionality.)
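As a rough illustration of what such a name filter might look like, here is a minimal sketch using base R's `grepl()`. The parameter names are copied from the table above; the pattern `_(MZ|DZ)r_` is only an assumption based on how the cross-twin covariances happen to be named in this example, not a convention umx is known to guarantee.

```r
# Parameter names copied from the umxSummary output above
param_names <- c(
  "a1_T1_MZr_a1_T2", "c1_T1_MZr_c1_T2",
  "a1_T1_DZr_a1_T2", "c1_T1_DZr_c1_T2",
  "a1_to_wt", "a1_T1_with_a1_T1", "one_to_wt"
)

# Assumed pattern: cross-twin covariances contain "_MZr_" or "_DZr_"
cross_twin_pattern <- "_(MZ|DZ)r_"
is_cross_twin <- grepl(cross_twin_pattern, param_names)

# Keep only the cross-twin covariance names
cross_twin <- param_names[is_cross_twin]
print(cross_twin)
```

A pattern like this can select the cross-twin rows, but it does not by itself remove the repeated loadings, variances, and means.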

lf-araujo commented 1 year ago

I don't think there is an easy regex to detect which rows to retain in the example above. AFAIK there is no regex for comparing multiple strings against each other and checking for duplicates.

mcneale commented 1 year ago

It can be done with awk: https://www.rockyourcode.com/how-i-remove-duplicate-lines-from-a-file-with-awk/ which is what I'd probably choose, but I did see a similar case handled with regex here: https://salesforce.stackexchange.com/questions/333509/regex-validation-rule-to-prevent-duplicate-in-set-of-numbers - I figure it may work with strings instead of numbers, though I don't know how much tweaking that would take.

tbates commented 1 year ago

So this is easier than it seemed at first glance: you just want identical paths to appear once, i.e., filter the list to remove additional copies of paths with identical from & to & direction & label. IIRC there's already some code in umxSummary which looks at that when accumulating the list in multi-group models, so this could probably be done readily.
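The filtering idea above can be sketched in base R with `duplicated()`. This is not the code in umxSummary: the data frame below is a hypothetical stand-in built from a few rows of the table earlier in the thread, and `name` plus `type` are used as the key only because that table does not expose separate from, to, direction, and label columns.

```r
# A few rows copied from the umxSummary table above.
# The data frame layout is hypothetical, for illustration only.
params <- data.frame(
  name     = c("a1_to_wt", "c1_to_wt", "e1_to_wt",
               "a1_to_wt", "c1_to_wt", "e1_to_wt",
               "one_to_wt", "one_to_wt"),
  Estimate = c(0.77, 0.46, 0.38, 0.77, 0.46, 0.38, -0.07, -0.07),
  SE       = c(0.02, 0.04, 0.01, 0.02, 0.04, 0.01, 0.02, 0.02),
  type     = c(rep("Factor loading", 6), rep("Mean", 2)),
  stringsAsFactors = FALSE
)

# Columns that define "the same path" in this sketch (an assumption)
key_cols <- c("name", "type")

# duplicated() flags each row whose key matches an earlier row,
# so negating it keeps only the first copy of every path
deduped <- params[!duplicated(params[, key_cols]), ]
print(deduped, row.names = FALSE)
```

Keying on identifying columns rather than on the whole row means a path is still treated as a duplicate even if an estimate differs by rounding; whether that is the desired behaviour is a design choice.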

Question: Should this always be done, or should it be an option? I.e., is there any circumstance where a user would want a summary with duplicates, which only make it harder to parse? Seems like not.

lf-araujo commented 1 year ago

As a relative junior when it comes to the OpenMx specification, I might be missing the full range of applications. As far as I have used it, it has only ever made sense to have no duplicates in the report.

tbates commented 1 year ago

OK: Thanks for the excellent short code showing the problem @lf-araujo !
Have a bang on the current GitHub version of umx which implements a fix, culling all the duplicate paths:

Table: Parameter loadings for model 'test'

|    | name             | Estimate | SE   | type            |
|:---|:-----------------|---------:|:-----|:----------------|
| 8  | a1_T1_MZr_a1_T2  | 1.00     | 0    | Factor Cov      |
| 10 | c1_T1_MZr_c1_T2  | 1.00     | 0    | Factor Cov      |
| 24 | a1_T1_DZr_a1_T2  | 0.50     | 0    | Factor Cov      |
| 26 | c1_T1_DZr_c1_T2  | 1.00     | 0    | Factor Cov      |
| 1  | a1_to_wt         | 0.77     | 0.02 | Factor loading  |
| 2  | c1_to_wt         | 0.46     | 0.04 | Factor loading  |
| 3  | e1_to_wt         | 0.38     | 0.01 | Factor loading  |
| 7  | a1_T1_with_a1_T1 | 1.00     | 0    | Factor Variance |
| 9  | c1_T1_with_c1_T1 | 1.00     | 0    | Factor Variance |
| 11 | e1_T1_with_e1_T1 | 1.00     | 0    | Factor Variance |
| 12 | a1_T2_with_a1_T2 | 1.00     | 0    | Factor Variance |
| 13 | c1_T2_with_c1_T2 | 1.00     | 0    | Factor Variance |
| 14 | e1_T2_with_e1_T2 | 1.00     | 0    | Factor Variance |
| 15 | one_to_wt        | -0.07    | 0.02 | Mean            |

Model Fit: χ²(6) = 8.64, p = 0.195; CFI = 0.999; TLI = 1; RMSEA = 0.012

lf-araujo commented 1 year ago

Tested locally and it is working as expected. Thank you! The reports will look much better now!