umxSummary for the brilliant umxTwinMaker

lf-araujo commented 1 year ago

Find a way to detect the cross-twin correlations and report those, without repeating the same parameters for each twin:

MWE:


     library(umx)
     data(twinData)
     tmp = umx_make_twin_data_nice(data=twinData, sep="", zygosity="zygosity", numbering=1:2)
     tmp = umx_scale_wide_twin_data(varsToScale= c("wt", "ht"), sep= "_T", data= tmp)
     mzData = subset(tmp, zygosity %in%  c("MZFF", "MZMM"))
     dzData = subset(tmp, zygosity %in%  c("DZFF", "DZMM"))

     # ==========================
     # = Make an ACE twin model =
     # ==========================
     # 1. Define paths for *one* person:
     paths = c(
        umxPath(v1m0 = c("a1", 'c1', "e1")),
        umxPath(means = c("wt")),
        umxPath(c("a1", 'c1', "e1"), to = "wt", values=.2)
     )
     # 2. Make a twin model from the paths for one person
     m1 = umxTwinMaker("test", paths, mzData = mzData, dzData= dzData)

     umxSummary(m1)

This results in duplicate rows, which are really annoying to trim in complex models:

Table: Parameter loadings for model 'test'

	name	Estimate	SE	type
8	a1_T1_MZr_a1_T2	1.00	0	Factor Cov
10	c1_T1_MZr_c1_T2	1.00	0	Factor Cov
24	a1_T1_DZr_a1_T2	0.50	0	Factor Cov
26	c1_T1_DZr_c1_T2	1.00	0	Factor Cov
1	a1_to_wt	0.77	0.02	Factor loading
2	c1_to_wt	0.46	0.04	Factor loading
3	e1_to_wt	0.38	0.01	Factor loading
4	a1_to_wt	0.77	0.02	Factor loading
5	c1_to_wt	0.46	0.04	Factor loading
6	e1_to_wt	0.38	0.01	Factor loading
17	a1_to_wt	0.77	0.02	Factor loading
18	c1_to_wt	0.46	0.04	Factor loading
19	e1_to_wt	0.38	0.01	Factor loading
20	a1_to_wt	0.77	0.02	Factor loading
21	c1_to_wt	0.46	0.04	Factor loading
22	e1_to_wt	0.38	0.01	Factor loading
7	a1_T1_with_a1_T1	1.00	0	Factor Variance
9	c1_T1_with_c1_T1	1.00	0	Factor Variance
11	e1_T1_with_e1_T1	1.00	0	Factor Variance
12	a1_T2_with_a1_T2	1.00	0	Factor Variance
13	c1_T2_with_c1_T2	1.00	0	Factor Variance
14	e1_T2_with_e1_T2	1.00	0	Factor Variance
23	a1_T1_with_a1_T1	1.00	0	Factor Variance
25	c1_T1_with_c1_T1	1.00	0	Factor Variance
27	e1_T1_with_e1_T1	1.00	0	Factor Variance
28	a1_T2_with_a1_T2	1.00	0	Factor Variance
29	c1_T2_with_c1_T2	1.00	0	Factor Variance
30	e1_T2_with_e1_T2	1.00	0	Factor Variance
15	one_to_wt	-0.07	0.02	Mean
16	one_to_wt	-0.07	0.02	Mean
31	one_to_wt	-0.07	0.02	Mean
32	one_to_wt	-0.07	0.02	Mean

Model Fit: χ²(6) = 8.64, p = 0.195; CFI = 0.999; TLI = 1; RMSEA = 0.012

tbates commented 1 year ago

What's a regex to filter parameter names? (We could require people to use some naming pattern to gain this functionality.)
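As a minimal sketch of what a naming-pattern filter could look like: the cross-twin rows in the table above all follow the form `<x>_T1_MZr_<x>_T2` or `<x>_T1_DZr_<x>_T2`, so a regex can pick those out. The pattern below is an assumption based only on the names printed in this thread, not a convention umx is known to guarantee.

```r
# Parameter names copied from the summary table above (a subset)
param_names <- c("a1_T1_MZr_a1_T2", "a1_T1_DZr_a1_T2",
                 "a1_to_wt", "a1_T1_with_a1_T1", "one_to_wt")

# Assumed pattern: twin-1 label, then MZr or DZr, then a twin-2 label
is_cross_twin <- grepl("_T1_(MZ|DZ)r_.*_T2$", param_names)

cross_twin <- param_names[is_cross_twin]
print(cross_twin)
```

This only identifies the cross-twin correlations; it does nothing about the repeated loadings, variances, and means.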

lf-araujo commented 1 year ago

I don't think there is an easy regex to detect which rows to retain in the example above. AFAIK a regex can't compare multiple strings against each other to check for duplicates.

mcneale commented 1 year ago

It can be done with awk (https://www.rockyourcode.com/how-i-remove-duplicate-lines-from-a-file-with-awk/), which is what I'd probably choose. I did also see a similar case handled with regex here: https://salesforce.stackexchange.com/questions/333509/regex-validation-rule-to-prevent-duplicate-in-set-of-numbers. I figure it may work with strings instead of numbers, though I don't know how much tweaking it would need.
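For comparison, the same "keep the first occurrence" idea can be sketched in base R without leaving the session. This is an illustration on a hand-copied subset of the names above, not code from umx.

```r
# Parameter names as they appear in the summary table above (a subset)
param_names <- c("a1_to_wt", "c1_to_wt", "e1_to_wt",
                 "a1_to_wt", "c1_to_wt", "e1_to_wt",
                 "one_to_wt", "one_to_wt")

# duplicated() flags every repeat of an earlier element,
# so negating it keeps only the first occurrence of each name
unique_names <- param_names[!duplicated(param_names)]
print(unique_names)
```

Deduplicating on the name alone assumes that rows sharing a name also share an estimate, which holds in the table above but is not checked here.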

tbates commented 1 year ago

So this is easier than it looks at first glance: you just want identical paths to appear once, i.e., filter the list to remove additional copies of paths with identical from & to & direction & label. IIRC there's already some code in umxSummary that looks at this when accumulating the list in multi-group models, so it could probably be done readily.
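A rough sketch of that filtering step, assuming the parameter table is a plain data frame. It keys on the columns visible in the printed table (`name`, `type`) because the internal from/to/direction/label fields are not shown in this thread. The helper `dedup_params` and the data frame are hypothetical, not the code that went into umx.

```r
# Hypothetical stand-in for the table umxSummary prints; values are
# copied from the output above, column names match its header.
params <- data.frame(
  name     = c("a1_T1_MZr_a1_T2", "a1_T1_DZr_a1_T2",
               "a1_to_wt", "a1_to_wt", "a1_to_wt", "a1_to_wt",
               "one_to_wt", "one_to_wt"),
  Estimate = c(1.00, 0.50, 0.77, 0.77, 0.77, 0.77, -0.07, -0.07),
  SE       = c(0, 0, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02),
  type     = c("Factor Cov", "Factor Cov",
               rep("Factor loading", 4), "Mean", "Mean"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the first row for each combination of key columns
dedup_params <- function(df, keys = c("name", "type")) {
  df[!duplicated(df[, keys, drop = FALSE]), , drop = FALSE]
}

deduped <- dedup_params(params)
print(deduped)
```

Passing `keys = c("name", "Estimate", "SE", "type")` would be the more cautious choice, since it keeps rows that share a name but differ in value.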

Question: Should this always be the behavior, or should it be an option? I.e., is there any circumstance where a user would want a summary with duplicates, which only make it harder to parse? Seems like not.

lf-araujo commented 1 year ago

As a relative newcomer to the OpenMx specification, I may be missing the full range of applications. As far as I have used it, it only makes sense to have no duplicates in the report.

tbates commented 1 year ago

OK: Thanks for the excellent short code showing the problem @lf-araujo !
Have a bang on the current GitHub version of umx which implements a fix, culling all the duplicate paths:

Table: Parameter loadings for model 'test'

	name	Estimate	SE	type
8	a1_T1_MZr_a1_T2	1.00	0	Factor Cov
10	c1_T1_MZr_c1_T2	1.00	0	Factor Cov
24	a1_T1_DZr_a1_T2	0.50	0	Factor Cov
26	c1_T1_DZr_c1_T2	1.00	0	Factor Cov
1	a1_to_wt	0.77	0.02	Factor loading
2	c1_to_wt	0.46	0.04	Factor loading
3	e1_to_wt	0.38	0.01	Factor loading
7	a1_T1_with_a1_T1	1.00	0	Factor Variance
9	c1_T1_with_c1_T1	1.00	0	Factor Variance
11	e1_T1_with_e1_T1	1.00	0	Factor Variance
12	a1_T2_with_a1_T2	1.00	0	Factor Variance
13	c1_T2_with_c1_T2	1.00	0	Factor Variance
14	e1_T2_with_e1_T2	1.00	0	Factor Variance
15	one_to_wt	-0.07	0.02	Mean

Model Fit: χ²(6) = 8.64, p = 0.195; CFI = 0.999; TLI = 1; RMSEA = 0.012

lf-araujo commented 1 year ago

Tested locally and it is working as expected. Thank you! The reports will look much better now!

tbates / umx

umxSummary for the brilliant umxTwinMaker #223
