tbates / umx

Making Structural Equation Modeling (SEM) in R quick & powerful
https://tbates.github.io/

umx_long2wide: support unusual twinIDs #128

Closed tbates closed 4 years ago

tbates commented 4 years ago

support families/zygosities where twin 1 is always missing

BUG: zygosities 2, 4, 5, and 6 should all be preserved in the wide data, but 5 is being dropped. CAUSE: possibly because zygosity 5 consists only of twin 2 (OS) males, so those families have no twin 1 row.

Here's a minimal reproducible example:

library(umx)
load("~/Desktop/df1.RData", verbose=TRUE) # loads df1 (long-format twin data)

# ======================
# = Extract males only =
# ======================
tmp_m = subset(df1, zygosity<7 & sex==0 & twinid<3)
# PS: There is no need to filter on twinid here; umx_long2wide can do it via twinIDs2keep = 1:2

table(tmp_m$zygosity,tmp_m$sex) # no women. values for zyg 2, 4, 5, & 6
#     0
# 2 417
# 4 308
# 5 145
# 6 152

tmp = umx_long2wide(data=tmp_m, famID = "familyid", twinID = "twinid", zygosity = "zygosity", twinIDs2keep= 1:2)
# Found 2 levels of twinID: '1' and '2'
# Dropped twinIDs: NULL
# Keeping twinIDs: '1' and '2', Dropping 0 rows out of 1022

table(tmp$zygosity)
# Now (zygosity 5 is preserved):
#   2   4   5   6
# 243 184 145 152
# Used to be (zygosity 5 dropped):
#   2   4   6
# 208 156 151
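
The suspected cause can be reproduced without the private df1 file. The sketch below is base R only and is not how umx_long2wide is implemented; the toy data, the `_T1`/`_T2` suffixes, and the two merge strategies are illustrative assumptions. It shows how building the wide table from twin 1 rows loses any family (and therefore any zygosity) that has only a twin 2, while a full outer join keeps it.

```r
# Toy long-format data: family 3 has only a twin 2 (no twin 1 row),
# mimicking the zygosity-5 families described above.
long_toy <- data.frame(
  familyid = c(1, 1, 2, 2, 3),
  twinid   = c(1, 2, 1, 2, 2),
  zygosity = c(2, 2, 4, 4, 5),
  bmi      = c(21.0, 22.5, 19.8, 20.1, 24.3)
)

keep <- c("familyid", "zygosity", "bmi")
t1 <- long_toy[long_toy$twinid == 1, keep]
t2 <- long_toy[long_toy$twinid == 2, keep]

# Naive approach (assumed for illustration): start from twin 1 and attach twin 2.
# Family 3 has no twin 1 row, so it is lost, and zygosity 5 goes with it.
naive <- merge(t1, t2, by = c("familyid", "zygosity"),
               all.x = TRUE, suffixes = c("_T1", "_T2"))

# Full outer join: families missing either twin are kept and padded with NA.
full <- merge(t1, t2, by = c("familyid", "zygosity"),
              all = TRUE, suffixes = c("_T1", "_T2"))

table(naive$zygosity)  # only levels 2 and 4 appear
table(full$zygosity)   # levels 2, 4 and 5 appear
```

In `full`, family 3 has `NA` for `bmi_T1` and its observed value in `bmi_T2`, which is the shape the wide data needs when twin 1 is always missing.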

h/t @nathanGillespie

tbates commented 4 years ago

# Just make a long data set to play with
library(umx)
data(twinData); tmp = twinData[, -2]; tmp$twinID1 = 1; tmp$twinID2 = 2

long = umx_wide2long(data = tmp, sep = "")

# Make it wide!
wide = umx_long2wide(data = long, famID = "fam", twinID = "twinID", zygosity = "zygosity",
    vars2keep = c("bmi", "wt"), passalong = "cohort"
)

namez(wide)
table(wide$zygosity)
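
Comparing tables by eye works, but a small check can flag a dropped zygosity directly. The helper below is hypothetical (`lost_levels` is not part of umx) and is shown on toy data frames; it assumes only that the long and wide data both carry a column with the same name.

```r
# Hypothetical helper (not part of umx): which values of `col` occur in the
# long data but are absent from the wide data?
lost_levels <- function(long, wide, col = "zygosity") {
  setdiff(unique(long[[col]]), unique(wide[[col]]))
}

# Toy stand-ins for a long data frame and two candidate wide results
long_toy <- data.frame(zygosity = c(2, 2, 4, 4, 5))
wide_ok  <- data.frame(zygosity = c(2, 4, 5))
wide_bad <- data.frame(zygosity = c(2, 4))

lost_levels(long_toy, wide_ok)   # empty: every zygosity survived
lost_levels(long_toy, wide_bad)  # 5: zygosity 5 was dropped
```

If the `long` and `wide` objects from the example above both contain a `zygosity` column, `lost_levels(long, wide)` should come back empty when nothing has been dropped.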