jean997 / cause

R package for CAUSE
https://jean997.github.io/cause/

'gwas_merge' may select the wrong columns #10

Closed zhanghaoyang0 closed 3 years ago

zhanghaoyang0 commented 3 years ago

Hi Jean,

Thanks for your great MR tools.

I found that 'gwas_merge' may select the wrong columns when my data have column names that are the same as the function's parameter names.

If my data have both 'SE' and 'se' columns and I tell gwas_merge to use 'SE' instead of 'se', it still merges using 'se'. Here is an example:

    library(cause)

    df1 = data.frame('SNP'='rs143225517', 'A1' = 'C', 'A2' = 'T', 'FRQ' = 0.14,
                     'b' = 0.1, 'se' = 0.2, 'BETA' = 0.3, 'SE' = 0.4)
    df2 = data.frame('SNP'='rs143225517', 'A1' = 'C', 'A2' = 'T', 'FRQ' = 0.14,
                     'b' = 0.1, 'se' = 0.2, 'BETA' = 0.3, 'SE' = 0.4)
    df <- gwas_merge(df1, df2,
                     snp_name_cols = c("SNP", "SNP"),
                     beta_hat_cols = c("BETA", "BETA"),
                     se_cols = c("SE", "SE"),
                     A1_cols = c("A1", "A1"),
                     A2_cols = c("A2", "A2"))
    df1
    df2
    df

The result looks like this:

    df1
              SNP A1 A2  FRQ   b  se BETA  SE
    1 rs143225517  C  T 0.14 0.1 0.2  0.3 0.4
    df2
              SNP A1 A2  FRQ   b  se BETA  SE
    1 rs143225517  C  T 0.14 0.1 0.2  0.3 0.4
    df
              snp beta_hat_1 seb1 beta_hat_2 seb2 A1 A2
    1 rs143225517       -0.3  0.2       -0.3  0.2  A  G

This is because, at line 49 of 'gwas_merge', the 'select' function does not know that 'se' is an external variable, so it picks the data column named 'se' instead. The mtCOJO output format contains both 'se' and 'SE' columns, so anyone who wants to use that format as input may run into this problem.
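The name clash can be reproduced outside of CAUSE. Below is a minimal sketch, assuming the affected code passes a bare argument name to dplyr's 'select'. The helpers 'pick_ambiguous' and 'pick_explicit' are hypothetical and written only for illustration; they are not functions from the package, and the actual fix may look different.

```r
library(dplyr)

# Toy data with both lower- and upper-case columns, as in the example above.
df <- data.frame(SNP = "rs143225517", b = 0.1, se = 0.2, BETA = 0.3, SE = 0.4)

# Hypothetical helper (not CAUSE code): `se` is an argument holding a column name.
pick_ambiguous <- function(dat, se) {
  # A bare symbol is looked up among the data columns first,
  # so the column literally named `se` wins over the argument.
  select(dat, se)
}

# Hypothetical helper showing one possible way to avoid the clash.
pick_explicit <- function(dat, se) {
  # all_of() tells select() to treat `se` as an external character vector.
  select(dat, all_of(se))
}

names(pick_ambiguous(df, se = "SE"))  # "se"
names(pick_explicit(df, se = "SE"))   # "SE"
```

With 'all_of()', the requested 'SE' column is returned even though a column named 'se' also exists in the data.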

Best Regards, Haoyang Zhang

jean997 commented 3 years ago

Thanks, this is really clear. I think I've fixed it. The issue was actually in gwas_format. Try the latest version, hopefully that will solve the problem. Jean

zhanghaoyang0 commented 3 years ago

Thank you very much!