zhanxw / rvtests

Rare variant test software for next generation sequencing data

BGEN file does not work #55

Open jielab opened 6 years ago

jielab commented 6 years ago

Hi, I converted the UKB chr22 BGEN file to VCF format using PLINK2. I first used PLINK to run an association analysis on both input files to confirm that they gave exactly the same results.

Then I used RVTESTS to run association analyses on the BGEN file and the VCF file separately, with the options "--dosage DS --impute drop --single score". However, as the plots below show, the EAF, BETA, and P values from the two analyses are completely different. I think I reported this before; I am now using the latest version of RVTESTS.

So, can you please take a look?

best regards, Jie

[Image: height rvtests-pgen-rvtests-vcf]

dajiangliu commented 6 years ago

We will surely take a look. Thanks for the comparisons.

All the best, Dajiang

Assistant Professor Dept. of Public Health Sciences Institute of Personalized Medicine Penn State College of Medicine, HCAR 2020, Mail Stop R125 Email: dajiang.liu@psu.edu URL: https://dajiangliu.wordpress.com Tel: +1-717-531-4178


From: jiehuang001 notifications@github.com Sent: Thursday, February 15, 2018 10:14 AM To: zhanxw/rvtests Cc: Subscribed Subject: [zhanxw/rvtests] BGEN file does not work (#55)

[height rvtests-pgen-rvtests-vcf] https://user-images.githubusercontent.com/26947455/36263783-f5bc014a-1238-11e8-9196-a34b26eb3b4c.png


xchenscd commented 6 years ago

Hi, the BGEN files in the latest UKBB v3 release contain no “DS” field, only “GP” data. Could you please check the code that calculates the dosage from the GP values in the BGEN file? Also, could you add an option for VCF input that calculates the dosage from GP, in addition to the current “DS”? Thank you very much.
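For reference, the GP-to-dosage conversion being asked about can be sketched in a few lines of Python. This is only an illustration, not the RVTESTS implementation: it assumes a biallelic, diploid site with GP ordered as P(0/0), P(0/1), P(1/1), and the function name `gp_to_dosage` is made up for the example.

```python
def gp_to_dosage(gp):
    """Expected count of the second (ALT) allele from genotype probabilities.

    Assumes a biallelic diploid site with gp = (P(0/0), P(0/1), P(1/1)).
    Hypothetical helper for illustration; not part of RVTESTS.
    """
    p_hom_ref, p_het, p_hom_alt = gp
    total = p_hom_ref + p_het + p_hom_alt
    if total <= 0:
        raise ValueError("genotype probabilities sum to zero")
    # Renormalise in case the stored probabilities were rounded.
    return (p_het + 2.0 * p_hom_alt) / total


example_gp = (0.1, 0.3, 0.6)
example_dosage = gp_to_dosage(example_gp)
print(example_dosage)
```

Whether the dosage counts the first or the second allele depends on the file and the tool, so a flipped EAF or BETA sign is worth checking when comparing outputs.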

zhanxw commented 5 years ago

@xchenscd If my memory is correct, BGEN has neither DS nor GP. Internally, dosage-like genotypes are used for the association tests.

Do you mean calculating the dosage from the GP field in the VCF file?

zhanxw commented 5 years ago

@dajiangliu I think we have an answer for @jiehuang001. Do you remember the answer/solution?

dajiangliu commented 5 years ago

I think the difference from PLINK may arise because PLINK uses hard genotype calls as input. If you give it a BGEN file, it first converts it internally to hard genotype calls, which leads to sizable differences from an analysis that uses dosages. We tried quite a few examples in which we calculated the association statistic manually in R and compared it with RVTESTS, and the results looked concordant. So, at least from what we saw, I suspect the difference comes from using hard genotype calls versus dosages. If you have an example that shows a difference for some other reason, please let us know and we will debug it. Thank you!
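A toy Python sketch of how the two genotype codings can diverge is given here. Everything in it is an assumption made for illustration: the genotype probabilities and phenotypes are invented, the hard call is taken as the most probable genotype (not necessarily the rule PLINK applies), and a plain least-squares slope stands in for the score test.

```python
def dosage(gp):
    # Expected ALT-allele count; gp = (P(0/0), P(0/1), P(1/1)).
    return gp[1] + 2.0 * gp[2]


def hard_call(gp):
    # Most probable genotype (assumed rule, for illustration only).
    return max(range(3), key=lambda g: gp[g])


def allele_freq(genotypes):
    # ALT-allele frequency from per-sample allele counts (diploid).
    return sum(genotypes) / (2.0 * len(genotypes))


def ols_slope(x, y):
    # Slope of the simple linear regression of y on x.
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Invented data: six samples with uncertain imputed genotypes.
gps = [(0.9, 0.1, 0.0), (0.6, 0.4, 0.0), (0.2, 0.7, 0.1),
       (0.1, 0.5, 0.4), (0.0, 0.3, 0.7), (0.55, 0.45, 0.0)]
pheno = [1.0, 1.4, 2.1, 2.6, 3.2, 1.2]

dosages = [dosage(gp) for gp in gps]
calls = [hard_call(gp) for gp in gps]

af_dosage, af_hard = allele_freq(dosages), allele_freq(calls)
beta_dosage, beta_hard = ols_slope(dosages, pheno), ols_slope(calls, pheno)

print("allele frequency: dosage", af_dosage, "hard call", af_hard)
print("slope:            dosage", beta_dosage, "hard call", beta_hard)
```

On these invented data both the allele frequency and the slope change with the coding, which is the kind of discrepancy described above. The gap shrinks as the genotype probabilities become more certain.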