privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/

Handling of Dosage data in PRS calculation #30

Closed. choishingwan closed this issue 5 years ago.

choishingwan commented 5 years ago

Hi there, I'd like to ask whether snp_PRS() is compatible with BGEN file input. And if so, is the dosage data converted to hard calls?

Thanks

privefl commented 5 years ago

To use snp_PRS(), you first need to convert the data to my bigSNP format. If you have BGEN input (in the same format as the UKBB), you can use bigsnpr::snp_readBGEN() to do so. It converts probabilities to dosages (rounded to 2 decimal places).

Example code: https://github.com/privefl/UKBiobank/blob/master/UKB1-height.R

choishingwan commented 5 years ago

So the PRS will be calculated using the expected genotype value (the sum over genotypes i of Probability_i * Genotype_i), correct?

privefl commented 5 years ago

Yes
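For readers who want the arithmetic spelled out, here is a minimal Python sketch of the expected-value dosage discussed above. It is not bigsnpr code: the function names `expected_dosage` and `polygenic_score`, the example probabilities, and the effect sizes are all hypothetical, and which allele the genotype values 0/1/2 count is an assumption.

```python
def expected_dosage(probs, decimals=2):
    """Expected genotype value from probabilities (P(g=0), P(g=1), P(g=2)).

    Rounds to `decimals` places, mirroring the 2-decimal rounding
    mentioned in the thread (assumed to be plain rounding here).
    """
    p0, p1, p2 = probs
    return round(0 * p0 + 1 * p1 + 2 * p2, decimals)


def polygenic_score(dosages, betas):
    """Score for one individual: sum of dosage * effect size over variants."""
    return sum(d * b for d, b in zip(dosages, betas))


# Hypothetical genotype probabilities for one individual at three variants.
probs_per_variant = [(0.9, 0.1, 0.0), (0.0, 0.25, 0.75), (0.02, 0.96, 0.02)]
betas = [0.5, -0.2, 0.1]  # hypothetical effect sizes

dosages = [expected_dosage(p) for p in probs_per_variant]
score = polygenic_score(dosages, betas)
print(dosages, score)
```

The point is only that no hard call is made: a variant with probabilities (0.0, 0.25, 0.75) contributes a dosage of 1.75 rather than a rounded genotype of 2.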

choishingwan commented 5 years ago

Thank you

privefl commented 5 years ago

You can now sample BGEN probabilities as hard calls if you prefer.
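As a rough illustration of what sampling a hard call means in general, the Python sketch below draws a genotype in {0, 1, 2} with the given probabilities instead of taking their expected value. How bigsnpr implements this is not stated in the thread, so `sample_hard_call`, the seed, and the example probabilities are assumptions made for illustration only.

```python
import random


def sample_hard_call(probs, rng):
    """Draw one genotype (0, 1 or 2) with weights (P(g=0), P(g=1), P(g=2))."""
    return rng.choices([0, 1, 2], weights=probs, k=1)[0]


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical probabilities: heterozygous call is very likely.
probs = (0.02, 0.96, 0.02)
calls = [sample_hard_call(probs, rng) for _ in range(1000)]

# Most draws should be 1, with occasional 0s and 2s.
print({g: calls.count(g) for g in (0, 1, 2)})
```

Unlike taking the most likely genotype, repeated sampling preserves the uncertainty in the probabilities on average, at the cost of adding noise to any single draw.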

privefl commented 2 years ago

Note that the function snp_prodBGEN() was added in v1.8.6 to compute a matrix product between BGEN files and a matrix (or a vector). This removes the need to first read the data into an intermediate FBM object with snp_readBGEN() in order to compute the product. Moreover, when using dosages, they are no longer rounded to two decimal places.
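To make the idea of a product without an intermediate matrix concrete, here is a small Python sketch that accumulates scores one variant at a time from a stream of genotype probabilities. It only illustrates the general streaming technique and says nothing about how snp_prodBGEN() is actually implemented; `streamed_product`, `variant_stream`, and the toy data are hypothetical.

```python
def streamed_product(variant_stream, betas):
    """Accumulate dosages %*% betas one variant at a time.

    variant_stream yields, for each variant, a list with one probability
    triple (P(g=0), P(g=1), P(g=2)) per individual.
    betas[j] is the list of K effect sizes for variant j.
    Returns a list of K scores per individual; dosages are not rounded.
    """
    scores = None
    for j, probs_for_variant in enumerate(variant_stream):
        dosages = [p1 + 2 * p2 for (_, p1, p2) in probs_for_variant]
        if scores is None:
            # Allocate the (individuals x K) result on the first variant.
            scores = [[0.0] * len(betas[j]) for _ in dosages]
        for i, d in enumerate(dosages):
            for k, b in enumerate(betas[j]):
                scores[i][k] += d * b
    return scores if scores is not None else []


def toy_stream():
    """Hypothetical data: 2 variants, 2 individuals."""
    yield [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # variant 0
    yield [(0.0, 0.0, 1.0), (0.5, 0.5, 0.0)]  # variant 1


betas = [[1.0], [0.5]]  # one column of hypothetical effect sizes
result = streamed_product(toy_stream(), betas)
print(result)
```

Only one variant's dosages are held in memory at a time, which is the reason a product of this kind does not need the full dosage matrix to be stored first.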