Running genome-wide

nlapier2 commented 3 months ago

Hello,

What is the recommended procedure for running VIPRS genome-wide? The Getting Started example runs it on a single locus. More generally, it seems that we provide a single LD matrix and a single set of summary statistics, but usually we'll have a separate LD matrix for each chromosome. Is it recommended that we run VIPRS on each chromosome separately and then combine the results afterwards?

Thanks!

shz9 commented 3 months ago

In principle, you can run VIPRS genome-wide by passing it the data for all chromosomes simultaneously (in this case, the GWADataLoader object has to contain LD + sumstats data for all chromosomes). However, in standard scenarios, there is negligible correlation (LD) between variants on different chromosomes. Because of this, VIPRS was designed to perform inference over each chromosome separately.
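To illustrate why zero inter-chromosomal LD makes separate inference reasonable, here is a minimal NumPy sketch. It does not use VIPRS: the "model" is a toy ridge-style solve of (R + lam * I) beta = z with a fixed penalty, and the LD matrices and z-scores are random stand-ins. Under those assumptions, solving against the block-diagonal genome-wide LD matrix gives the same effect sizes as solving each chromosome's block on its own. With a model that also estimates shared hyperparameters, the two routes would not necessarily agree exactly.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_ld(m):
    """Random m x m correlation matrix standing in for one chromosome's LD."""
    x = rng.standard_normal((200, m))
    return np.corrcoef(x, rowvar=False)


def ridge_effects(ld, z, lam=0.5):
    """Toy stand-in for inference (not VIPRS): solve (R + lam * I) beta = z."""
    return np.linalg.solve(ld + lam * np.eye(ld.shape[0]), z)


# Two toy "chromosomes" with 4 and 3 variants.
ld_blocks = [random_ld(4), random_ld(3)]
z_blocks = [rng.standard_normal(4), rng.standard_normal(3)]

# Genome-wide LD with no inter-chromosomal correlation is block-diagonal.
m = sum(b.shape[0] for b in ld_blocks)
ld_full = np.zeros((m, m))
start = 0
for b in ld_blocks:
    end = start + b.shape[0]
    ld_full[start:end, start:end] = b
    start = end

# Joint solve over all variants vs. one solve per chromosome.
beta_joint = ridge_effects(ld_full, np.concatenate(z_blocks))
beta_split = np.concatenate(
    [ridge_effects(r, z) for r, z in zip(ld_blocks, z_blocks)]
)

max_abs_diff = float(np.max(np.abs(beta_joint - beta_split)))
print(max_abs_diff)  # differs only by floating-point error
```

The per-chromosome route also only ever needs one chromosome's LD matrix at a time, which is the practical reason to prefer it.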

I don't think we would gain much by performing inference genome-wide (at least not without modeling inter-chromosomal LD). So, my recommendation is to perform inference over each chromosome separately and then combine the effect sizes afterwards. This is the standard approach that we implemented in the CLI script viprs_fit:

https://github.com/shz9/viprs/blob/master/bin/viprs_fit
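If you are scripting this by hand rather than using the CLI, the "fit per chromosome, then combine" pattern can be sketched roughly as below. This is only an outline under assumptions: `fit_one` is a hypothetical callable that you would write yourself to load one chromosome's LD + sumstats and fit the model, `dummy_fit` just fabricates placeholder rows so the sketch runs, and the CHR/SNP/BETA row layout is illustrative. None of these names come from the viprs package, and this is not a description of how viprs_fit is implemented.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]  # illustrative row layout: CHR, SNP, BETA


def fit_per_chromosome(
    chromosomes: Iterable[int],
    fit_one: Callable[[int], List[Row]],
) -> List[Row]:
    """Run `fit_one` on each chromosome and concatenate the effect-size rows."""
    combined: List[Row] = []
    for chrom in chromosomes:
        # Each call only needs the LD matrix and sumstats for this chromosome.
        for row in fit_one(chrom):
            combined.append({"CHR": chrom, **row})
    return combined


def dummy_fit(chrom: int) -> List[Row]:
    """Hypothetical placeholder: returns made-up effect sizes for two variants."""
    return [
        {"SNP": f"rs{chrom}_{i}", "BETA": 0.01 * chrom * (i + 1)}
        for i in range(2)
    ]


# Autosomes 1..22, combined into one genome-wide effect-size table.
table = fit_per_chromosome(range(1, 23), dummy_fit)
print(len(table), table[0], table[-1])
```

Because the per-chromosome fits are independent, the loop can also be parallelized across chromosomes without changing the combined table.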

nlapier2 commented 3 months ago

That makes sense -- thank you.

shz9 / viprs

Running genome-wide #7