rgcgithub / regenie

regenie is a C++ program for whole genome regression modelling of large genome-wide association studies.
https://rgcgithub.github.io/regenie

Is it possible to compute LD between two masks? #496

Closed: scienception closed this issue 6 months ago

scienception commented 8 months ago

For example, between geneA.M1.0.01 and geneA.M2.0.01, where the former contains synonymous variants and the latter contains missense variants?

Which flags should be used? I am guessing --compute-corr --ld-extract info.txt

But what should info.txt look like? Something like this?

mask geneA.M1.0.01 geneA
mask geneA.M2.0.01 geneA

joellembatchou commented 6 months ago

Hi,

Yes, it's possible, and your setup looks right (check the documentation). You'd use the same input flags as you would when computing burden masks (i.e. --set-list, --anno-file, --mask-def, --aaf-bins).

Cheers, Joelle

scienception commented 6 months ago

Thank you! Could you please point me to where this is done in the code? I was looking at this script, but it seems to only parse the output rather than compute the Pearson correlation: https://github.com/rgcgithub/regenie/blob/master/scripts/parseLD.r

joellembatchou commented 6 months ago

Check the documentation: https://rgcgithub.github.io/regenie/options/#ld-computation. You would run this using the REGENIE software itself; the parseLD.r script only reads the binary output file that contains the correlation information.

scienception commented 6 months ago

All right, I just wanted to know in which script the computation was done since the data isn't phased. Could you please send me the link to the script? I would appreciate it :) it is just for learning purposes. Thanks!

joellembatchou commented 5 months ago

https://github.com/rgcgithub/regenie/blob/master/src/Data.cpp#L4173
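
On the question of unphased data, a minimal sketch may help. It is not taken from Data.cpp and does not reproduce REGENIE's implementation. It assumes that LD here means the Pearson correlation between per-sample values, as the thread suggests, in which case only unphased allele counts are needed. The helper names `burden_mask` and `pearson_r`, the "max" rule for collapsing variants into a mask, and the toy genotypes are all hypothetical and chosen purely for illustration.

```python
from math import sqrt


def burden_mask(genotypes):
    """Collapse per-variant allele counts into one value per sample.

    `genotypes` has one row per variant and one column per sample.
    Assumption for illustration: the mask value is the maximum
    allele count across the variants in the mask.
    """
    return [max(per_sample) for per_sample in zip(*genotypes)]


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


# Toy unphased allele counts (0, 1 or 2) for six samples.
synonymous = [
    [0, 1, 0, 0, 2, 0],
    [0, 0, 0, 1, 0, 0],
]
missense = [
    [0, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0],
]

m1 = burden_mask(synonymous)  # stands in for geneA.M1.0.01
m2 = burden_mask(missense)    # stands in for geneA.M2.0.01

r = pearson_r(m1, m2)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

Because each mask is just a per-sample numeric vector, the correlation is computed the same way for a pair of masks as for a pair of single variants, and haplotype phase never enters the calculation.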