shz9 / magenpy

Modeling and Analysis of (Statistical) Genetics data in python
https://shz9.github.io/magenpy/
MIT License

Replace LD matrix matching logic #7

Closed: shz9 closed this issue 2 years ago

shz9 commented 2 years ago

Currently, when we match LD matrices with GWAS summary statistics, a new matrix is created in temp_dir containing only the SNPs shared between the two data sources. For large LD matrices this is slow and expensive, and it can needlessly duplicate hundreds of GB of data.

As an alternative, we can generate a mask that ensures the posterior estimates for SNPs that exist only in the LD matrix are set to zero by default. This mask would be generated during the matching step and passed to downstream tasks, such as PRS model fitting.

Details TBD.
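A minimal sketch of the masking idea, assuming SNPs are identified by rsID strings and posterior estimates are stored in the same order as the LD matrix's SNP list. The helpers `build_snp_mask`, `active_indices`, and `apply_mask` are hypothetical illustrations, not part of magenpy's `LDMatrix` API. The point is that the LD matrix itself is never copied: the mask is computed once at matching time, and downstream code either skips masked SNPs or zeroes their estimates.

```python
from typing import List, Sequence


def build_snp_mask(ld_snps: Sequence[str], gwas_snps: Sequence[str]) -> List[bool]:
    """Hypothetical helper: one boolean per LD-matrix SNP, True if the SNP is also in the GWAS data."""
    gwas_set = set(gwas_snps)  # set lookup keeps matching linear in the number of SNPs
    return [snp in gwas_set for snp in ld_snps]


def active_indices(mask: Sequence[bool]) -> List[int]:
    """Hypothetical helper: positions in the original LD matrix that a model fit should visit."""
    return [i for i, keep in enumerate(mask) if keep]


def apply_mask(posterior_means: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Hypothetical helper: zero out posterior estimates for SNPs that exist only in the LD matrix."""
    if len(posterior_means) != len(mask):
        raise ValueError("posterior_means and mask must have the same length")
    return [beta if keep else 0.0 for beta, keep in zip(posterior_means, mask)]


ld_snps = ["rs1", "rs2", "rs3", "rs4"]
gwas_snps = ["rs1", "rs3", "rs5"]  # rs5 is absent from the LD matrix, so it is simply ignored

mask = build_snp_mask(ld_snps, gwas_snps)
print(mask)  # [True, False, True, False]
print(active_indices(mask))  # [0, 2]
print(apply_mask([0.12, -0.05, 0.30, 0.01], mask))  # [0.12, 0.0, 0.3, 0.0]
```

In a real implementation the mask would more likely be a NumPy boolean array stored alongside the on-disk LD matrix, but the design choice is the same: pay for a cheap vector of flags instead of rewriting the matrix.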

shz9 commented 2 years ago

Implemented in the latest iterations of LDMatrix.