shz9 / magenpy

Modeling and Analysis of (Statistical) Genetics data in python
https://shz9.github.io/magenpy/
MIT License

issues about LD matrices #16

Closed biostatShao closed 2 days ago

biostatShao commented 1 month ago
Hi Zhonghe,

Thanks! We're working on adding this feature as part of a new release of the viprs software. We will also be releasing LD matrices for 6 continental populations represented in the UK Biobank in the next couple of weeks.

Do you mind opening a separate issue (here or under viprs) about the problems you're having with the shrinkage estimator? Which versions of magenpy/viprs are you using? Which data did you compute the LD matrix from? How did you go about running viprs? All of these details can help us improve the software.

Thanks,

Shadi

Originally posted by @shz9 in https://github.com/shz9/magenpy/issues/10#issuecomment-2161174513

biostatShao commented 1 month ago

When using the command-line script magenpy_ld, I encounter different issues depending on the backend configuration.

If I use the default backend xarray, I get the error "BLAS : Program is Terminated. Because you tried to allocate too many memory regions."

If I switch the backend to plink1.9, I encounter the following traceback:

/home/zhshao/anaconda3/envs/py3/lib/python3.7/site-packages/scipy/sparse/_index.py:125: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
  self._set_arrayXarray(i, j, x)
Traceback (most recent call last):
  File "/home/zhshao/anaconda3/envs/py3/bin/magenpy_ld", line 173, in
    g.compute_ld(args.estimator, args.output_dir, **ld_kwargs)
  File "/home/zhshao/anaconda3/envs/py3/lib/python3.7/site-packages/magenpy/GenotypeMatrix.py", line 260, in compute_ld
    return ld_est.compute(output_dir, temp_dir=tmp_ld_dir.name)
  File "/home/zhshao/anaconda3/envs/py3/lib/python3.7/site-packages/magenpy/stats/ld/estimator.py", line 171, in compute
    temp_dir)
  File "/home/zhshao/anaconda3/envs/py3/lib/python3.7/site-packages/magenpy/stats/ld/estimator.py", line 90, in compute
    if _validate_ld_matrix(ld_mat):
  File "/home/zhshao/anaconda3/envs/py3/lib/python3.7/site-packages/magenpy/stats/ld/utils.py", line 45, in _validate_ld_matrix
    raise ValueError(f"Invalid LD Matrix: Element {i} does not have matching LD boundaries!")
ValueError: Invalid LD Matrix: Element 0 does not have matching LD boundaries!

I am encountering these issues with both magenpy v0.0.12 and v0.0.11. How can I resolve this?

shz9 commented 1 month ago

Hi Zhonghe,

The xarray backend is not memory-efficient, so I don't recommend using it to compute very large LD matrices.
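
For readers who still need the xarray backend on smaller datasets: the "too many memory regions" message is emitted by OpenBLAS, and a workaround often reported for it is to cap the number of BLAS threads before the numerical libraries are loaded. This was not suggested in this thread and may not help with the backend's memory use, so treat the sketch below as an assumption. The helper name limit_blas_threads is hypothetical and not part of magenpy.

```python
import os

# Assumption: the error comes from OpenBLAS creating many threads/buffers.
# These variables are read when the BLAS library is first loaded, so they
# must be set before numpy, scipy or magenpy are imported.
THREAD_ENV_VARS = ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS", "MKL_NUM_THREADS")


def limit_blas_threads(n_threads: int = 1) -> dict:
    """Cap BLAS thread pools via environment variables; return what was set."""
    if n_threads < 1:
        raise ValueError("n_threads must be >= 1")
    settings = {name: str(n_threads) for name in THREAD_ENV_VARS}
    os.environ.update(settings)
    return settings


limit_blas_threads(1)
# import magenpy  # import only after the limits are in place
```

When running the magenpy_ld script directly, the same variables can instead be exported in the shell before launching it.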

Regarding the error with plink1.9, please upgrade to magenpy>0.1. This issue has been resolved in the latest versions of the package.
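
To check whether an environment falls under this advice, the installed version string (for example from `pip show magenpy`) can be compared against the 0.1 threshold mentioned above. The following is a minimal sketch that only handles plain numeric versions such as 0.0.12; the helper names parse_version and needs_upgrade are illustrative and not part of magenpy.

```python
def parse_version(text: str) -> tuple:
    """Turn '0.0.12' or 'v0.0.12' into (0, 0, 12); numeric releases only."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def needs_upgrade(installed: str, threshold: str = "0.1") -> bool:
    """Return True unless `installed` is strictly newer than `threshold`."""
    a, b = parse_version(installed), parse_version(threshold)
    # Pad the shorter tuple with zeros so (0, 1) and (0, 1, 0) compare equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return not a > b


for version in ("v0.0.11", "v0.0.12", "0.1.1"):
    print(version, "needs upgrade:", needs_upgrade(version))
```

If the check reports that an upgrade is needed, `pip install --upgrade magenpy` followed by rerunning magenpy_ld with the plink1.9 backend is the path the reply points to.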