shz9 / viprs

Variational Inference of Polygenic Risk Scores
https://shz9.github.io/viprs/
MIT License

Invalid LD Matrix: Element 0 does not have matching LD boundaries! #3

Closed by xinyu-c9 7 months ago

xinyu-c9 commented 1 year ago

Sorry to bother you! I was trying to construct a shrinkage LD matrix from my genotype file. Here is my script:

import magenpy as mgp
gdl = mgp.GWADataLoader("./LD_reference/plink/plink",
                backend='plink')
gdl.compute_ld(estimator='shrinkage',
                genetic_map_ne=11400,
                genetic_map_sample_size=183,
                output_dir='./LD_reference/EUR/VIPRS')

I encountered the following error:

> Reading BED file...
Computing LD matrices:   0%|                                      | 0/22 [00:00<?, ?it/s]/software/conda/envs/wdl/lib/python3.7/site-packages/scipy/sparse/_index.py:125: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
  self._set_arrayXarray(i, j, x)
Traceback (most recent call last):
  File "VIPRS_EURLD.py", line 7, in <module>
    output_dir='./LD_reference/EUR/VIPRS')
  File "/software/conda/envs/wdl/lib/python3.7/site-packages/magenpy/GWADataLoader.py", line 563, in compute_ld
    disable=not self.verbose or len(self.genotype) < 2)
  File "/software/conda/envs/wdl/lib/python3.7/site-packages/magenpy/GWADataLoader.py", line 560, in <dictcomp>
    for c, g in tqdm(sorted(self.genotype.items(), key=lambda x: x[0]),
  File "/software/conda/envs/wdl/lib/python3.7/site-packages/magenpy/GenotypeMatrix.py", line 260, in compute_ld
    return ld_est.compute(output_dir, temp_dir=tmp_ld_dir.name)
  File "/software/conda/envs/wdl/lib/python3.7/site-packages/magenpy/stats/ld/estimator.py", line 227, in compute
    temp_dir)
  File "/software/conda/envs/wdl/lib/python3.7/site-packages/magenpy/stats/ld/estimator.py", line 90, in compute
    if _validate_ld_matrix(ld_mat):
  File "/software/conda/envs/wdl/lib/python3.7/site-packages/magenpy/stats/ld/utils.py", line 45, in _validate_ld_matrix
    raise ValueError(f"Invalid LD Matrix: Element {i} does not have matching LD boundaries!")
ValueError: Invalid LD Matrix: Element 0 does not have matching LD boundaries!
Computing LD matrices:   0%|                                      | 0/22 [01:07<?, ?it/s]

Do you know what caused this error, and how to fix it? If you need additional information, please feel free to ask me. Thank you so much in advance!

shz9 commented 1 year ago

Hi Xinyu,

Thanks for reporting this bug. I've seen this issue reported before, but I was not able to reproduce the error on my end. Could you share more information about your system setup? In particular, it would be great to know the Python, magenpy, and plink versions. It would also help if you could show how you set the plink path (if at all) before running the script.
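One way to gather those details in a single step is sketched below. This is only an illustration: the helper name `collect_env_info` is hypothetical, and it assumes plink is resolved through the `PATH` environment variable, which may not be how magenpy locates it on your system.

```python
import shutil
import subprocess
import sys


def collect_env_info(plink_cmd="plink"):
    """Collect Python, magenpy, and plink version details into a dict.

    `plink_cmd` is the executable name or path to look up (an assumption;
    adjust it if plink is configured some other way).
    """
    info = {"python": sys.version.split()[0]}

    # importlib.metadata is in the standard library only from Python 3.8 on.
    try:
        from importlib import metadata
        try:
            info["magenpy"] = metadata.version("magenpy")
        except metadata.PackageNotFoundError:
            info["magenpy"] = "not installed"
    except ImportError:
        info["magenpy"] = "unknown (Python < 3.8)"

    # Find the executable the shell would pick up, then ask for its version.
    plink_path = shutil.which(plink_cmd)
    info["plink_path"] = plink_path or "not found on PATH"
    if plink_path:
        result = subprocess.run(
            [plink_path, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        info["plink_version"] = result.stdout.strip()
    return info


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the issue would cover the Python, magenpy, and plink versions in one go.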

Thank you,

Shadi

xinyu-c9 commented 1 year ago

Hi Shadi,

Thanks for your reply! I used Python 3.7.12, magenpy 0.0.12, and plink v1.90b6.24. I'll email you the full plink path. Do you still use shadi.zabad@mail.utoronto.ca?

Thanks again for your help.

Best, Xingyu

shz9 commented 1 year ago

Hi Xingyu,

Thank you for following up on this. I think I figured out the source of the bug.

During development, I tested the LD computation with plink v1.90b4.6 64-bit (15 Aug 2017), and I believe that if you use this version you won't see this bug. I think that in later versions of plink, such as the one you're using, the developers changed the default value of one of the LD-related flags, so the output of the software is now different, which breaks my code.

Specifically, the flag in question is --ld-window-r2, which sets the threshold that decides which LD values are included in the output file. In future versions of magenpy, I will try to set this flag explicitly to zero (i.e. --ld-window-r2 0). For now, as a quick workaround, I recommend using the plink version mentioned above (or any plink 1.9 build from before 2019, I believe).
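As a rough sketch of what setting the threshold explicitly could look like, the snippet below builds a plink 1.9 command line with --ld-window-r2 0. It is not the command magenpy actually runs: the helper name `build_plink_ld_command`, the choice of `--r2`, and the output prefix `./ld_out` are assumptions made for illustration.

```python
import shlex


def build_plink_ld_command(bfile, out_prefix, plink_cmd="plink", min_r2=0):
    """Build a plink 1.9 argument list with the r^2 reporting threshold set explicitly.

    Passing --ld-window-r2 on every call means the output does not depend on
    whatever default the installed plink build uses.
    """
    return [
        plink_cmd,
        "--bfile", bfile,                 # prefix of the .bed/.bim/.fam files
        "--r2",                           # request a pairwise LD report
        "--ld-window-r2", str(min_r2),    # 0 keeps every pair in the window
        "--out", out_prefix,
    ]


# Hypothetical output prefix; the input prefix is the one from the script above.
cmd = build_plink_ld_command("./LD_reference/plink/plink", "./ld_out")
print(" ".join(shlex.quote(part) for part in cmd))
```

The list can be passed to `subprocess.run` as-is once the paths are adjusted.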

Hope this solves the problem. If it doesn't, please let me know.

xinyu-c9 commented 1 year ago

Hi Shadi,

I'm sorry for not getting back to you sooner. I've tried plink v1.90b6.7 64-bit (2 Dec 2018), but the error persists. This is the earliest build of plink v1.90 I can find. Could you please send me your copy of plink so that I can retry? This is my email address: chenxy@big.ac.cn

Thank you so much for your help!

shz9 commented 7 months ago

Hi Xinyu, I pushed a large update to both magenpy and viprs that should fix the bugs you reported. If the issue persists, feel free to reopen it.