shz9 / magenpy

Modeling and Analysis of (Statistical) Genetics data in python
https://shz9.github.io/magenpy/
MIT License

Error computing windowed LD on a plink bfile #15

Open nlapier2 opened 1 month ago

nlapier2 commented 1 month ago

Hello,

I have a plink bfile from the UK Biobank:

$ ls bfile_chr22.*
bfile_chr22.bed  bfile_chr22.bim  bfile_chr22.fam  bfile_chr22.log

...and I am trying to create an LD matrix for it:

import magenpy as mgp
gdl = mgp.GWADataLoader('bfile_chr22', backend='plink')
gdl.compute_ld('windowed', output_dir='ld', kb_window_size=1000)

...but this causes an error:

>>> gdl.compute_ld('windowed', output_dir='ld', kb_window_size=1000)
> Computing LD matrix...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/gpfs/data/xhe-lab/nlapier2/mvmr/karjalainen_metabolite_gwas/viprs_env/lib/python3.12/site-packages/magenpy/GWADataLoader.py", line 661, in compute_ld
    c: g.compute_ld(estimator,
       ^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/data/xhe-lab/nlapier2/mvmr/karjalainen_metabolite_gwas/viprs_env/lib/python3.12/site-packages/magenpy/GenotypeMatrix.py", line 382, in compute_ld
    return ld_est.compute(output_dir,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
(shortening here)
  File "/gpfs/data/xhe-lab/nlapier2/mvmr/karjalainen_metabolite_gwas/viprs_env/lib/python3.12/site-packages/pandas/io/common.py", line 765, in get_handle
    handle = gzip.GzipFile(  # type: ignore[assignment]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nlapier2/project-xhe/miniconda3/lib/python3.12/gzip.py", line 192, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/gpfs/data/xhe-lab/nlapier2/mvmr/karjalainen_metabolite_gwas/temp/temp/ld_wlunenka/chr_22.ld.gz'

Appreciate any help!

shz9 commented 1 month ago

Are you running this on a shared compute cluster? I suspect that the plink process got killed due to a lack of resources (e.g. running out of memory or storage space). How many samples and variants are there in this BED file?
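One quick way to answer the sample/variant question is to count lines in the companion files: in the plink binary fileset, the .fam file has one line per sample and the .bim file has one line per variant. The sketch below does that with the standard library only and also reports free disk space. The helper names (`count_lines`, `summarize_bfile`) and the `scratch_dir` parameter are hypothetical, introduced here for illustration; they are not part of magenpy or plink, and `scratch_dir` should point at whichever directory the temporary LD files are written to.

```python
import shutil


def count_lines(path):
    """Count lines in a file by streaming it, so large files stay out of memory."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def summarize_bfile(prefix, scratch_dir="."):
    """Summarize a plink bfile (hypothetical helper, not a magenpy function).

    prefix:      bfile path without extension, e.g. 'bfile_chr22'
    scratch_dir: assumed location of temporary output; adjust as needed
    """
    prefix = str(prefix)
    n_samples = count_lines(prefix + ".fam")   # .fam: one line per sample
    n_variants = count_lines(prefix + ".bim")  # .bim: one line per variant
    # Free space on the filesystem holding scratch_dir, in gigabytes.
    free_gb = shutil.disk_usage(scratch_dir).free / 1e9
    return {"samples": n_samples, "variants": n_variants, "free_gb": free_gb}
```

Calling `summarize_bfile("bfile_chr22")` from the directory that holds the bfile would give the two counts asked about above. This only covers dataset size and disk space; it does not show whether a job scheduler killed the process for exceeding its memory limit.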

In the latest version of magenpy, I added ways to detect errors that arise on plink's side, but the detection isn't always perfect.
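To see whether the environment already has a recent release, the installed version can be read from package metadata. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, and the thread does not say which release counts as "latest", so the output still has to be compared against the project's release list by hand.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints None if magenpy is not installed in the active environment.
print("magenpy version:", installed_version("magenpy"))
```

Running this inside the same virtual environment that produced the traceback matters, since a different environment may hold a different version.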