rgcgithub / regenie

regenie is a C++ program for whole genome regression modelling of large genome-wide association studies.
https://rgcgithub.github.io/regenie

bad_alloc - not enough memory issue - REGENIE #433

Closed Roshinisundararajan closed 1 year ago

Roshinisundararajan commented 1 year ago

Hi, this is Roshini. I am currently working on a population genetics project, and as part of it we are trying to use the REGENIE pipeline. I initially installed regenie in a conda environment and was able to run the pipeline successfully on the VCF file of a single chromosome (chromosome 22) for a quantitative trait. For the genetic data covering all chromosomes, however, I get this error, after which the job terminates:

    ERROR: bad_alloc caught, not enough memory (std::bad_alloc)

I also tried installing regenie directly on my Linux system instead of using the conda environment. I see the issue with both versions 3.2.7 and 3.2.9 of regenie.

No. of samples = 503
No. of SNPs = 2026855
No. of covariates = 20

    regenie \
      --step 1 \
      --force-step1 \
      --bed (input allchr bed file) \
      --exclude (list of variants to be excluded) \
      --covarFile (text file having 20 covariates) \
      --phenoFile (quantitative trait) \
      --bsize 100 \
      --qt \
      --out fit_qt_out

OS: Linux CentOS 7.6.1810, kernel 3.10.0-957, gcc 7.3.0, glibc 2.23

joellembatchou commented 1 year ago

Hi,

You are analyzing about 2M variants with a block size of 100, which means you'd have about 100K level 0 ridge predictors; that is what is causing the memory issue. Check the FAQ for step 1 here: https://rgcgithub.github.io/regenie/faq/#step-1

You can also look at the recommendations for UKB analysis for some general guidelines on how to reduce the number of variants for step 1: https://rgcgithub.github.io/regenie/recommendations/

Cheers, Joelle
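
For readers who want to see where the "about 100K" figure in the reply comes from, here is a rough back-of-the-envelope sketch in Python. It assumes that each block of variants contributes 5 ridge predictors at level 0 (the default of the `--l0` option, as far as I know) and it ignores chromosome boundaries when counting blocks. The helper names `level0_predictors` and `dense_gib` are made up for illustration, and the square-matrix figure is only an assumption about what a dense matrix over all predictors would cost; the thread does not confirm what regenie actually allocates.

```python
import math


def level0_predictors(n_variants, block_size, ridge_params_per_block=5):
    """Approximate number of level 0 ridge predictors.

    Assumes every block yields `ridge_params_per_block` predictors and
    ignores chromosome boundaries, so the real count may be slightly higher.
    """
    n_blocks = math.ceil(n_variants / block_size)
    return n_blocks * ridge_params_per_block


def dense_gib(rows, cols, bytes_per_value=8):
    """Size in GiB of a dense rows x cols matrix of 8-byte doubles."""
    return rows * cols * bytes_per_value / 2**30


n_samples = 503          # from the original post
n_variants = 2_026_855   # from the original post

# Compare the block size used in the post (100) with a larger one (1000).
for block_size in (100, 1000):
    p = level0_predictors(n_variants, block_size)
    print(f"bsize={block_size}: about {p} level 0 predictors")
    print(f"  samples x predictors matrix: {dense_gib(n_samples, p):.2f} GiB")
    # Hypothetical: cost of a dense predictors x predictors matrix.
    print(f"  predictors x predictors matrix: {dense_gib(p, p):.2f} GiB")
```

Under these assumptions the predictor count scales inversely with the block size, so either a larger `--bsize` or a smaller variant set for step 1 shrinks it, which is the direction the linked FAQ and recommendations point in.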