weizhouUMICH / SAIGE

GNU Lesser General Public License v3.0

Memory Issue for SAIGE-GENE step 0 (create sparse GRM) using UKBB WES data #375

Closed ypnngaa-py closed 2 years ago

ypnngaa-py commented 2 years ago

Hi, I am creating a sparse GRM for UKB 200K WES data. Because this step needs rare variants for random sampling, I included all 12M exome variants. The plink bed file is about 600GB. How much memory do I need to run this step?
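As a rough sanity check on the file size quoted above, here is a minimal sketch of the on-disk size of a variant-major PLINK 1 `.bed` file, which stores 2 bits per genotype, pads each variant to a whole byte, and has a 3-byte header. The sample and variant counts are the round figures from this post (200K samples, 12M variants), not exact values, and the function name is made up for illustration. This only estimates the file size; it does not say how much RAM SAIGE itself needs.

```python
def plink_bed_bytes(n_samples: int, n_variants: int) -> int:
    """Approximate size in bytes of a variant-major PLINK 1 .bed file."""
    # 2 bits per genotype -> 4 samples per byte, rounded up per variant.
    bytes_per_variant = (n_samples + 3) // 4
    # 3-byte header: two magic-number bytes plus one mode byte.
    return 3 + bytes_per_variant * n_variants


# Round figures taken from the post (assumed, not exact counts).
full = plink_bed_bytes(200_000, 12_000_000)
print(f"all variants: {full / 1e9:.1f} GB")

# The size scales linearly with the number of variants kept,
# e.g. a hypothetical 10x reduction in variants:
reduced = plink_bed_bytes(200_000, 1_200_000)
print(f"1.2M variants: {reduced / 1e9:.1f} GB")
```

With these inputs the estimate comes out at about 600 GB, which is consistent with the reported file size, so cutting variants or samples shrinks the input proportionally.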

The issue is that I submitted a job with four nodes and 200G of memory, and it has been stuck at the 'setgeno mark1' stage for 8 hours. There is no error message, but it is not making any progress.

Any suggestions on how much memory to request, or should I reduce the plink file size or the number of variants?

Thanks!

aoxiang88 commented 2 years ago

Hi,

How about using only the variants from the exon target regions? Further, if you are interested in Caucasians only, you may want to include only Caucasians in your plink file.

I used Caucasians only and limited the variants to the target regions, and I didn't run into such a memory problem. Our server has 255G of memory, btw.
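The subsetting suggested above could be done with PLINK before running the sparse GRM step. The sketch below only builds and prints a PLINK 1.9 command line; it does not run it. The `--bfile`, `--extract range`, `--keep`, `--make-bed`, and `--out` flags are standard PLINK options, but every file name here is a hypothetical placeholder, and the helper function is made up for illustration.

```python
import shlex


def plink_subset_cmd(bfile, out, regions_file=None, keep_file=None):
    """Build a PLINK 1.9 command that subsets variants and/or samples."""
    cmd = ["plink", "--bfile", bfile]
    if regions_file:
        # Set-range file: chromosome, start bp, end bp, set ID per line.
        cmd += ["--extract", "range", regions_file]
    if keep_file:
        # Sample list: one FID and IID pair per line.
        cmd += ["--keep", keep_file]
    cmd += ["--make-bed", "--out", out]
    return cmd


# All paths below are placeholders, not files from this thread.
cmd = plink_subset_cmd(
    "ukb_wes_200k",
    "ukb_wes_200k_subset",
    regions_file="exome_target_regions.txt",
    keep_file="samples_to_keep.txt",
)
print(shlex.join(cmd))
```

The resulting, smaller bed/bim/fam set would then be passed to the sparse GRM step in place of the full 600GB file.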

weizhouUMICH commented 2 years ago

Sorry for the late reply! We have just released a new version, 1.0.0. It has computational efficiency improvements in both Step 1 and Step 2 for single-variant and set-based tests. We have created a new GitHub page for the program, https://github.com/saigegit/SAIGE, with documentation at https://saigegit.github.io/SAIGE-doc/

There are different ways to estimate the sparse GRM; see https://saigegit.github.io//SAIGE-doc/docs/createSparseGRM.html for details. The program will be maintained there by multiple SAIGE developers. The docker image has been updated. Please feel free to try version 1.0.0 and report any issues.

Thanks! Wei