jianyangqt / gcta

GCTA software
GNU General Public License v3.0

std::bad_alloc in REML analysis #36

Closed WeiCSong closed 3 months ago

WeiCSong commented 1 year ago

Hi GCTA developers, thanks for the great tool! I ran gcta --reml on UKB data with 64 cores and 512 GB of memory and got the following error:

Accepted options:
--reml
--grm all ###all.grm.bin is 454k MB size
--pheno all.phen
--qcovar all.qcovar
--thread-num 64
--out ukb_1

Note: the program will be running on 64 threads.
Reading IDs of the GRM from [all.grm.id].
488377 IDs are read from [all.grm.id].
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

I noticed that version 1.91 included an update related to this bug. Is there a solution for it now? Thank you very much for your help!

hailingfang commented 1 year ago

Which version of GCTA do you use? And where do you download it?

WeiCSong commented 1 year ago

Hi, I downloaded gcta-1.94.1-linux-kernel-3-x86_64.zip from the Yang Lab GCTA download page.

hailingfang commented 1 year ago

What is the kernel version of your operating system? And would you mind sharing some data with me so I can reproduce the issue?

WeiCSong commented 1 year ago

@benjaminfang

My Linux kernel is 4.18.0-240.el8.x86_64. The grm.bin for 350k UKB samples is 454000 MB, and I'm trying to find a smaller case that reproduces the error, which would be easier to share. So far, 30k samples gave a 1700 MB grm.bin file and did not cause this error on two cores. I'll reply to this issue when I find the smallest sample size that reproduces the error.
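
The file sizes quoted here can be sanity-checked with a little arithmetic. The following is a minimal sketch, assuming the documented .grm.bin layout (the lower triangle of the GRM including the diagonal, stored as 4-byte floats). The function names are hypothetical and not part of GCTA.

```python
import math

BYTES_PER_GRM_VALUE = 4  # assumption: .grm.bin stores single-precision floats
MIB = 1024 ** 2


def grm_bin_bytes(n_samples):
    """Size of a .grm.bin file holding n(n+1)/2 lower-triangle values."""
    return n_samples * (n_samples + 1) // 2 * BYTES_PER_GRM_VALUE


def samples_from_grm_bin_bytes(size_bytes):
    """Largest n whose lower triangle fits in size_bytes (inverse of the above)."""
    n_values = size_bytes // BYTES_PER_GRM_VALUE
    # Solve n(n+1)/2 <= n_values, i.e. n = floor((sqrt(8*n_values + 1) - 1) / 2)
    return (math.isqrt(8 * n_values + 1) - 1) // 2


if __name__ == "__main__":
    for n in (30_000, 488_377):
        print(f"{n} samples -> {grm_bin_bytes(n) / MIB:.0f} MiB")
    print(f"454000 MiB -> about {samples_from_grm_bin_bytes(454_000 * MIB)} samples")
```

Under that assumption, 30,000 samples give roughly 1,717 MiB, in line with the 1700 MB reported here, while a 454,000 MiB file corresponds to roughly 488,000 samples, which is closer to the 488,377 IDs in the log than to 350k.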

hailingfang commented 1 year ago

Thanks. Could you also try this build for me? https://yanglab.westlake.edu.cn/software/gcta/bin/gcta-1.94.1-linux-kernel-4-x86_64.zip

longmanz commented 1 year ago

Hi @WeiCSong, this is probably not a bug. With your settings (512 GB of memory), a GCTA-REML analysis is not feasible for a dataset the size of the UKBB. Based on the memory requirement estimated at https://yanglab.westlake.edu.cn/software/gcta/index.html#FAQ , approximately 7-8 TB of memory is needed.

  1. If you want to run REML for UKBB, you may try BOLT-REML from BOLT-LMM (https://alkesgroup.broadinstitute.org/BOLT-LMM/BOLT-LMM_manual.html#x1-40001.2).
  2. Alternatively, you may try GCTA HE-regression instead of REML to estimate heritability (https://yanglab.westlake.edu.cn/software/gcta/index.html#Haseman-Elstonregression).
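
The gap between 512 GB and the estimate above can be illustrated with back-of-the-envelope arithmetic. This is a sketch, not GCTA's actual allocation logic: it assumes REML keeps several dense n x n double-precision matrices in memory, and the count of 5 used below (t + 4 with a single genetic component, t = 1) is an assumption that should be checked against the FAQ. The function names are hypothetical.

```python
import math

GIB = 1024 ** 3
TIB = 1024 ** 4
BYTES_PER_DOUBLE = 8


def dense_matrix_bytes(n_samples):
    """Memory for one dense n x n matrix of double-precision values."""
    return n_samples * n_samples * BYTES_PER_DOUBLE


def reml_memory_bytes(n_samples, n_dense_matrices):
    """Rough total, assuming n_dense_matrices dense n x n matrices are held at once."""
    return n_dense_matrices * dense_matrix_bytes(n_samples)


def max_samples_for_memory(available_bytes, n_dense_matrices):
    """Largest n whose assumed REML footprint fits in available_bytes."""
    return math.isqrt(available_bytes // (n_dense_matrices * BYTES_PER_DOUBLE))


if __name__ == "__main__":
    n = 488_377               # IDs reported in the log
    available = 512 * GIB     # memory on the machine
    assumed_matrices = 5      # assumption: t + 4 with t = 1 genetic component

    print(f"one dense matrix: {dense_matrix_bytes(n) / TIB:.2f} TiB")
    print(f"assumed REML total: {reml_memory_bytes(n, assumed_matrices) / TIB:.2f} TiB")
    print(f"fits in memory: {reml_memory_bytes(n, assumed_matrices) <= available}")
    print(f"max samples in 512 GiB: {max_samples_for_memory(available, assumed_matrices)}")
```

Even a single dense 488,377 x 488,377 double-precision matrix takes about 1.7 TiB, more than three times the 512 GB available, so the std::bad_alloc is consistent with running out of memory. Under the assumed count of 5 matrices, 512 GiB would accommodate roughly 117,000 samples.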

WeiCSong commented 1 year ago

@longmanz Thanks for the information! I guess a similar memory requirement also applies to OSCA? I may need to downsample my dataset.

longmanz commented 1 year ago

Hi @WeiCSong, I'm not sure about OSCA, but I think any conventional REML-based analysis of a dataset like the UKBB will need a very large amount of memory.