privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/

Not enough memory for loading bgen file #297

Closed garyzhubc closed 2 years ago

garyzhubc commented 2 years ago

Not enough memory for loading a bgen file. My bgen file is 4G, and I have allocated 125G of memory for the processing. How much memory do I need? I thought this library was going to cache the big file?

garyzhubc commented 2 years ago

There are only 40,359,612 SNP IDs and 632 samples. I thought 125G would be enough for this.

privefl commented 2 years ago

Yes, this should be more than enough.

What is the error you get exactly?

bvilhjal commented 2 years ago

Dear Peiyuan,

Am I correctly understanding that you're using ~40 million variants? For better computational performance and more accurate PRS, I would recommend restricting to HapMap3 SNPs or another high-quality SNP set that also excludes rare SNPs. Since you don't have many samples, I would also use a stringent MAF threshold of about 1% or greater.
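A minimal sketch of that restriction (my assumptions, not code from this thread: `hm3` is a data frame of HapMap3 variants with columns `chr`, `pos`, `a1`, `a2`, and `snp_id` is the full vector of variant IDs from the bgen index; `snp_readBGEN()` expects IDs of the form `"chr_pos_a1_a2"`):

```r
library(bigsnpr)

# Hypothetical inputs: hm3 (HapMap3 variant table) and snp_id (all IDs in
# the .bgen index). Build IDs in the "chr_pos_a1_a2" form used by snp_readBGEN().
snp_id_hm3  <- with(hm3, paste(chr, pos, a1, a2, sep = "_"))
snp_id_keep <- intersect(snp_id, snp_id_hm3)

# Read only the retained variants into a file-backed bigSNP object.
rds <- snp_readBGEN("chr.bgen", "chr_hm3", list(snp_id_keep))
obj <- snp_attach(rds)

# Then apply a stringent MAF filter (>= 1%) on the imported dosages.
maf  <- snp_MAF(obj$genotypes, ncores = nb_cores())
keep <- which(maf >= 0.01)
```

The exact HapMap3 variant table to use (e.g. the one from the LDpred2 tutorial) is up to you; the point is just to intersect your bgen IDs with a high-quality set before reading.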

Best, Bjarni

garyzhubc commented 2 years ago

> There are only 40,359,612 SNP IDs and 632 samples. I thought 125G would be enough for this.

This is the error I got:

```r
> chr <- snp_readBGEN("chr.bgen", "chr", list(snp_id))
Error: out of memory
```
I'm running this on a cluster with 125G of memory and 32 CPUs.

garyzhubc commented 2 years ago

It worked after following Bjarni's suggestion, but I doubt it's really a memory issue, because I'm sure I have enough memory even when including the rare variants.

privefl commented 2 years ago

Maybe the problem comes from storing the $map information. How large is that object right now (see object.size())? Could you please extrapolate this number to the full number of variants?
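A rough extrapolation could look like this (a sketch under my assumptions: the subset that did load was written to a backing file named `chr_hm3.rds`, and the thread's total of 40,359,612 variants is the target):

```r
library(bigsnpr)

# Hypothetical backing file for the subset that loaded successfully.
obj <- snp_attach("chr_hm3.rds")

# Measure the in-memory size of the $map data frame for the subset,
# then scale linearly to the full number of variants.
bytes_sub <- as.numeric(object.size(obj$map))
est_bytes <- bytes_sub / nrow(obj$map) * 40359612
cat(sprintf("Estimated full $map size: %.1f GB\n", est_bytes / 1024^3))
```

This only estimates the metadata footprint; the genotype dosages themselves live in the file-backed matrix on disk, not in RAM.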

garyzhubc commented 2 years ago

I cannot load the object because it says out of memory, so I can't really measure it for the full set of variants. The current object is only about 1G.

garyzhubc commented 2 years ago

Going to close this because I'll just use the set pruned by MAF.