privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/

R Session Aborted Fatal error with snp_grid_clumping() #85

Closed: ghost closed this issue 4 years ago

ghost commented 4 years ago

Hi Florian,

I have been using snp_grid_clumping() without any problems, but recently RStudio returned a fatal error after running snp_grid_clumping() for a few hours. My machine has 16 GB of memory and the .bk file of the data is 151 GB. Could this be due to a lack of memory? I am using bigsnpr version 1.3.

Thanks, Kevin

privefl commented 4 years ago

Thanks for reporting.

Was it okay with version 1.2.6?

How many cores are you using?

I'm a bit surprised, as v1.3 should be using less memory since it processes each chromosome separately. But the parallelization now uses OpenMP, which may cause new issues.

ghost commented 4 years ago

I am using 4 cores as given by nb_cores() for my machine.

I only got this new data recently, so I didn't have a chance to test whether it was okay with v1.2.6; I had already upgraded to v1.3 because it requires less memory.

I have used v1.3 for a few datasets and it has been running okay. This is the first time I have gotten the error.

privefl commented 4 years ago

I guess 16 GB is not much.

I'll try on the cluster I use by asking for only e.g. 10 GB RAM.

privefl commented 4 years ago

On data of size 360K x 1.1M (377 GB .bk file), when I asked for 10 GB of RAM and 16 cores, it ran in less than 50 min (using a subset of 10K individuals through parameter ind.row).
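For reference, a subset of individuals for ind.row can be drawn along these lines. This is only a sketch in base R: the total number of individuals and the seed are hypothetical values chosen for illustration, not the ones used in the run above.

```r
# Hypothetical number of individuals (rows of the genotype matrix).
n <- 360e3

# Draw 10K distinct row indices; sorting keeps disk access sequential.
set.seed(42)  # arbitrary seed, only for reproducibility of the sketch
ind.row <- sort(sample(n, 10e3))

length(ind.row)
```

The resulting vector would then be passed as the ind.row argument of snp_grid_clumping().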

Can you try running it again?

ghost commented 4 years ago

Thanks for the response. I will try running it again.

ghost commented 4 years ago

I tried running it again, but the process was killed after a while. The data is 10K x 15M (151 GB .bk file). There are 5K individuals in ind.row. I was wondering whether the issue is due to too many SNPs?

privefl commented 4 years ago

Yeah, this function caches the values of the correlations computed during the clumping step. This corresponds to a very sparse correlation matrix, but if you have, say, 1.5M variants on chromosome 1, there can still be a lot of values to store.
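As a rough back-of-the-envelope illustration of why such a cache can grow large, the sketch below multiplies out some numbers. Only the 1.5M figure comes from this thread; the number of cached correlations per variant and the 8 bytes per value are assumptions made up for the example, not measurements of what bigsnpr actually stores.

```r
# All inputs except n_variants are illustrative assumptions.
n_variants  <- 1.5e6  # variants on one chromosome (figure mentioned above)
n_neighbors <- 500    # hypothetical average number of cached correlations per variant
bytes_each  <- 8      # assuming each correlation is stored as a double

# Approximate cache size in GB, ignoring any sparse-index overhead.
cache_gb <- n_variants * n_neighbors * bytes_each / 1e9
cache_gb
```

Under these assumed numbers the cache alone would take several GB, which is a sizeable share of a 16 GB machine.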

You should probably filter that a bit or ask for more memory. You could filter for pval < 0.5 to keep only ~half.
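A minimal sketch of such a filter in base R, using made-up p-values. The variable names are hypothetical, and the commented call assumes that your installed version of snp_grid_clumping() has the exclude argument; whether excluding variants this way reduces memory as much as subsetting the data beforehand has not been checked here.

```r
# Hypothetical p-values for 10 variants; in practice these come from
# the summary statistics matched to the columns of the genotype matrix.
set.seed(1)
pval <- runif(10)

# snp_grid_clumping() expects -log10(p-values) in its lpS argument.
lpS <- -log10(pval)

# Variants with pval >= 0.5 are the ones to drop.
ind.excl <- which(pval >= 0.5)
ind.keep <- setdiff(seq_along(pval), ind.excl)

# Assumed usage, not run here (G, CHR, POS and ind.row are placeholders):
# all_keep <- snp_grid_clumping(G, infos.chr = CHR, infos.pos = POS,
#                               lpS = lpS, ind.row = ind.row,
#                               exclude = ind.excl, ncores = nb_cores())
```

Note that pval < 0.5 is the same condition as lpS > -log10(0.5), which is about 0.3, so the filter can also be written directly on lpS.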

ghost commented 4 years ago

Thanks Florian. After filtering SNPs, the function runs without problems now. I am now closing this issue. Thanks again for your detailed response.