Hi Florian, I have been using snp_grid_clumping() without any problems, but recently RStudio returned a fatal error after running snp_grid_clumping() for a few hours. My machine has 16 GB of memory and the .bk file of the data is 151 GB. I was wondering if this is due to a lack of memory? I am using bigsnpr version 1.3. Thanks, Kevin
Thanks for reporting.
Was it okay with version 1.2.6?
How many cores are you using?
I'm a bit surprised, as v1.3 should use less memory since it processes each chromosome separately. But the parallelization now uses OpenMP, which may cause new issues.
I am using 4 cores, as given by nb_cores() on my machine.
I only got this new data recently, so I haven't had a chance to test whether it was okay with v1.2.6, since I had already upgraded to v1.3 as it requires less memory.
I have used v1.3 on a few datasets and it has been running fine; this is the first time I have gotten this error.
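(For reference, a minimal check of that core count: nb_cores() is the helper re-exported by bigsnpr that returns a recommended number of cores for parallel computations.)

```r
library(bigsnpr)

## Recommended number of cores for parallel computations;
## this returned 4 on the machine used here.
(NCORES <- nb_cores())
```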
I guess 16 GB is not much.
I'll try it on the cluster I use, requesting only e.g. 10 GB of RAM.
On a 360K x 1.1M dataset (377 GB .bk file), asking for 10 GB of RAM and 16 cores, it ran in less than 50 minutes (using a subset of 10K individuals through the parameter ind.row; see the sketch below).
Can you try running it again?
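For context, a minimal sketch of that kind of call (the file path, the placeholder p-values, and the object names are assumptions for illustration, not taken from this issue):

```r
library(bigsnpr)

## Attach a bigSNP object backed by a large .bk file on disk
obj.bigSNP <- snp_attach("data.rds")            # hypothetical path
G   <- obj.bigSNP$genotypes
CHR <- obj.bigSNP$map$chromosome
POS <- obj.bigSNP$map$physical.pos
lpval <- -log10(runif(ncol(G)))                 # placeholder; use your real GWAS p-values

## Clumping grid restricted to a random subset of 10K individuals via `ind.row`
ind_sub  <- sort(sample(nrow(G), 10e3))
all_keep <- snp_grid_clumping(
  G, infos.chr = CHR, infos.pos = POS,
  lpS = lpval,          # -log10(p-values)
  ind.row = ind_sub,    # correlations computed on this subset of individuals only
  ncores = 16
)
```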
Thanks for the response. I will try running it again.
I tried running it again, but the process was killed after a while.
The data is 10K x 15M (151 GB .bk file), and there are 5K individuals in ind.row. I was wondering whether the issue is due to too many SNPs?
Yes, this function caches the values of the correlations computed during the clumping step. This corresponds to a very, very sparse correlation matrix, but if you have, say, 1.5M variants on chromosome 1, there can still be a lot of values to store.
You should probably filter that a bit or ask for more memory. You could filter on pval < 0.5 to keep only about half of the variants (see the sketch below).
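A hedged sketch of that filtering, reusing the placeholder objects (G, CHR, POS) from the earlier sketch; it assumes, as in the SCT tutorial, that snp_grid_clumping() accepts an exclude argument with indices of variants to skip. Adjust if your version differs:

```r
## Placeholder p-values; use your real GWAS p-values here
pval  <- runif(ncol(G))
lpval <- -log10(pval)

## Drop variants with p >= 0.5 (roughly half of them), so that far fewer
## correlations need to be cached per chromosome
ind_excl <- which(pval >= 0.5)
ind_row  <- sort(sample(nrow(G), 5e3))   # placeholder for the 5K individuals

all_keep <- snp_grid_clumping(
  G, infos.chr = CHR, infos.pos = POS,
  lpS = lpval,
  ind.row = ind_row,
  exclude = ind_excl,     # skip the filtered variants entirely
  ncores = nb_cores()
)
```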
Thanks, Florian. After filtering the SNPs, the function now runs without problems. I am closing this issue. Thanks again for your detailed response.