mgalardini / pyseer

SEER, reimplemented in python 🐍🔮
http://pyseer.readthedocs.io
Apache License 2.0

MemoryError when using --lineage-clusters and --sequence-reweighting #122

Closed · boasvdp closed this issue 4 years ago

boasvdp commented 4 years ago

First of all, thanks for this great tool. I am trying to replicate the elastic net prediction tutorial on a dataset for which we have previously performed a GWAS, but I run into a memory error when explicitly correcting for lineage and using the sequence reweighting option.

The dataset consists of 1169 isolates with a binary phenotype. The elastic net is being fit to the top ~1.5 million variants after filtering on allele frequency (0.05, 0.95). At the fitting step I get this error message: MemoryError: Unable to allocate 1.17 GiB for an array with shape (1576016, 100) and data type float64, which I think can be traced back to numpy. The thing is, this runs on an HPC compute node with 96 GB RAM, and the output of /usr/bin/time -v indicates that the maximum RAM usage is ~30 GB (Maximum resident set size (kbytes): 30022628, if I interpret this correctly; see the attached log). Without --lineage-clusters and --sequence-reweighting the run completes fine.
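As a sanity check (nothing pyseer-specific, just the arithmetic for the reported shape), the failing allocation itself is only about 1.2 GiB:

```python
# Back-of-envelope size of the array numpy failed to allocate:
# shape (1576016, 100), dtype float64 (8 bytes per element).
rows, cols, itemsize = 1_576_016, 100, 8
print(f"{rows * cols * itemsize / 2**30:.2f} GiB")  # ~1.17 GiB, matching the error message
```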

After looking online, this seems similar to https://stackoverflow.com/questions/57507832/unable-to-allocate-array-with-shape-and-data-type, which has to do with memory overcommit handling. The accepted solution in the SO thread is to check /proc/sys/vm/overcommit_memory and change it if possible. On this node it is set to 2, and /proc/sys/vm/overcommit_ratio is set to 99, which according to https://engineering.pivotal.io/post/virtual_memory_settings_in_linux_-_the_problem_with_overcommit/ means I can only commit 99% of the 96 GB of RAM. Changing these overcommit settings requires root permissions, which I don't have.
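For reference, here is a minimal sketch (Linux-only, read-only, no root needed) for inspecting these procfs settings and the commit limit they imply; the paths are the standard ones from the linked threads:

```python
# Read-only check of the kernel overcommit settings (Linux procfs).
def read(path):
    with open(path) as fh:
        return fh.read().strip()

print("overcommit_memory:", read("/proc/sys/vm/overcommit_memory"))  # 2 = strict accounting
print("overcommit_ratio: ", read("/proc/sys/vm/overcommit_ratio"))   # % of physical RAM counted

# With overcommit_memory=2, allocations fail once total committed memory exceeds
# CommitLimit = swap + (overcommit_ratio / 100) * RAM.
for line in open("/proc/meminfo"):
    if line.startswith(("CommitLimit", "Committed_AS")):
        print(line.strip())
```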

Could this be the cause of the error I am seeing and if so, is there a way around this?

Many thanks in advance!

Log file: slurm_output.txt

johnlees commented 4 years ago

The memory use is unfortunately very high for this mode and, as you've observed, is higher with sequence reweighting. We were able to fit the model to 3000 samples/1.6 million variants with 80 GB RAM, and 5000 samples/1.7 million variants with 100 GB RAM, so this should be possible on your 96 GB node.

One thing to check first is how many threads/CPUs you're requesting. I'd keep it to one: in the current version the memory isn't shared between threads (this wasn't even possible until recently, with python 3.8), so every thread makes a new copy of the variant matrix, increasing memory use linearly with the number of threads.
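To give a rough sense of why extra threads are expensive here (assuming a dense float64 variant matrix of your dataset's dimensions, which may not match pyseer's internal storage exactly):

```python
# Illustrative only: per-thread copies of a dense float64 variant matrix.
n_samples, n_variants = 1169, 1_576_016          # dimensions from this dataset
matrix_gib = n_samples * n_variants * 8 / 2**30  # ~13.7 GiB per copy
for threads in (1, 2, 4):
    print(f"{threads} thread(s): ~{threads * matrix_gib:.0f} GiB in variant matrix copies")
```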

It does look like you're genuinely running out of memory, and I'd guess that a failed ~1 GiB allocation is probably not due to the kernel refusing the request (though I don't know for sure!). If you're still running into this, the solution is probably to see if you can get access to a higher-memory machine.

boasvdp commented 4 years ago

I am currently keeping to a single thread, and will check whether I can get access to a higher-memory node. Many thanks for the quick reply!