matsen / pplacer

Phylogenetic placement and downstream analysis
http://matsen.fredhutch.org/pplacer/
GNU General Public License v3.0

Is more RAM required when using more CPUs? #369

Open SilentGene opened 5 years ago

SilentGene commented 5 years ago

Hi, while I understand that pplacer requires a lot of memory because it caches likelihood vectors for all of the internal nodes, I'm still confused about why it needs more memory when I use more CPUs (only tested on Linux). It looks like when I double the number of CPUs, the memory requirement also doubles, which doesn't make sense to me. Does anyone know how to deal with this problem? It really matters when working with a huge reference tree. Thanks.

aaronmussig commented 3 years ago

I've done some digging to understand what is happening here, and from my understanding (correct me if I'm wrong), it doesn't actually need more memory.

It appears that the main process spawns its workers with Unix.fork. Since forked memory is copy-on-write, the children only hold a mapping to the main process's memory and don't duplicate it unless they write to it.

However, this is not what the OS reports: tools such as top typically count the shared pages against every child, so the total looks like it scales with the number of CPUs. In my experience, as long as you've got enough memory to run the main process, you can launch as many children as you want (so long as it's under 64, as I've had instances of it hanging if more are used).
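
If anyone wants to see the gap between per-process reporting and actual usage for themselves, here is a rough Python sketch. It is not pplacer code, and the names `parse_kb` and `child_memory` are made up for illustration. It assumes Linux with `/proc/self/smaps_rollup` available (kernel 4.14 or newer). The parent fills a buffer, forks a few children that only read it, and each child reports its Rss (which counts shared pages in full) next to its Pss (which splits shared pages between the processes mapping them).

```python
import os


def parse_kb(text, field):
    """Return the kB value of a 'Field:   123 kB' line from /proc smaps text."""
    for line in text.splitlines():
        if line.startswith(field + ":"):
            return int(line.split()[1])
    raise KeyError(field)


def child_memory(n_children=4, size_mb=200):
    """Fork n_children after allocating size_mb in the parent.

    Returns a list of (rss_kb, pss_kb) tuples, one per child.
    Hypothetical helper for illustration; Linux only.
    """
    # Multiplying bytes writes every page, so the buffer is really resident.
    data = b"\x01" * (size_mb * 1024 * 1024)

    workers = []
    for _ in range(n_children):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: only read the buffer, so no pages get copied.
            try:
                os.close(r)
                _ = data[0] + data[-1]
                with open("/proc/self/smaps_rollup") as f:
                    text = f.read()
                msg = f"{parse_kb(text, 'Rss')} {parse_kb(text, 'Pss')}"
                os.write(w, msg.encode())
            finally:
                os._exit(0)  # never fall back into the parent's code path
        os.close(w)
        workers.append((pid, r))

    results = []
    for pid, r in workers:
        rss, pss = os.read(r, 64).decode().split()
        os.close(r)
        os.waitpid(pid, 0)
        results.append((int(rss), int(pss)))
    return results


if __name__ == "__main__" and os.path.exists("/proc/self/smaps_rollup"):
    for i, (rss, pss) in enumerate(child_memory()):
        print(f"child {i}: Rss={rss} kB  Pss={pss} kB")
```

Each child should show an Rss at least as large as the buffer, while its Pss should be noticeably smaller because the buffer pages are shared with the parent. The exact Pss values are approximate and depend on how many of the processes are alive at the moment each child takes its measurement.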