petrelharp / ftprime_ms


Flagging: potential issues with large sims in simupop #5

Closed: ashander closed this issue 7 years ago

ashander commented 7 years ago

While trying something similar to @jeromekelleher's benchmarks over in https://github.com/molpopgen/fwdpy11_arg_example/pull/8, I ran into the issue below. I'm just flagging it here for now, as I need to look into it a bit more closely.

```sh
 $ python3 benchmark-simuPOP.py -N 10000 --theta 10000 --rho 10000 --nsam 1000 --pdel 0.01 --seed 42 -G 1000 --csvfile out.csv
simuPOP Version 1.1.8.3 : Copyright (c) 2004-2016 Bo Peng
Revision 4553 (Feb 11 2017) for Python 3.5.3 (64bit, 1thread)
Random Number Generator is set to mt19937 with random seed 0xc14a65fcaab1501e.
This is the standard short allele version with 256 maximum allelic states.
For more information, please visit http://simupop.sourceforge.net,
or email simupop-list@lists.sourceforge.net (subscription required).
Options:
Namespace(csvfile='out.csv', gamma_scale=5.0, gamma_shape=1.0, generations=200000, logfile='-', mut_rate=1e-07, nsamples=1000, pdel=0.01, popsize=10000, recomb_rate=1e-07, record_neutral=False, rho=10000, seed=42, simplify_interval=1000, theta=10000, treefile=None)
16:02:11 10/19/17 PDT
----------
Traceback (most recent call last):
  File "benchmark-simuPOP.py", line 207, in <module>
    infoFields=['ind_id', 'fitness'])
  File "/home/jaime/miniconda3/envs/ftprime-benchmark/lib/python3.5/site-packages/simuPOP/simuPOP_std.py", line 3333, in __init__
    _simuPOP_std.Population_swiginit(self, _simuPOP_std.new_Population(*args, **kwargs))
RuntimeError: Failed to create population (popSize=10000, totNumLoci*ploidy=5000000, maximum population size for such a long genome=3689348814741, requested memory=48828515k bytes)
```
ashander commented 7 years ago

For reference, this is creating a Population in simuPOP with:

- popsize: 10000
- nloci: 2500000
- number of locus positions: 2500000

I guess the error message above could be more informative, but the memory demands are detailed in the simuPOP FAQ (http://simupop.sourceforge.net/Main/FAQ#toc4). The upshot is that the above (with 2500000 short, diploid alleles) requires:

$$\text{memory size (GB)} = \frac{2 \times 2500000 \times 2 + 56}{1024 \times 1024 \times 1024} \times \text{popsize}$$

So that's about 93 GB. I was running on a machine with 20 GB, so it's not surprising it won't work.
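To make the arithmetic easy to rerun for other settings, here is a small Python sketch that simply encodes the formula as quoted from the simuPOP FAQ, including its leading factor of 2 and the 56-byte per-individual term. The helper name `simupop_memory_gb` is hypothetical and is not part of simuPOP; treat the result as the FAQ's estimate, not a measurement.

```python
GIB = 1024 ** 3  # the formula divides by 1024^3, so "GB" here really means GiB


def simupop_memory_gb(popsize: int, nloci: int, ploidy: int = 2) -> float:
    """Estimated memory (GiB) for a short-allele simuPOP Population.

    Hypothetical helper: encodes (2 * nloci * ploidy + 56) / 1024^3 * popsize
    exactly as quoted from the simuPOP FAQ.
    """
    bytes_per_individual = 2 * nloci * ploidy + 56
    return bytes_per_individual / GIB * popsize


if __name__ == "__main__":
    # Benchmark settings from this issue: N = 10000, 2.5 million diploid loci.
    print(f"{simupop_memory_gb(10_000, 2_500_000):.1f} GB")  # prints: 93.1 GB
```

Because the estimate is linear in both popsize and the number of loci, halving either one roughly halves the memory requirement.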

petrelharp commented 7 years ago

This means that simupop actually stores each allele for each individual, even if it is monomorphic? Wow...


ashander commented 7 years ago

Yes, though it's unclear whether it stores them all initially or just allocates enough memory to do so. But it does use the memory: the example below reaches a maximum resident set size of 19208424 kB, i.e. all of my memory.

I wonder if there's some way to avoid this?


```sh
 $ cat mem.py 
import simuPOP as sim
import argparse

parser = argparse.ArgumentParser(description="See how much memory")
parser.add_argument("--popsize","-N", type=int, dest="popsize",
        help="size of subpopulation",default=100)
args = parser.parse_args()
nloci = int(2.5e6)
locus_position = list(range(0, nloci))

pop = sim.Population(
        size=[args.popsize]*1,
        loci=nloci,
        lociPos=locus_position,
        infoFields=['ind_id', 'fitness'])
 $ /usr/bin/time -v python mem.py -N 5000
simuPOP Version 1.1.8.3 : Copyright (c) 2004-2016 Bo Peng
Revision 4553 (Feb 11 2017) for Python 3.5.3 (64bit, 1thread)
Random Number Generator is set to mt19937 with random seed 0xc11df7d466d1a291.
This is the standard short allele version with 256 maximum allelic states.
For more information, please visit http://simupop.sourceforge.net,
or email simupop-list@lists.sourceforge.net (subscription required).
        Command being timed: "python mem.py -N 5000"
        User time (seconds): 1.52
        System time (seconds): 12.39
        Percent of CPU this job got: 61%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:22.56
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 19208424
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 5473
        Minor (reclaiming a frame) page faults: 5835923
        Voluntary context switches: 5771
        Involuntary context switches: 68999
        Swaps: 0
        File system inputs: 234408
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
```
jeromekelleher commented 7 years ago

If it doesn't actually use the memory, you could try adding a whole bunch of swap space. The mallocs will then succeed, and if the pages aren't touched it should be OK. If they are, your system will descend into a horrible swapping mess pretty quickly. I wouldn't be too confident, to be honest...
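The swap suggestion relies on the kernel handing out memory lazily: a large allocation only reserves address space, and physical pages are committed when they are first written. The Python sketch below illustrates that on Linux using an anonymous private `mmap` as a stand-in for a large `malloc`. This is an assumption for illustration only; it says nothing about how simuPOP itself allocates, and `reserve_and_touch` / `peak_rss_kb` are hypothetical helpers.

```python
import mmap
import resource
import sys

PAGE = mmap.PAGESIZE


def peak_rss_kb() -> int:
    """Peak resident set size of this process in kB (ru_maxrss is bytes on macOS)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return peak // 1024 if sys.platform == "darwin" else peak


def reserve_and_touch(nbytes: int, pages_to_touch: int) -> int:
    """Map nbytes of anonymous memory, write to a few pages, return peak-RSS growth in kB."""
    before = peak_rss_kb()
    # Private anonymous mapping (fileno -1): reserves address space, no physical pages yet.
    buf = mmap.mmap(-1, nbytes, flags=mmap.MAP_PRIVATE)
    for i in range(pages_to_touch):
        buf[i * PAGE] = 1  # the first write to a page makes the kernel back it with RAM
    growth = peak_rss_kb() - before
    buf.close()
    return growth


if __name__ == "__main__":
    # Reserve 512 MiB but touch only 16 pages (64 KiB with 4 KiB pages).
    print(reserve_and_touch(512 * 1024 * 1024, pages_to_touch=16), "kB of peak RSS growth")
```

In the `mem.py` run above, the maximum resident set size did reach 19208424 kB, which suggests simuPOP writes to (initialises) its genotype pages, so extra swap would likely buy the swapping behaviour described here rather than a free lunch.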

ashander commented 7 years ago

Some exploration with @molpopgen indicates we can up our allowed population size for a given amount of memory by 8x by doing:

```python
from simuOpt import setOptions

setOptions(alleleType='binary')  # must be called before simuPOP is imported
import simuPOP as sim
```
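The 8x figure is what you would expect if short alleles take one byte each (the startup banner reports 256 maximum allelic states) while binary alleles take one bit each. The Python sketch below counts only dense genotype storage under those assumptions and ignores info fields and other per-individual overhead; `genotype_bytes` is a hypothetical helper, not a simuPOP function.

```python
def genotype_bytes(popsize: int, nloci: int, ploidy: int = 2, bits_per_allele: int = 8) -> int:
    """Bytes needed to store every allele of every individual densely.

    Assumes 8 bits/allele for short alleles and 1 bit/allele for binary alleles;
    ignores info fields and any other overhead.
    """
    total_bits = popsize * nloci * ploidy * bits_per_allele
    return (total_bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    short = genotype_bytes(10_000, 2_500_000, bits_per_allele=8)
    binary = genotype_bytes(10_000, 2_500_000, bits_per_allele=1)
    print(f"short:  {short / 1024**3:.2f} GiB")
    print(f"binary: {binary / 1024**3:.2f} GiB")
    print(f"ratio:  {short / binary:.0f}x")
```

For the benchmark settings the short-allele figure is 5 × 10^10 bytes, or 48828125 kB, which is close to the `requested memory=48828515k bytes` in the error above, so this simple model seems to be in the right ballpark.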
petrelharp commented 7 years ago

Let's not worry about it too much - our goal is to describe the feasible, out-of-the-box space with simuPOP, not be heroic about what we can get it to do.


ashander commented 7 years ago

Yes, that seems like the right principle to adopt. We are no longer hunting for further optimizations, but we do have our simuPOP benchmark running on @molpopgen's server now!

ashander commented 6 years ago

For the record, we later remembered that `alleleType='mutant'` is the right thing to do here.
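As we understand it, the mutant allele type is selected with the same `setOptions` call, `setOptions(alleleType='mutant')` before `import simuPOP`, and stores only non-zero (mutant) alleles. That suits a mostly monomorphic set of 2.5 million loci far better than storing every allele of every individual. The toy Python class below illustrates why memory then scales with the number of mutations rather than with popsize × loci × ploidy. `SparseGenotypes` is hypothetical and is not simuPOP's internal data structure.

```python
from collections import defaultdict


class SparseGenotypes:
    """Toy sparse genotype store in which only non-zero alleles are kept.

    Illustrates the "store mutants only" idea; not simuPOP's data structure.
    """

    def __init__(self, popsize: int, nloci: int, ploidy: int = 2):
        self.popsize, self.nloci, self.ploidy = popsize, nloci, ploidy
        # (individual, chromosome copy) -> {locus: allele}, non-zero alleles only
        self._mutants = defaultdict(dict)

    def set(self, ind: int, copy: int, locus: int, allele: int) -> None:
        key = (ind, copy)
        if allele == 0:
            # Wild type is the implicit default, so just forget any stored mutant.
            self._mutants.get(key, {}).pop(locus, None)
        else:
            self._mutants[key][locus] = allele

    def get(self, ind: int, copy: int, locus: int) -> int:
        # .get on the defaultdict does not create entries; missing means allele 0.
        return self._mutants.get((ind, copy), {}).get(locus, 0)

    def n_stored(self) -> int:
        return sum(len(d) for d in self._mutants.values())

    def n_dense(self) -> int:
        return self.popsize * self.nloci * self.ploidy


if __name__ == "__main__":
    g = SparseGenotypes(popsize=10_000, nloci=2_500_000)
    g.set(ind=0, copy=0, locus=123, allele=1)
    g.set(ind=42, copy=1, locus=2_000_000, allele=1)
    print(g.n_stored(), "stored alleles vs", g.n_dense(), "in a dense layout")
```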