requires too much of memory after EM

shajoezhu commented 9 years ago

Testing

../../smcsmc/smcsmc -nsam 2 -Np 1000 -EM 40 -t 30000 -r 6000 30000000 -p "1*3+15*4+1" -tmax 4 -seg sim-1Samples2msdata1.seg -o Particle1000Seqlen30000000_normalRecombTASK9 -seed 9 9 9 -xr 1-17 -xc 1-17

This still requires a lot of memory even when neither recombination events nor coalescent events are recorded. So I guess that at every EM step, when a particle is removed, some of its data is left behind and never freed. This is what happened on the cluster: the program was terminated because of its memory consumption.

shajoezhu commented 9 years ago

Memory stays the same when running

./smcsmc -Np 2000 -EM 20 -xr 1 -xc 1

Resampling is what makes the memory grow:

./smcsmc -Np 2000 -EM 20 -xr 1 -xc 1 -ESS 1
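
To make the suspected ownership problem concrete, here is a minimal sketch of a resampling step that cannot leave dropped particles behind. It is not the smcsmc code: `Particle`, its `live` counter and this `resample` signature are hypothetical, and it assumes C++14 for `std::make_unique`. Particles with a count of 0 are freed automatically, and particles with a count above 1 are duplicated through the copy constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a particle. The static counter tracks how many
// instances are alive, so a leak shows up as a count that never drops.
struct Particle {
    static int live;
    int id;
    explicit Particle(int i) : id(i) { ++live; }
    Particle(const Particle& other) : id(other.id) { ++live; }
    ~Particle() { --live; }
};
int Particle::live = 0;

// counts[i] is the number of copies of particle i to keep.
// The old population is taken by value, so every particle that is not moved
// into the new population is destroyed together with 'old'.
std::vector<std::unique_ptr<Particle>> resample(
        std::vector<std::unique_ptr<Particle>> old,
        const std::vector<int>& counts) {
    assert(old.size() == counts.size());
    std::vector<std::unique_ptr<Particle>> next;
    for (std::size_t i = 0; i < old.size(); ++i) {
        if (counts[i] <= 0) continue;  // dropped: freed with 'old'
        // Make the extra copies first, while old[i] is still valid.
        for (int c = 1; c < counts[i]; ++c)
            next.push_back(std::make_unique<Particle>(*old[i]));
        // Reuse the original instead of copying it once more.
        next.push_back(std::move(old[i]));
    }
    return next;
}
```

With owning smart pointers the number of live particles after a call equals the sum of `counts`. If it keeps growing with raw pointers instead, a `delete` is missing somewhere or the copy allocates something that the destructor does not release.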

gerton commented 9 years ago

A quick test is to do a (small) run under valgrind memcheck; it reports who allocated the leaked memory and may give a hint as to what goes wrong.

BW G

shajoezhu commented 9 years ago

Hi Gerton

Yes, I am working on this. The leak comes from copying a new particle:

valgrind --leak-check=full ./smcsmc -Np 10 -EM 2 -ESS 1

==24378== 
==24378== HEAP SUMMARY:
==24378==     in use at exit: 22,221,136 bytes in 198,403 blocks
==24378==   total heap usage: 2,330,027 allocs, 2,131,624 frees, 33,126,244,641 bytes allocated
==24378== 
==24378== 1,568 bytes in 14 blocks are possibly lost in loss record 1 of 3
==24378==    at 0x4C2C100: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24378==    by 0x422C19: NodeContainer::NodeContainer(NodeContainer const&) (node_container.cc:54)
==24378==    by 0x4296E1: Forest::Forest(Forest const&) (forest.cc:87)
==24378==    by 0x408C3A: ForestState::ForestState(ForestState const&) (particle.cpp:54)
==24378==    by 0x40CD3B: ParticleContainer::resample(std::valarray<int>&) (particleContainer.cpp:109)
==24378==    by 0x40D04F: ParticleContainer::ESS_resampling(std::valarray<double>, std::valarray<int>&, int, double, int) (particleContainer.cpp:75)
==24378==    by 0x44F44B: pfARG_core(PfParam&, CountModel*, bool) (smcsmc.cpp:205)
==24378==    by 0x404491: main (smcsmc.cpp:74)
==24378== 
==24378== 22,219,568 (4,028,304 direct, 18,191,264 indirect) bytes in 35,967 blocks are definitely lost in loss record 3 of 3
==24378==    at 0x4C2C100: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24378==    by 0x422C19: NodeContainer::NodeContainer(NodeContainer const&) (node_container.cc:54)
==24378==    by 0x4296E1: Forest::Forest(Forest const&) (forest.cc:87)
==24378==    by 0x408C3A: ForestState::ForestState(ForestState const&) (particle.cpp:54)
==24378==    by 0x40CD3B: ParticleContainer::resample(std::valarray<int>&) (particleContainer.cpp:109)
==24378==    by 0x40D04F: ParticleContainer::ESS_resampling(std::valarray<double>, std::valarray<int>&, int, double, int) (particleContainer.cpp:75)
==24378==    by 0x44F44B: pfARG_core(PfParam&, CountModel*, bool) (smcsmc.cpp:205)
==24378==    by 0x404491: main (smcsmc.cpp:74)
==24378== 
==24378== LEAK SUMMARY:
==24378==    definitely lost: 4,028,304 bytes in 35,967 blocks
==24378==    indirectly lost: 18,191,264 bytes in 162,422 blocks
==24378==      possibly lost: 1,568 bytes in 14 blocks
==24378==    still reachable: 0 bytes in 0 blocks
==24378==         suppressed: 0 bytes in 0 blocks
==24378== 
==24378== For counts of detected and suppressed errors, rerun with: -v
==24378== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

shajoezhu commented 9 years ago

The bug is in scrm, in the copy constructor for node_container. There are also some minor bugs in scrm; I am fixing them.
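
For illustration only, here is a minimal sketch of the general pattern involved: a container whose copy constructor allocates nodes with `new` must have a destructor (and assignment operator) that releases every one of them. This is not the scrm code and not the actual fix. `Node`, `NodeList` and the `live` counter are hypothetical names, and the sketch is not exception-safe if an allocation fails halfway through a copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node type with a counter of live instances for leak checking.
struct Node {
    static int live;
    double height;
    explicit Node(double h) : height(h) { ++live; }
    Node(const Node& other) : height(other.height) { ++live; }
    ~Node() { --live; }
};
int Node::live = 0;

// Hypothetical container that owns its nodes through raw pointers.
class NodeList {
public:
    NodeList() = default;

    // Deep copy: every node of 'other' is cloned with new ...
    NodeList(const NodeList& other) {
        nodes_.reserve(other.nodes_.size());
        for (const Node* n : other.nodes_) nodes_.push_back(new Node(*n));
    }

    // Copy-and-swap: the previous nodes end up in 'other' and are freed
    // when 'other' is destroyed at the end of the call.
    NodeList& operator=(NodeList other) {
        nodes_.swap(other.nodes_);
        return *this;
    }

    // ... so the destructor has to delete every node the container owns.
    ~NodeList() {
        for (Node* n : nodes_) delete n;
    }

    void add(double h) { nodes_.push_back(new Node(h)); }
    std::size_t size() const { return nodes_.size(); }

private:
    std::vector<Node*> nodes_;
};
```

Counting live nodes before and after copying and destroying a container is a cheap check to run alongside valgrind: the count should return to zero once every container is gone.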

luntergroup / smcsmc

requires too much of memory after EM #30