ternaus / quest-qmc

Automatically exported from code.google.com/p/quest-qmc

Memory leak when QUEST performs measurements. #21

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Build configuration: mkl-icc

FC_FLAGS = -m64 -warn all -unroll -O3

To measure time consumption I used:

/usr/bin/time -v ./ggeom L40_mu0.3_1388713277.77.in

Input and geometry file attached.

After 12 minutes the process was using 4.4 GB of RAM.

-----

    Command being timed: "./ggeom L40_mu0.3_1388713277.77.in"
    User time (seconds): 753.46
    System time (seconds): 1.33
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 12:35.52
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 4402576
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 1100778
    Voluntary context switches: 13
    Involuntary context switches: 104463
    Swaps: 0
    File system inputs: 0
    File system outputs: 37720
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 1

Original issue reported on code.google.com by iglovi...@gmail.com on 3 Jan 2014 at 4:22

Attachments:

GoogleCodeExporter commented 9 years ago
No simple compiler warning gives us a hint about where the problem is, so it is
time to start using more serious tools.

I hope this info will help.

Build configuration: mkl-icc

FC_FLAGS = -g -m64

valgrind --leak-check=full --show-leak-kinds=all -v ./ggeom in &>> ggeom.log

The full log is attached. It should show what leaks and where.

==================================

==10069== LEAK SUMMARY:
==10069==    definitely lost: 1,678,002 bytes in 1,196 blocks
==10069==    indirectly lost: 5,272,704 bytes in 3,582 blocks
==10069==      possibly lost: 0 bytes in 0 blocks
==10069==    still reachable: 800,412 bytes in 135 blocks
==10069==         suppressed: 0 bytes in 0 blocks
==10069== 
==10069== ERROR SUMMARY: 104 errors from 49 contexts (suppressed: 2 from 2)
====================================
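
To compare runs (for example before and after a fix), the totals in a LEAK SUMMARY like the one above can be pulled out of the log automatically. A minimal Python sketch: the function name `parse_leak_summary` is hypothetical, and it assumes only the line layout shown in the summary above.

```python
import re

# Matches e.g. "==10069==    definitely lost: 1,678,002 bytes in 1,196 blocks"
_LEAK_LINE = re.compile(
    r"==\d+==[ ]+(?P<kind>[a-z ]+?):[ ]+(?P<bytes>[\d,]+) bytes in (?P<blocks>[\d,]+) blocks"
)


def parse_leak_summary(log_text):
    """Return {kind: (bytes, blocks)} for every leak-summary line in log_text."""
    summary = {}
    for match in _LEAK_LINE.finditer(log_text):
        kind = match.group("kind").strip()
        n_bytes = int(match.group("bytes").replace(",", ""))
        n_blocks = int(match.group("blocks").replace(",", ""))
        summary[kind] = (n_bytes, n_blocks)
    return summary


# The summary quoted in this comment, used as sample input.
SAMPLE = """\
==10069== LEAK SUMMARY:
==10069==    definitely lost: 1,678,002 bytes in 1,196 blocks
==10069==    indirectly lost: 5,272,704 bytes in 3,582 blocks
==10069==      possibly lost: 0 bytes in 0 blocks
==10069==    still reachable: 800,412 bytes in 135 blocks
==10069==         suppressed: 0 bytes in 0 blocks
"""

summary = parse_leak_summary(SAMPLE)
# Memory that can no longer be freed: lost directly plus lost through it.
lost = summary["definitely lost"][0] + summary["indirectly lost"][0]
print(f"definitely + indirectly lost: {lost} bytes")
```

For the numbers above this adds up to 6,950,706 bytes lost over the run that valgrind observed.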

Original comment by iglovi...@gmail.com on 3 Jan 2014 at 6:23

GoogleCodeExporter commented 9 years ago

Original comment by iglovi...@gmail.com on 3 Jan 2014 at 6:31

Attachments:

GoogleCodeExporter commented 9 years ago

Original comment by iglovi...@gmail.com on 4 Jan 2014 at 11:41

GoogleCodeExporter commented 9 years ago
[1] I want to add that the problem starts when QUEST begins performing measurements.

During the warm-up sweeps I see:

Virtual 104 MB
RES 15 MB

But when QUEST switches to the *measurement* regime, memory use starts to grow.

Every *measurement* sweep increases this value.

[2] Removing the BONDS and PAIR sections DOES NOT prevent this leak.

[3] Removing the PHASE section DOES NOT prevent this leak.

[4] Removing the SYMM section DOES NOT prevent this leak.

[5] Using make.inc.icc DOES NOT prevent this leak.

[6] Using make.inc.gcc DOES HELP! But this is the slowest case: according to
https://code.google.com/p/quest-qmc/wiki/BenchmarkCougar, it makes the simulation
3 times slower.

[7] I have problems compiling QUEST with make.inc.mkl-gcc, so I cannot check
this case.

Original comment by iglovi...@gmail.com on 6 Jan 2014 at 10:29

GoogleCodeExporter commented 9 years ago
Hi Vlad,

Can you confirm that tdm=0, as in the posted input file?

Original comment by simone.c...@gmail.com on 6 Jan 2014 at 11:03

GoogleCodeExporter commented 9 years ago
[1] Yes, tdm = 0.

[2]  ifort -v
ifort version 14.0.0

Original comment by iglovi...@gmail.com on 6 Jan 2014 at 11:35

GoogleCodeExporter commented 9 years ago
Vlad,

Try removing -DDQMC_ASQRD as well and see what it does.

How are you observing the increase in memory consumption?

Thanks, Simone

Original comment by simone.c...@gmail.com on 6 Jan 2014 at 11:41

GoogleCodeExporter commented 9 years ago
I start the simulation and run the program "top" in another terminal window.

It shows the CPU and memory consumption of all processes.
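
The same observation can be logged instead of watched by eye. A minimal Python sketch, assuming a Linux system where `/proc/<pid>/status` contains a `VmRSS` line; the function names `parse_vmrss_kb` and `log_rss` are hypothetical helpers, not part of QUEST.

```python
import time


def parse_vmrss_kb(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     15360 kB"
            return int(line.split()[1])
    return None  # kernel threads and zombie processes have no VmRSS line


def log_rss(pid, interval_s=10.0, samples=6):
    """Print the resident set size of process `pid` every `interval_s` seconds."""
    for _ in range(samples):
        with open(f"/proc/{pid}/status") as status_file:
            rss_kb = parse_vmrss_kb(status_file.read())
        print(f"{time.strftime('%H:%M:%S')}  RSS = {rss_kb} kB")
        time.sleep(interval_s)
```

Calling `log_rss(pid)` with the process ID of the running `ggeom` prints one line per sample. A value that stays flat during warm-up and rises steadily once measurement sweeps begin matches the behaviour reported here.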

Original comment by iglovi...@gmail.com on 6 Jan 2014 at 11:46

GoogleCodeExporter commented 9 years ago
Removing -DDQMC_ASQRD does not help.

Original comment by iglovi...@gmail.com on 6 Jan 2014 at 11:50

GoogleCodeExporter commented 9 years ago
I just looked at the code. The difference between the measurement and
equilibration sweeps is the call (in ggeom.F90) to DQMC_Hub_Meas (when tdm=0: no
time-dependent measurements). So the problem is likely in there. There are a few
calls to DQMC_Gfun_Duplicate that allocate new memory, but it all seems to be
correctly undone at the end, with calls to DQMC_Gfun_Free. Need to look into it
more closely.

Original comment by simone.c...@gmail.com on 7 Jan 2014 at 1:42

GoogleCodeExporter commented 9 years ago
OK. This issue should be fixed. What seemed to be correctly undone was not:
although DQMC_Gfun_Free was called, it did nothing because the variable
owns_G was not set in DQMC_Gfun_Duplicate (it defaulted to false, which is
wrong).
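
The mechanism can be illustrated with a toy model. The sketch below is Python, not QUEST's Fortran, and is not the actual implementation: `Pool`, `run_sweeps` and the `set_owner` switch are hypothetical, and only the idea of an `owns_G` flag checked by the free routine is taken from the description above.

```python
class Pool:
    """Toy stand-in for the heap: counts buffers that are still allocated."""

    def __init__(self):
        self.live = 0

    def allocate(self, data):
        self.live += 1
        return list(data)  # a fresh copy plays the role of newly allocated memory

    def release(self, buf):
        self.live -= 1


class Gfun:
    """Toy Green's function holder with an ownership flag."""

    def __init__(self, data=None):
        self.G = data
        self.owns_G = False  # default: this object does not own its buffer


def gfun_duplicate(pool, src, set_owner):
    """Copy src into a new buffer; set_owner=False reproduces the reported bug."""
    dup = Gfun()
    dup.G = pool.allocate(src.G)
    if set_owner:
        dup.owns_G = True  # the fix: the duplicate owns what it allocated
    return dup


def gfun_free(pool, g):
    """Release the buffer only if this object owns it."""
    if g.owns_G:
        pool.release(g.G)
        g.G = None
        g.owns_G = False


def run_sweeps(n_sweeps, set_owner):
    """Duplicate and free once per sweep; return the number of leaked buffers."""
    pool = Pool()
    original = Gfun([0.0, 0.0, 0.0, 0.0])
    for _ in range(n_sweeps):
        tmp = gfun_duplicate(pool, original, set_owner)
        gfun_free(pool, tmp)
    return pool.live


print("leaked without owner flag:", run_sweeps(100, set_owner=False))
print("leaked with owner flag:   ", run_sweeps(100, set_owner=True))
```

In this model every sweep without the flag leaves one buffer behind, which is the same pattern as memory growing with each measurement sweep; with the flag set in the duplicate routine, the free call releases the buffer and nothing accumulates.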

Vlad, please confirm this is the case and close the issue.

Original comment by simone.c...@gmail.com on 7 Jan 2014 at 2:58

GoogleCodeExporter commented 9 years ago
I confirm that this fixes the bug.

Thank you. You made me happy.

Original comment by iglovi...@gmail.com on 7 Jan 2014 at 3:29

GoogleCodeExporter commented 9 years ago

Original comment by iglovi...@gmail.com on 7 Jan 2014 at 3:30