precice / calculix-adapter

preCICE-adapter for the CSM code CalculiX
GNU General Public License v3.0

Check for memory leaks #10

Closed MakisH closed 2 years ago

MakisH commented 5 years ago

During tests for FSI with CalculiX and OpenFOAM, @derekrisseeuw noticed unreasonably high memory usage for CalculiX after a while, so there may be some memory leak in the adapter (or maybe in CalculiX itself).

At first, I would manually check if we delete every pointer we create. If this does not help, I would try to use Valgrind or a similar tool, although I would expect this to be a bit tricky in our case.

augustinhugues commented 5 years ago

I am experiencing the same issue. My calculation crashes at some point for no apparent reason. Looking at the RAM usage, it keeps increasing until it reaches the maximum, which might be the reason for the crash. I am now running the same case with CalculiX only, to check whether it is CalculiX or the adapter that is consuming the memory. I have 126 GB of RAM. At the beginning of the calculation I clear the cache, and it uses only 10 GB during the first time steps.
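One way to put numbers on this kind of growth is to sample the resident set size (RSS) of the solver process at intervals and fit a slope to the samples. The following is a minimal Python sketch, not part of the adapter. It assumes a Linux /proc filesystem, and the helper names (parse_vmrss_kib, read_rss_kib, growth_rate) are made up for illustration.

```python
import re


def parse_vmrss_kib(status_text):
    """Extract VmRSS (resident set size, in KiB) from /proc/<pid>/status text.

    Returns None if the field is missing (e.g. for kernel threads).
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current RSS of a running process (Linux only, assumption)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vmrss_kib(status_file.read())


def growth_rate(samples):
    """Least-squares slope of (time_s, rss_kib) samples, in KiB per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    numerator = sum((t - mean_t) * (r - mean_r) for t, r in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        raise ValueError("all samples share the same timestamp")
    return numerator / denominator
```

Calling read_rss_kib(pid) once a minute for the solver process and passing the collected (time, RSS) pairs to growth_rate gives a KiB/s figure that can be compared between a coupled run and a CalculiX-only run.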

augustinhugues commented 5 years ago

valgind-out.txt

I ran Valgrind on ccx and got the attached file. The stn variable in nonlingeo_precice.c is not freed. I added SFREE(stn) at line 2679, but it looks like that is not enough.

valgind-out.txt This second file also has the reachable blocks

nkr0 commented 5 years ago

@augustinhugues can you check if #16 clears this?

augustinhugues commented 5 years ago

@nkr1729 My bad, there was a problem in the calculation and CalculiX was not running. I will re-run and keep you updated.

nkr0 commented 5 years ago

Thanks, can you also check with valgrind?

augustinhugues commented 5 years ago

I checked with valgrind: there are only 47000 bytes in use at exit, compared with 5e6 before, so it is much better. In my case the memory usage is still increasing, but it is hard to say whether the rate of increase has dropped, as it takes about 90 h to crash.

valgrind-out_break.txt

nkr0 commented 5 years ago

@augustinhugues thank you for checking. From the current output, the leaks do not seem to come from CalculiX (or preCICE). I have no idea why, but every leak in the report says (in /bin/bash). Usually it should point to some file that is part of CalculiX or preCICE and, if the right flags are activated, even give a trace to the function the leak comes from. In your previous logs, we can see the trace going to nonlingeo_precice.c and ccx_2.15.c. However, there is nothing like that in this log.

xl305053 commented 5 years ago

Hello, I have seen the same problem. When I run a new case, the program "ccx_preCICE" consumes a lot of memory, and its usage keeps growing during the calculation until an error occurs. I am running the same case with CalculiX only, and CalculiX on its own does not show this memory growth.

augustinhugues commented 5 years ago

Hi @nkr1729, at least there are fewer leaks, but I don't know why it only refers to /bin/bash either. I didn't even use a bash script to launch valgrind. Could it be valgrind itself?

nkr0 commented 5 years ago

I was thinking the same. It doesn't look like it is from calculix. @KyleDavisSA I think we can close this issue with #16 .

precice-bot commented 3 years ago

This issue has been mentioned on preCICE Forum on Discourse. There might be relevant details there:

https://precice.discourse.group/t/calculix-adapter-memory-just-keeps-growing/444/5

mtree22 commented 3 years ago

This issue has been mentioned on preCICE Forum on Discourse. There might be relevant details there:

https://precice.discourse.group/t/calculix-adapter-memory-just-keeps-growing/444/5

This is my case. I'm currently running a stand-alone solid mechanics sim using almost the same CalculiX input deck using ccx_preCICE on its own. I simply replaced the fluid interaction with a constant pressure value.

I discussed this with some colleagues, and they suggested I compile with a bunch of additional flags before trying valgrind. So I modified the Makefile to include a bunch more CFLAGS and FFLAGS and gave it a go. Here's what resulted:

make.log

I'm using v2.16, if that matters.

mtree22 commented 3 years ago

I re-compiled the version 2.16 calculix adapter with the following flags:

CFLAGS = -g -Wall -std=c++11 -O0 -fopenmp $(INCLUDES) -DARCH="Linux" -DSPOOLES -DARPACK -DMATRIXSTORAGE
FFLAGS = -g -Wall -O0 -fopenmp $(INCLUDES)

and then ran an FSI setup using preCICE 2.2.0 and the OpenFOAM adapter. I wrapped Valgrind around both the CalculiX and OpenFOAM processes, and ran preCICE 2.2.0 in debug mode. The processes ran much slower and consumed much more memory (as expected with Valgrind), but the same behavior persisted. Eventually the process started by ccx_preCICE overflowed and crashed.

Here is the preCICE 2.2.0 debug output file: debug.log
Here is the openfoam valgrind output: valgrind_38011.log
Here is the calculix valgrind output: valgrind_62182.log
Here is the openfoam terminal output: fluid.log
Here is the calculix terminal output: solid.log

precice-bot commented 3 years ago

This issue has been mentioned on preCICE Forum on Discourse. There might be relevant details there:

https://precice.discourse.group/t/connection-never-accepted/482/3

precice-bot commented 3 years ago

This issue has been mentioned on preCICE Forum on Discourse. There might be relevant details there:

https://precice.discourse.group/t/large-memory-usage-while-running/616/3

ajaust commented 3 years ago

I just saw that everyone seems to use Valgrind's memcheck tool to trace memory that is not freed correctly. I would suggest running one of the critical simulations with massif from the Valgrind toolbox (infos here), as it traces the memory usage over time and should also be able to tell you which function allocated the memory. Maybe this gives another hint on what is going on.
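Massif is selected with valgrind --tool=massif, and its output file is normally inspected with ms_print. For a quick look at how the heap grows from snapshot to snapshot, the raw output can also be read with a few lines of Python. The sketch below assumes the plain-text key=value snapshot layout that Massif output files typically use (snapshot=, time=, mem_heap_B=, ...). The function names are hypothetical, and ms_print remains the supported way to read these files.

```python
_NUMERIC_KEYS = ("time", "mem_heap_B", "mem_heap_extra_B", "mem_stacks_B")


def parse_massif_snapshots(text):
    """Collect per-snapshot counters from the text of a Massif output file.

    Returns a list of dicts, one per snapshot, in file order.
    Lines that are not known key=value pairs (header, heap tree) are skipped.
    """
    snapshots = []
    current = None
    for line in text.splitlines():
        key, separator, value = line.partition("=")
        if not separator:
            continue
        if key == "snapshot":
            current = {"snapshot": int(value)}
            snapshots.append(current)
        elif key in _NUMERIC_KEYS and current is not None:
            current[key] = int(value)
    return snapshots


def peak_heap_bytes(snapshots):
    """Largest mem_heap_B value over all snapshots (0 if there are none)."""
    return max((s.get("mem_heap_B", 0) for s in snapshots), default=0)
```

A heap size that rises steadily across snapshots, rather than levelling off after initialization, would be consistent with the growth reported in this thread.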

precice-bot commented 2 years ago

This issue has been mentioned on preCICE Forum on Discourse. There might be relevant details there:

https://precice.discourse.group/t/sudden-segmentation-violation-error-in-calculix/796/2

MakisH commented 2 years ago

@augustinhugues @xl305053 @mtree22 could you please check again with the latest version of the adapter (v2.19.0) for CalculiX v2.19 and report if the situation improved? There have recently been some contributions in this direction with #76 and #77 (and the already discussed #16).

Additionally, if anyone manages to reproduce this with any of the tutorials (even with some modifications), it would really help to resolve it.

You can measure the maximum memory consumption using, e.g., /usr/bin/time ./run.sh.
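If a scripted measurement is more convenient than reading the output of /usr/bin/time, roughly the same number can be obtained from Python's standard resource module. This is a sketch under a few assumptions: a Unix system, ru_maxrss reported in KiB as on Linux (macOS reports bytes), and a hypothetical helper name, peak_rss_of_command.

```python
import resource
import subprocess
import sys


def peak_rss_of_command(cmd):
    """Run cmd, wait for it, and return (exit_code, peak_rss).

    peak_rss is the high-water mark over all child processes this
    interpreter has waited for so far, so use a fresh interpreter
    for each measurement.
    """
    completed = subprocess.run(cmd, check=False)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return completed.returncode, peak


def report(cmd):
    """Print the peak RSS to stderr, similar in spirit to /usr/bin/time."""
    exit_code, peak = peak_rss_of_command(cmd)
    print(f"exit code {exit_code}, peak RSS {peak} (KiB on Linux)", file=sys.stderr)
    return exit_code
```

For example, report(["./run.sh"]) from the case directory would print the largest resident set size reached by any single child process of the run script.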

precice-bot commented 2 years ago

This issue has been mentioned on preCICE Forum on Discourse. There might be relevant details there:

https://precice.discourse.group/t/large-memory-usage-while-running/616/10

uekerman commented 2 years ago

It seems that we caught all leaks by now. At least, we cannot reproduce problems. Please re-open if you find reproducible leaks.