fsimonis closed this pull request 2 years ago
On my side, reproducing the issue has been embarrassingly painful. I compiled the adapter with and without the fix, then compared the output of heaptrack and valgrind, but for some reason heaptrack doesn't spot the expected leaks (on simulations where issues are expected, like the heat exchanger tutorial), and I ran into issues with valgrind, which, apart from being painfully slow, likes to crash. If anyone is more successful than me, it would be great to share some confirmation of improvement, as I'm going in circles. Suggestions about other approaches would be welcome too.
However, I'm pretty confident in the changes to the code, and they should theoretically be correct. And even if I didn't spot a reduction in memory usage, everything still works well, so there doesn't seem to be any regression or new error. :)
Also, aside from the leak, in the case where HEAT_TRANSFER_COEFF is configured before SINK_TEMPERATURE, I expected a crash (in the code before the fix, of course), but somehow it didn't happen, which I found very surprising. Nice to fix this potential bug "for free" anyway!
Were you able to reproduce the connectivity-related leak?
You can check the maximum memory usage, e.g., with /usr/bin/time ./run.sh.
So, I made some measurements on the partitioned beam (which isn't a simulation known to crash because of leaks, but it is the simplest one to test).
I compared the adapter on 2.19 vs. 2.19 + this PR merged locally. I always used 2 threads for both participants on a VM with 4 threads, and I reduced the simulation length to 20 steps (instead of 50).
Aside from memory:
(That's probably enough to justify a merge.)
I also ran the tutorial with heaptrack, one participant having the fix and the other not, for comparison. Both show similar peak memory consumption and identical leaks (12.8 kB). However, the fixed version (left on the screenshot) indicates a much higher number of allocations. @fsimonis, does that seem normal to you?
As for the time command, it gave me the following outputs:
7.60user 14.51system 1:48.79elapsed 20%CPU (0avgtext+0avgdata 37656maxresident)k 0inputs+2400outputs (0major+69728minor)pagefaults 0swaps
8.21user 17.23system 1:57.94elapsed 21%CPU (0avgtext+0avgdata 38196maxresident)k 0inputs+2432outputs (0major+74623minor)pagefaults 0swaps
This is not very readable to me, but I guess the important part is the maximum memory consumption (maxresident).
Running the elastic-tube-3d (which uses nearest-projection mapping) for 5 coupling time windows with the latest develop (CalculiX 2.19):
38.06user 0.87system 4:48.42elapsed 13%CPU (0avgtext+0avgdata 165308maxresident)k 1904inputs+4832outputs (26major+28902minor)pagefaults 0swaps
211.51user 2.55system 4:30.29elapsed 79%CPU (0avgtext+0avgdata 1146880maxresident)k 664inputs+46080outputs (9major+307780minor)pagefaults 0swaps
Running with this PR (CalculiX 2.16):
35.88user 0.51system 4:09.29elapsed 14%CPU (0avgtext+0avgdata 164520maxresident)k 960inputs+4824outputs (29major+28731minor)pagefaults 0swaps
197.59user 1.92system 4:05.77elapsed 81%CPU (0avgtext+0avgdata 1146016maxresident)k 0inputs+45872outputs (2major+307182minor)pagefaults 0swaps
No change, but the PR still seems to make sense.
Running this PR for a shorter time (3 time windows) gives:
24.60user 0.43system 2:46.17elapsed 15%CPU (0avgtext+0avgdata 165056maxresident)k 0inputs+4768outputs (2major+28765minor)pagefaults 0swaps
130.11user 1.73system 2:41.99elapsed 81%CPU (0avgtext+0avgdata 1146632maxresident)k 0inputs+37912outputs (2major+306962minor)pagefaults 0swaps
This is the same as for 5 time windows with the PR, and the same as for 5 time windows with the original code.
All tests are with GCC 11.2.
Similar tests on my side with 2.19 vs. 2.19 + this PR. Strangely, I got only about 160 MB with both versions, unlike Makis 🙈
These leaks are triggered in very specific cases only. This PR simplifies the code though, which is already a good reason to merge it.
One difference between my observations and those of @boris-martin is that I am using preCICE built from source in Debug mode, while @boris-martin probably used the Debian package in Release mode.
That's a good angle to investigate, but I actually used a from-source build yesterday. It might be worth doing a comparison to be sure.
Something we have not yet tried is comparing to a much older state (we compared to develop, not to master). I can try that, again with elastic-tube-3d and with preCICE in release mode. If that doesn't reveal anything, let's investigate together with users at the workshop. I will write on Discourse regarding this.
The one-time memory leak only affects the extraction of connectivity information from a tetrahedral interface. You can use a debugger and break on this function:
```c
void PreciceInterface_ConfigureTetraFaces( PreciceInterface * interface, SimulationData * sim )
```
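For illustration only, here is a minimal, self-contained C sketch of the kind of one-time leak that can hide in such a connectivity-extraction routine. The names (extractTetraFaceNodes, registerFace) are hypothetical and this is not the adapter's actual code; it only shows the pattern of a temporary buffer that is built once and never released.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the connectivity handling; NOT the adapter's code. */
static int *extractTetraFaceNodes(int numFaces)
{
    /* 3 node IDs per triangular face of the tetrahedral interface */
    int *nodes = malloc(3 * numFaces * sizeof *nodes);
    for (int i = 0; i < 3 * numFaces; ++i) {
        nodes[i] = i;
    }
    return nodes;
}

static void registerFace(const int *nodes)
{
    printf("face: %d %d %d\n", nodes[0], nodes[1], nodes[2]);
}

int main(void)
{
    int  numFaces  = 4;
    int *faceNodes = extractTetraFaceNodes(numFaces);

    for (int i = 0; i < numFaces; ++i) {
        registerFace(&faceNodes[3 * i]);
    }

    /* Without this free(), the temporary buffer leaks exactly once,
       i.e. a one-time leak proportional to the interface size. */
    free(faceNodes);
    return 0;
}
```

Running the variant without the free() under valgrind --leak-check=full should report the buffer as definitely lost, which is one way to confirm a leak of this shape.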
The recurring memory leak only affects simulations that define HEAT_TRANSFER_COEFF before SINK_TEMPERATURE in the configuration.
```
// This doesn't leak
Sink-Temperature-0
Heat-Transfer-Coefficient-0

// This leaks
Heat-Transfer-Coefficient-0
Sink-Temperature-0
```
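To make the recurring pattern concrete, here is a minimal hypothetical C sketch. It is not the adapter's actual code and does not reproduce the real order-dependent control flow; it only illustrates a buffer of numElements doubles being re-allocated every timestep without freeing the previous one (matching the "numElements x doubles each timestep" growth mentioned in this PR), and the allocate-once fix.

```c
#include <stdlib.h>

/* Hypothetical container; the adapter's real structures are named differently. */
typedef struct {
    int     numElements;
    double *heatTransferCoeff;
} CouplingDataSketch;

/* Leaky pattern: a fresh buffer of numElements doubles is allocated on
   every read, so each timestep leaks numElements * sizeof(double) bytes. */
static void readCoeffLeaky(CouplingDataSketch *data)
{
    data->heatTransferCoeff = malloc(data->numElements * sizeof(double));
}

/* Fixed pattern: allocate once, then reuse the buffer. */
static void readCoeffFixed(CouplingDataSketch *data)
{
    if (data->heatTransferCoeff == NULL) {
        data->heatTransferCoeff = malloc(data->numElements * sizeof(double));
    }
}

int main(void)
{
    CouplingDataSketch data = { .numElements = 1000, .heatTransferCoeff = NULL };

    (void)readCoeffLeaky; /* kept only to contrast with the fixed variant */

    for (int timestep = 0; timestep < 20; ++timestep) {
        /* Swapping in readCoeffLeaky() here makes resident memory grow by
           numElements doubles per timestep; the fixed variant stays flat. */
        readCoeffFixed(&data);
    }

    free(data.heatTransferCoeff);
    return 0;
}
```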
I repeated my checks, comparing the preCICE distributions v2202 and v2104. I did not observe any significant difference in the maximum memory usage for the elastic-tube-3d and the heat-exchanger. Something that did improve, though, is that the order of defining Sink-Temperature and Heat-Transfer-Coefficient used to matter: defining Heat-Transfer-Coefficient first immediately led to a segmentation fault. This does not seem to be the case anymore.
Here is my data: memory-leak.md
I did not include a case with nearest projection + CHT in my measurements.
This PR attempts to fix 2 memory leaks. One of them grows by numElements x doubles each timestep.
Handing over to @boris-martin for the testing.