fsimonis closed this pull request 2 years ago
On my side, reproducing the issue has been embarrassingly painful. I compiled the adapter with and without the fix, then compared the output of heaptrack and valgrind, but for some reason heaptrack doesn't spot the expected leaks (on simulations where issues are expected, like the heat exchanger tutorial), and I ran into issues with valgrind, which, apart from being painfully slow, likes to crash. If anyone is more successful than me, it would be great to share some confirmation of improvement, as I'm going in circles. Suggestions about other approaches would be welcome too.
However, I'm pretty confident in the changes to the code, and they should theoretically be correct. And even if I didn't spot a reduction in memory usage, everything still works well, so there doesn't seem to be any regression or new error. :)
Also, aside from the leak, in the case where HEAT_TRANSFER_COEFF is configured before SINK_TEMPERATURE, I expected a crash (in the code before the fix, of course), but somehow it didn't happen, which I found very surprising. Nice to fix this potential bug "for free" anyway!
Were you able to reproduce the connectivity-related leak?
You can check the maximum memory usage, e.g., with /usr/bin/time ./run.sh.
So, I made some measurements on the partitioned beam (which isn't a simulation known to crash because of leaks, but it is the simplest one to test).
I compared the adapter on 2.19 vs. 2.19 + this PR merged locally. I always used 2 threads for both participants on a VM with 4 threads, and I reduced the simulation length to 20 steps (instead of 50).
Aside from memory:
(That's probably enough to justify a merge.)
I also ran the tutorial with heaptrack, one participant having the fix and the other not, for comparison. Both show similar peak memory consumption and identical leaks (12.8 kB). However, the fixed version (left on the screenshot) indicates a much higher number of allocations. @fsimonis, does that seem normal to you?
As for the time command, it gave me the following outputs:
7.60user 14.51system 1:48.79elapsed 20%CPU (0avgtext+0avgdata 37656maxresident)k 0inputs+2400outputs (0major+69728minor)pagefaults 0swaps
8.21user 17.23system 1:57.94elapsed 21%CPU (0avgtext+0avgdata 38196maxresident)k 0inputs+2432outputs (0major+74623minor)pagefaults 0swaps
This is not very readable to me, but I guess the important part is the maximum memory consumption (maxresident).
Running the elastic-tube-3d (which uses nearest-projection mapping) for 5 coupling time windows with the latest develop (CalculiX 2.19):
38.06user 0.87system 4:48.42elapsed 13%CPU (0avgtext+0avgdata 165308maxresident)k 1904inputs+4832outputs (26major+28902minor)pagefaults 0swaps
211.51user 2.55system 4:30.29elapsed 79%CPU (0avgtext+0avgdata 1146880maxresident)k 664inputs+46080outputs (9major+307780minor)pagefaults 0swaps
Running with this PR (CalculiX 2.16):
35.88user 0.51system 4:09.29elapsed 14%CPU (0avgtext+0avgdata 164520maxresident)k 960inputs+4824outputs (29major+28731minor)pagefaults 0swaps
197.59user 1.92system 4:05.77elapsed 81%CPU (0avgtext+0avgdata 1146016maxresident)k 0inputs+45872outputs (2major+307182minor)pagefaults 0swaps
No change, but the PR still seems to make sense.
Running this PR for a shorter time (3 time windows) gives:
24.60user 0.43system 2:46.17elapsed 15%CPU (0avgtext+0avgdata 165056maxresident)k 0inputs+4768outputs (2major+28765minor)pagefaults 0swaps
130.11user 1.73system 2:41.99elapsed 81%CPU (0avgtext+0avgdata 1146632maxresident)k 0inputs+37912outputs (2major+306962minor)pagefaults 0swaps
This is the same as for 5 time windows with the PR, and the same as for 5 time windows with the original code.
All tests are with GCC 11.2.
Similar tests on my side with 2.19 vs. 2.19 + this PR. Strangely, I got only about 160 MB with both versions, unlike Makis 🙈
These leaks are triggered in very specific cases only. This PR simplifies the code though, which is already a good reason to merge it.
One difference between my observations and those of @boris-martin is that I am using preCICE built from source in Debug mode, while @boris-martin probably used the Debian package in Release mode.
That's a good angle to investigate, but I actually used a from-source build yesterday. It might be worth doing a comparison to be sure.
Something we have not yet tried is comparing to a much older state (we compared to develop, not to master). I can try that, again with elastic-tube-3d and with preCICE in release mode. If that doesn't reveal anything, let's investigate together with users at the workshop. I will write on Discourse regarding this.
The one-time memory leak only affects the extraction of connectivity information from a tetrahedral interface. You can use a debugger and break on this function:
```c
void PreciceInterface_ConfigureTetraFaces( PreciceInterface * interface, SimulationData * sim )
```
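For illustration only, here is a minimal, self-contained C sketch of the kind of one-time leak that can hide in such a connectivity-extraction routine. The names (extractTetraFaceNodes, registerFace) are hypothetical and this is not the adapter's actual code; it only shows the pattern of a temporary buffer that is built once and never released.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the connectivity handling; NOT the adapter's code. */
static int *extractTetraFaceNodes(int numFaces)
{
    /* 3 node IDs per triangular face of the tetrahedral interface */
    int *nodes = malloc(3 * numFaces * sizeof *nodes);
    for (int i = 0; i < 3 * numFaces; ++i) {
        nodes[i] = i;
    }
    return nodes;
}

static void registerFace(const int *nodes)
{
    printf("face: %d %d %d\n", nodes[0], nodes[1], nodes[2]);
}

int main(void)
{
    int  numFaces  = 4;
    int *faceNodes = extractTetraFaceNodes(numFaces);

    for (int i = 0; i < numFaces; ++i) {
        registerFace(&faceNodes[3 * i]);
    }

    /* Without this free(), the temporary buffer leaks exactly once,
       i.e. a one-time leak proportional to the interface size. */
    free(faceNodes);
    return 0;
}
```

Running the variant without the free() under valgrind --leak-check=full should report the buffer as definitely lost, which is one way to confirm a leak of this shape.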
The recurring memory leak only affects simulations that define HEAT_TRANSFER_COEFF before SINK_TEMPERATURE in the configuration.
```
// This doesn't leak
Sink-Temperature-0
Heat-Transfer-Coefficient-0

// This leaks
Heat-Transfer-Coefficient-0
Sink-Temperature-0
```
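To make the recurring pattern concrete, here is a minimal hypothetical C sketch. It is not the adapter's actual code and does not reproduce the real order-dependent control flow; it only illustrates a buffer of numElements doubles being re-allocated every timestep without freeing the previous one (matching the "numElements x doubles each timestep" growth mentioned in this PR), and the allocate-once fix.

```c
#include <stdlib.h>

/* Hypothetical container; the adapter's real structures are named differently. */
typedef struct {
    int     numElements;
    double *heatTransferCoeff;
} CouplingDataSketch;

/* Leaky pattern: a fresh buffer of numElements doubles is allocated on
   every read, so each timestep leaks numElements * sizeof(double) bytes. */
static void readCoeffLeaky(CouplingDataSketch *data)
{
    data->heatTransferCoeff = malloc(data->numElements * sizeof(double));
}

/* Fixed pattern: allocate once, then reuse the buffer. */
static void readCoeffFixed(CouplingDataSketch *data)
{
    if (data->heatTransferCoeff == NULL) {
        data->heatTransferCoeff = malloc(data->numElements * sizeof(double));
    }
}

int main(void)
{
    CouplingDataSketch data = { .numElements = 1000, .heatTransferCoeff = NULL };

    (void)readCoeffLeaky; /* kept only to contrast with the fixed variant */

    for (int timestep = 0; timestep < 20; ++timestep) {
        /* Swapping in readCoeffLeaky() here makes resident memory grow by
           numElements doubles per timestep; the fixed variant stays flat. */
        readCoeffFixed(&data);
    }

    free(data.heatTransferCoeff);
    return 0;
}
```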
I repeated my checks, comparing the preCICE distributions v2202 and v2104. I did not observe any significant difference in the maximum memory usage for the elastic-tube-3d and the heat-exchanger. Something that did improve, though, is that the order of defining Sink-Temperature and Heat-Transfer-Coefficient used to matter: defining Heat-Transfer-Coefficient first immediately led to a segmentation fault. This does not seem to be the case anymore.
Here is my data: memory-leak.md
I did not include a case with nearest projection + CHT in my measurements.
This PR attempts to fix 2 memory leaks. One of them grows by numElements x doubles each timestep.
Handing over to @boris-martin for the testing.