Open aulwes opened 9 months ago
Some more information. I modified one of the tests, TestSimpleStackTracker, by parallelizing it using MPI_Init/MPI_Finalize. I then rebuilt Malt using mpicc/mpic++ and Cray MPICH version 8.1.25. I tested this on one of our Cray clusters that uses the Slurm resource manager. I ran with 'srun -n 1 /path/to/malt --mpi ./src/lib/tests/TestSimpleStackTracker'. I then get this error:
TestSimpleStackTracker: /users/rta/workspace/malt/src/lib/common/SimpleAllocator.cpp:179: void MALT::SimpleAllocator::free(void *): Assertion `unusedMemory <= totalMemory' failed.
/usr/projects/perfeng/utils/malt/ro/rta/bin/malt: line 458: 134989 Aborted (core dumped) LD_PRELOAD="${MPI_WRAPPER_DIR}/libmaltmpi.so:${MALT_LIB}:${LD_PRELOAD}" "$@"
Hello, thanks for reporting the issue.
I would ask two things to help debugging:
1. The values of unusedMemory and totalMemory, to see if one of them is 0 or a totally wrong value.
For 1., I see:
unusedMemory = 262328, totalMemory = 262144
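For context on those numbers: totalMemory is exactly 256 KiB (262144 = 256 × 1024), and unusedMemory is 184 bytes larger, so the allocator's bookkeeping claims more free memory than the segment holds. Below is a minimal standalone sketch of that check. The `AllocatorStats` struct and the two helper functions are hypothetical names used for illustration and are not taken from MALT's SimpleAllocator; only the two values and the asserted condition come from this thread.

```cpp
#include <cstddef>

// Hypothetical stand-in for the two counters named in the failed assertion.
struct AllocatorStats {
    std::size_t unusedMemory;
    std::size_t totalMemory;
};

// The condition from the assertion message: unusedMemory <= totalMemory.
bool invariantHolds(const AllocatorStats& s) {
    return s.unusedMemory <= s.totalMemory;
}

// Number of bytes by which unusedMemory exceeds totalMemory (0 if it does not).
std::size_t excessBytes(const AllocatorStats& s) {
    return s.unusedMemory > s.totalMemory ? s.unusedMemory - s.totalMemory : 0;
}
```

With the reported values, `invariantHolds(AllocatorStats{262328, 262144})` is false and `excessBytes` returns 184. Neither value is 0, and the overshoot is small relative to the segment size.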
Let me work on answering 2.
On Dec 21, 2023, at 12:16 PM, Sébastien Valat wrote:
Hello, thanks for reporting the issue.
I would ask two things to help debugging:
Here is the backtrace I get from a core dump:
from /usr/projects/perfeng/utils/malt/ro/rta/lib64/libmalt.so
from /usr/projects/perfeng/utils/malt/ro/rta/lib64/libmalt.so
Hi, I'm continuing to get this LD_PRELOAD segfault, but not on all apps that I run with malt. Is there anything else I can try?
I think I've found the problem. One of the apps we're profiling is built with Intel compilers. When I build malt with icx/icpx from the Intel 2021 compilers, I don't get the segfault. For the other apps, I used gcc 10.
Hi, sorry, I haven't had time to investigate yet.
But as you pointed out, there could be a problem due to the mix of C++ libraries (Intel / GNU).
Have you also tried compiling MALT with icpc so that everything (malt and the app) is built with Intel?
Yes, I compiled a version using Intel icpx/icx and that worked. Thank you!
Hmm, thanks very much for reporting this, that's good to know.
Up to now I had the impression that there was no issue in that case, but apparently there is.
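When checking for the kind of compiler mix discussed above, it can help to confirm which compiler and C++ standard library a given build actually used. The following is a rough sketch based on predefined macros; the function names `compilerName` and `stdlibName` are hypothetical, and it only describes the translation unit it is compiled into, so it would have to be built once with each toolchain. It does not by itself show that a library mismatch causes the segfault.

```cpp
#include <string>

// Compiler that built this translation unit, from predefined macros.
// Intel's LLVM-based icx/icpx also define __clang__ and __GNUC__,
// so the Intel macros are tested first.
std::string compilerName() {
#if defined(__INTEL_LLVM_COMPILER)
    return "Intel oneAPI (icx/icpx)";
#elif defined(__INTEL_COMPILER)
    return "Intel classic (icc/icpc)";
#elif defined(__clang__)
    return "Clang";
#elif defined(__GNUC__)
    return "GCC";
#else
    return "unknown";
#endif
}

// C++ standard library whose headers were used. These macros are set by
// the library's own headers, which <string> pulls in.
std::string stdlibName() {
#if defined(__GLIBCXX__)
    return "libstdc++";
#elif defined(_LIBCPP_VERSION)
    return "libc++";
#else
    return "unknown";
#endif
}
```

Printing both values from a small program built with the same flags as malt, and again with the flags used for the app, gives a quick way to see whether the two builds differ.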
I followed the instructions for fixing the LD_PRELOAD problem by building malt with -DLIBUNWIND_PREFIX=/path/to/libunwind, but I still get a segfault, and it occurs after the app finishes running. If I add the malt runtime option '-s libunwind', I get the LD_PRELOAD segfault immediately. Is there something else I can try?