Closed: jodavies closed this 1 month ago.
@tueda are you able to reproduce the valgrind problem?
Yes. On this branch, Issue336_1 and Issue336_2 somehow fail randomly for ParFORM built with MPICH on Ubuntu 20.04. I can reproduce it with Docker.
==13851== 1,128 bytes in 1 blocks are definitely lost in loss record 136 of 173
==13851== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==13851== by 0x4B28A14: ??? (in /usr/lib/x86_64-linux-gnu/libmpich.so.12.1.8)
==13851== by 0x4B29CC5: ??? (in /usr/lib/x86_64-linux-gnu/libmpich.so.12.1.8)
==13851== by 0x4B27B5F: ??? (in /usr/lib/x86_64-linux-gnu/libmpich.so.12.1.8)
==13851== by 0x4B20400: ??? (in /usr/lib/x86_64-linux-gnu/libmpich.so.12.1.8)
==13851== by 0x4B0EFEC: ??? (in /usr/lib/x86_64-linux-gnu/libmpich.so.12.1.8)
==13851== by 0x4A0B783: PMPI_Probe (in /usr/lib/x86_64-linux-gnu/libmpich.so.12.1.8)
==13851== by 0x25EDC8: PF_Probe (mpi.c:234)
==13851== by 0x2671B2: PF_ProbeWithCatchingErrorMessages (parallel.c:4584)
==13851== by 0x267550: PF_WaitAllSlaves (parallel.c:1408)
==13851== by 0x26792E: PF_EndSort (parallel.c:912)
==13851== by 0x228816: EndSort (sort.c:710)
But this seems like a false positive, because MPI_Probe should not allocate any resources that require management or freeing, at least at the user level. For now we can do #pend_if valgrind? && mpi? while leaving some comments.
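For concreteness, a minimal sketch of what that could look like in the test file; the placement of the directive inside the fold and the wording of the comment are my assumptions, modeled on the existing #pend_if uses:

*--#[ Issue336_1 :
* Under valgrind, MPICH's PMPI_Probe is reported as definitely losing an
* internal block (see the stack trace above). We believe this is a false
* positive, so pend the test when running under valgrind with MPI.
#pend_if valgrind? && mpi?
...
*--#] Issue336_1 :

Alternatively, a valgrind suppression file matching the PMPI_Probe stack above would silence just this one report instead of skipping the whole test.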
Today I can reproduce it in an Ubuntu 20.04 VM, but yesterday the error was different, and I also had problems with the master branch. Some of these valgrind tests are a bit fickle...
I got 57 failures out of 100 attempts for Issue336_1 and 70 out of 100 for Issue336_2. For the master branch, I didn't get any failures (0 out of 20 attempts).
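For anyone wanting to reproduce these rates, a minimal sketch of the kind of repeat-run loop involved; the test-runner invocation ("ruby check.rb Issue336_1") is a hypothetical placeholder, not the project's documented CLI:

import subprocess

failures = 0
attempts = 100
for _ in range(attempts):
    # Hypothetical invocation: substitute the real command used to run
    # a single test (here Issue336_1) under valgrind with MPI.
    result = subprocess.run(
        ["ruby", "check.rb", "Issue336_1"],
        capture_output=True,
    )
    if result.returncode != 0:
        failures += 1
print(f"{failures} failures out of {attempts} attempts")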
> Some of these valgrind tests are a bit fickle...
Yes, but this kind of randomness is part of parallel computing.
The test did not create both the num and den orderings, as its comment claimed.
Make cleaner on 32-bit systems.