vermaseren / form

The FORM project for symbolic manipulation of very big expressions
GNU General Public License v3.0

Fix test for Issue 336 #564

Closed. jodavies closed this 1 month ago.

jodavies commented 1 month ago

The test did not create both num,den orderings, contrary to what its comments claimed.

Also make the test cleaner on 32-bit systems.

jodavies commented 1 month ago

@tueda are you able to reproduce the valgrind problem?

tueda commented 1 month ago

Yes. On this branch, Issue336_1 and Issue336_2 somehow fail randomly for the parvorm build with MPICH on Ubuntu 20.04. I can reproduce it with Docker.

==13851== 1,128 bytes in 1 blocks are definitely lost in loss record 136 of 173
==13851==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==13851==    by 0x4B28A14: ??? (in /usr/lib/x86_64-linux-gnu/libmpich.so.12.1.8)
==13851==    by 0x4B29CC5: ??? (in /usr/lib/x86_64-linux-gnu/libmpich.so.12.1.8)
==13851==    by 0x4B27B5F: ??? (in /usr/lib/x86_64-linux-gnu/libmpich.so.12.1.8)
==13851==    by 0x4B20400: ??? (in /usr/lib/x86_64-linux-gnu/libmpich.so.12.1.8)
==13851==    by 0x4B0EFEC: ??? (in /usr/lib/x86_64-linux-gnu/libmpich.so.12.1.8)
==13851==    by 0x4A0B783: PMPI_Probe (in /usr/lib/x86_64-linux-gnu/libmpich.so.12.1.8)
==13851==    by 0x25EDC8: PF_Probe (mpi.c:234)
==13851==    by 0x2671B2: PF_ProbeWithCatchingErrorMessages (parallel.c:4584)
==13851==    by 0x267550: PF_WaitAllSlaves (parallel.c:1408)
==13851==    by 0x26792E: PF_EndSort (parallel.c:912)
==13851==    by 0x228816: EndSort (sort.c:710)

But this seems like a false positive because MPI_Probe should not allocate any resources that require management or freeing, at least at the user level.
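To make the reasoning explicit: in the trace above, every frame between malloc and PF_Probe lies inside libmpich, so the allocation happens entirely within the MPICH call that FORM makes through PMPI_Probe. The Python sketch below extracts that boundary from a valgrind leak record. The helper names (parse_frames, library_entry_point) are hypothetical and not part of FORM's test suite, and locating the allocation this way does not by itself prove that the report is a false positive.

```python
import re

# Matches one "at"/"by" frame of a valgrind stack trace, e.g.
# "==13851==    by 0x4A0B783: PMPI_Probe (in /usr/lib/.../libmpich.so.12.1.8)"
FRAME_RE = re.compile(r"^==\d+==\s+(?:at|by) 0x[0-9A-Fa-f]+: (\S+) \((.*)\)\s*$")

# Excerpt of the leak record quoted in this thread.
SAMPLE = """
==13851== 1,128 bytes in 1 blocks are definitely lost in loss record 136 of 173
==13851==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==13851==    by 0x4B28A14: ??? (in /usr/lib/x86_64-linux-gnu/libmpich.so.12.1.8)
==13851==    by 0x4A0B783: PMPI_Probe (in /usr/lib/x86_64-linux-gnu/libmpich.so.12.1.8)
==13851==    by 0x25EDC8: PF_Probe (mpi.c:234)
==13851==    by 0x2671B2: PF_ProbeWithCatchingErrorMessages (parallel.c:4584)
"""


def parse_frames(record: str) -> list[tuple[str, str]]:
    """Return (function, location) pairs, innermost frame (the allocator) first."""
    frames = []
    for line in record.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((m.group(1), m.group(2)))
    return frames


def library_entry_point(frames: list[tuple[str, str]], library: str) -> tuple:
    """Walk outward from the allocator. Return (outermost consecutive frame
    inside `library`, first frame outside it, i.e. the caller of the library)."""
    entry = None
    for func, loc in frames[1:]:  # frames[0] is the allocator itself (malloc)
        if library in loc:
            entry = func
        else:
            return entry, func
    return entry, None


if __name__ == "__main__":
    print(library_entry_point(parse_frames(SAMPLE), "libmpich.so"))
    # prints ('PMPI_Probe', 'PF_Probe')
```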

For now, we can add #pend_if valgrind? && mpi? to these tests, along with a comment explaining why.

jodavies commented 1 month ago

Today I can reproduce it in an Ubuntu 20.04 VM, but yesterday the error was different, and I also had problems with the master branch. Some of these valgrind tests are a bit fickle...

tueda commented 1 month ago

I got 57 failures out of 100 attempts for Issue336_1 and 70 out of 100 for Issue336_2. For the master branch, I didn't get any failures (0 out of 20 attempts).
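Failure counts like these can be collected with a small driver that reruns one test many times. A minimal Python sketch, assuming only that a failing run exits with a nonzero status; the command in the __main__ block is a hypothetical stand-in (a random coin flip), not the actual invocation used in this thread, which is not given here.

```python
import subprocess
import sys


def count_failures(cmd: list[str], attempts: int) -> int:
    """Run `cmd` `attempts` times and count runs with a nonzero exit status."""
    failures = 0
    for _ in range(attempts):
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            failures += 1
    return failures


if __name__ == "__main__":
    # Hypothetical placeholder: replace with the real command that runs a single
    # test (e.g. Issue336_1) under valgrind with the parvorm build.
    # This stand-in exits with status 1 about half the time.
    cmd = [sys.executable, "-c", "import random, sys; sys.exit(random.random() < 0.5)"]
    n = 100
    failed = count_failures(cmd, n)
    print(f"{failed} failures out of {n} attempts ({100 * failed / n:.0f}%)")
```

Running the same driver on the branch and on master gives directly comparable counts, as in the 57/100 and 70/100 versus 0/20 figures above.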

> Some of these valgrind tests are a bit fickle...

Yes, but this kind of randomness is part of parallel computing.

coveralls commented 1 month ago


coverage: 49.98% (+0.01%) from 49.968% when pulling 3e19750d62b8ff2ef7cc8e0472d43e7507320861 on jodavies:fix-test-336 into 4fc8e4047a2678c3d0d264f29866491107f0a63c on vermaseren:master.