Open khatharsis42 opened 9 months ago
Hi @khatharsis42 can you try tracing different applications to see if you get the same error? And is it possible to share your code so I can debug?
I'm using Pilgrim to trace a few mini-apps, and I've seen that particular bug when tracing AMG and Lulesh once I use enough MPI processes: no problem with 8, but the bug appears when using 27. Interestingly, I've had no issue with Kripke.
Thanks. I'll test AMG and get back to you.
Issue description
Whenever I run Pilgrim to trace an MPI program on a local machine, I have no issue. However, once I try to run it on another machine, I get the following error:
src/pilgrim_mpi_objects.c:172: create_request_id: Assertion 'entry == NULL' failed.
Steps to reproduce
I'm using mpich 4.0.2, and the latest version of Pilgrim. I have two nodes available, localnode and remotenode.
mpirun -np N --host localnode,remotenode -LD_PRELOAD <path to libpilgrim.so> <my executable>
yields the aforementioned error as soon as N is greater than 1. If I remove the remote node, I can get N as big as I want it to be.
Possible fix
The mentioned line is the following:
I've removed this assertion, and so far I've seen nothing weird happening. I have no idea as to whether that assertion is important.
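For context on why such an assertion might matter, here is a minimal sketch of the kind of invariant an `entry == NULL` check usually guards: a key that is about to be registered in a lookup table must not already be there. This is not Pilgrim's code. The table layout and the names `request_entry`, `find_request`, `register_request`, and `release_request` are all hypothetical, and whether `create_request_id` works this way is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 64  /* hypothetical capacity, enough for a demo */

/* Hypothetical entry: maps the address of a request handle to an id. */
typedef struct {
    const void *key;  /* NULL marks an empty slot */
    int id;
} request_entry;

static request_entry table[TABLE_SIZE];
static int next_id = 0;

/* Return the entry registered for key, or NULL if there is none. */
static request_entry *find_request(const void *key) {
    if (key == NULL) return NULL;
    for (size_t i = 0; i < TABLE_SIZE; i++)
        if (table[i].key == key) return &table[i];
    return NULL;
}

/* Register key and return its new id, or -1 if the table is full.
 * The assertion mirrors the shape of the failing check: registering
 * a key that is still present means an earlier request at the same
 * address was never released. */
static int register_request(const void *key) {
    assert(key != NULL);
    request_entry *entry = find_request(key);
    assert(entry == NULL);
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        if (table[i].key == NULL) {
            table[i].key = key;
            table[i].id = next_id++;
            return table[i].id;
        }
    }
    return -1;
}

/* Forget key so the same address can be registered again later. */
static void release_request(const void *key) {
    request_entry *entry = find_request(key);
    if (entry != NULL) entry->key = NULL;
}
```

In this sketch, registering the same address twice without a `release_request` in between trips the assertion. If Pilgrim's check is similar, removing it could hide a stale entry rather than fix the underlying cause, so the resulting trace may be worth validating.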