Closed: mhaseeb123 closed this issue 1 year ago
Hi, I couldn't reproduce the issue on my side. All tests worked fine on my machines. Which machine and MPI implementation are you using?
One suggestion: instead of setting `LD_PRELOAD` globally, can you try setting it only for the `srun` command, i.e., `srun -n 2 --export=ALL,LD_PRELOAD=/global/cfs/cdirs/m1759/mhaseeb/pilgrim/gnu/install/lib/libpilgrim.so ...`?
Also, there's no need to set `PILGRIM_TRACING` if you are using the default tracing mode. This shouldn't cause the error, though.
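To make the scoping suggestion concrete, here is a minimal sketch that only assembles and prints the launch command. The library path is the one quoted above; `APP` is a hypothetical placeholder for the test binary and its arguments, which are elided in the original command.

```shell
#!/bin/sh
# Path to the Pilgrim shared library, as quoted in the suggestion above.
PILGRIM_LIB=/global/cfs/cdirs/m1759/mhaseeb/pilgrim/gnu/install/lib/libpilgrim.so

# Hypothetical placeholder: substitute the real test binary and its arguments.
APP=./your_mpi_test

# LD_PRELOAD is deliberately not exported in this shell, so the shell itself
# and any helper tools it starts are not affected by the preload.
# --export=ALL forwards the current environment to the launched tasks, and the
# extra LD_PRELOAD=... entry sets the preload for those tasks only.
CMD="srun -n 2 --export=ALL,LD_PRELOAD=${PILGRIM_LIB} ${APP}"

# Print the command for inspection instead of running it here.
echo "$CMD"
```

On a Slurm system the printed command can be run as is. The point of the design is that the preload applies only to the processes started by `srun`, not to everything launched from the login shell or batch script.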
Thank you for getting back.
I am running on the Perlmutter machine and am using `cray-mpich/8.1.25` to compile. Let me try your suggestion of using `LD_PRELOAD` within the `srun` command. We do have a couple of other MPI modules that I can try compiling the apps with.
MPICH should be fine; we tested mostly with MPICH. I do have access to Perlmutter, so I will test on it too.
Thank you @wangvsa. Appreciate it!
I found the issue. It was caused by those global variables. I was wrong before: we cannot just add `static` to them. I have fixed the issue and also cleaned up the code to get rid of some compile warnings. Here's the PR: https://github.com/pmodels/pilgrim/pull/37
Pilgrim does seem to be working on Perlmutter after locally merging PR #37. Thanks again for your prompt help.
Hi,
I am encountering an abort at MPI process `0` when I set `PILGRIM_TIMING_MODE` to anything but `AGGREGATED` or `ZSTD` for `mpirun/srun -np >=2`. Here is a log from running the `sendall` test.
For `HIST` and `CFG`, a `No Errors` does show up, but then a segfault follows like this.
Here is my `configure` command used when building `pilgrim`.
I would appreciate any help with this. Thank you!