Open nikhilnanal opened 11 months ago
What is the configuration? Single node or multinode?
It's a single-node configuration (mpiexec -n 4 ./idup_nb).
What about the MPICH build configuration?
MPICH build configuration: ./configure --prefix="path to mpich installation" --with-libfabric="path to libfabric build" --disable-fortran --with-device=ch4:ofi --without-ze
https://github.com/pmodels/mpich/issues/3794, which seems like a similar issue, mentions setting --enable-posix-mutex=ticketlock. Is this an MPICH configure option? I don't see it in the configure help.
Any suggestions on this issue?
Sometimes I get the above-mentioned assertion, while at other times I get this error: Abort(1) on node 1: In MPIR_Free_contextid, the context id is not in use (Internal MPI error!)
Sorry for the lack of update. What you are seeing looks to be a bug in the thread-safe idup implementation. We will look into it.
Hi, is there any update on this issue?
Hi, is there any update on this issue?
Sorry, no update yet.
--enable-posix-mutex=ticketlock. Is this an MPICH configure option? I don't see it in the configure help.
This option gets passed to an internal convenience library that provides thread safety features for MPICH. You can try adding it to your build configuration and see if it makes a difference.
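As a rough sketch of that suggestion, the build configuration quoted in this thread could be extended with the extra flag as shown here. The quoted paths are the same placeholders used in the original command, and nothing in this thread confirms that the flag changes the behavior of the idup_nb test.

```shell
# Same configure line as reported in this thread, with the suggested flag appended.
# The quoted paths are placeholders and must be replaced with real locations.
./configure --prefix="path to mpich installation" \
    --with-libfabric="path to libfabric build" \
    --disable-fortran \
    --with-device=ch4:ofi \
    --without-ze \
    --enable-posix-mutex=ticketlock
```

Because the option is forwarded to an internal library, it is expected to be missing from the top-level configure help, as noted above.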
I occasionally see the following errors while running the idup_nb test (roughly 1 in 5 runs):
Assertion failed in file src/mpi/comm/contextid.c at line 239: mask[idx] & (1U << bitpos)
../middlewares/mpich_mpichtest/lib/libmpi.so.12(+0x541a76) [0x7f956021da76]
../middlewares/mpich_mpichtest/lib/libmpi.so.12(+0x44f884) [0x7f956012b884]
../middlewares/mpich_mpichtest/lib/libmpi.so.12(+0x3c98f0) [0x7f95600a58f0]
../middlewares/mpich_mpichtest/lib/libmpi.so.12(+0x3c9a04) [0x7f95600a5a04]
../middlewares/mpich_mpichtest/lib/libmpi.so.12(+0x52da8c) [0x7f9560209a8c]
../middlewares/mpich_mpichtest/lib/libmpi.so.12(+0x530dee) [0x7f956020cdee]
../middlewares/mpich_mpichtest/lib/libmpi.so.12(+0x3cb747) [0x7f95600a7747]
../middlewares/mpich_mpichtest/lib/libmpi.so.12(+0x3b4b02) [0x7f9560090b02]
../mpich_mpichtest/lib/libmpi.so.12(MPI_Comm_idup+0x212) [0x7f955fd7f822]
./idup_nb() [0x40264b]
/lib64/libpthread.so.0(+0x81cf) [0x7f95628071cf]
/lib64/libc.so.6(clone+0x43) [0x7f955f94fe73]
Abort(1) on node 1: Internal error
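Since the failure is intermittent, a small loop may help estimate how often it occurs, for example before and after a rebuild. This is only a sketch: count_failures is a hypothetical helper written for this thread, not part of MPICH or its test suite. It runs a command a given number of times and prints how many runs exited with a non-zero status.

```shell
# Hypothetical helper: run a command N times and print the number of failed runs.
# Usage: count_failures N command [args...]
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the program output; only the exit status is counted.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}
```

With the command from this thread, the call would be count_failures 20 mpiexec -n 4 ./idup_nb. This assumes that an aborted run makes mpiexec return a non-zero exit status.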