pmodels / mpich

Official MPICH Repository
http://www.mpich.org

bug: crashes with nonblocking collectives and isend/irecv #2355

Open mpichbot opened 8 years ago

mpichbot commented 8 years ago

Originally by robl on 2016-10-04 16:34:12 -0500


The HXHIM code (formerly known as MDHIM) sometimes, at our urging, tries to use MPI to communicate between entities. It does not go well.

Specifically, we implemented a simple MDHIM RPC loop, in both MPI and MARGO, running in a child thread, while the main thread exercised a variety of MPI calls. We confirmed that we could find places where the MPI child thread interfered with the main thread. We then reimplemented the RPC layer with MARGO [an HPC-oriented RPC framework based on Mercury and Argobots] and verified that it worked. It did!

In MPICH, the implementation crashes on any collective combined with MPI_Isend/MPI_Irecv.

mpichbot commented 8 years ago

Originally by robl on 2016-10-04 16:38:02 -0500


Attachment added: margo_mpi_test[1].tgz (8.0 KiB): test case for an RPC-oriented workload

roblatham00 commented 6 years ago

The attached test case targets an older version of margo/mercury. I'll have to update it to our latest API.