pmodels / mpich — Official MPICH Repository
http://www.mpich.org

mpi_t: Callback in events #6594

Open aruhela opened 1 year ago

aruhela commented 1 year ago

Hi MPICH Team,

In the attached example, I am unable to get the event callback function invoked. It is a simple program sending non-blocking P2P messages between two processes, tested with MPICH 4.1.2.

The output looks like this:

```
++ mpiexec -np 2 ./configureEvent

Num of event sources = 1
Event Sources are : Source_Index : Name : Description : source_order ticks_per_second max_ticks
0 : RECVQ : active message receive queue : MPI_T_SOURCE_ORDERED 1000000000 9223372036854775807

Maximum length of Event Source Name and its description is 6 and 29 respectively.

Num of events = 2
Event_Index : Name : Description : Verbosity Bind
0 : unexp_message_enqueued : message added to unexpected queue : MPI_T_VERBOSITY_USER_BASIC MPI_T_BIND_NO_OBJECT
1 : unexp_message_dequeued : message removed from unexpected q : MPI_T_VERBOSITY_USER_BASIC MPI_T_BIND_NO_OBJECT

uenq_idx=0, udeq_idx=1
Maximum length of Event Name and its description is 23 and 34 respectively.

err1=0, err2=0, err3=0, err4=0
Sending P2P Non-blocking messages - inter process

In recvq_free_cb ... In recvq_free_cb ...
```

Thanks
Amit Ruhela

[configureEvent.txt](https://github.com/pmodels/mpich/files/12071619/configureEvent.txt)
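For context, registering a callback for one of these events through the MPI-4 MPI_T events interface typically looks like the sketch below. This is a hedged illustration, not the attached program: the callback name, the event index argument, and the callback-safety level are assumptions.

```c
#include <mpi.h>
#include <stdio.h>

/* Illustrative callback matching the MPI-4 MPI_T_event_cb_function signature. */
static void unexp_enqueued_cb(MPI_T_event_instance instance,
                              MPI_T_event_registration registration,
                              MPI_T_cb_safety cb_safety, void *user_data)
{
    printf("unexp_message_enqueued fired\n");
}

/* Allocate an event handle for event index `idx` (e.g. uenq_idx from the
 * output above) and attach the callback. Returns an MPI error class on
 * failure. */
static int register_unexp_event(int idx, MPI_T_event_registration *reg)
{
    int err = MPI_T_event_handle_alloc(idx, NULL, MPI_INFO_NULL, reg);
    if (err != MPI_SUCCESS)
        return err;
    return MPI_T_event_register_callback(*reg, MPI_T_CB_REQUIRE_NONE,
                                         MPI_INFO_NULL, NULL,
                                         unexp_enqueued_cb);
}
```

The callback only fires when the implementation actually pushes a message through the unexpected queue, which is the crux of the discussion below.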
hzhou commented 1 year ago

This is due to a lack of progress before posting the MPI_Irecv. Progress is polled after MPI_Irecv is posted, at which point the arriving message matches the posted receive queue and is therefore no longer an "unexpected" message. Try enabling a progress thread, e.g.:

```
export MPIR_CVAR_ASYNC_PROGRESS=1
mpiexec -n 2 ./configureEvent
```

You should see the event callback if you run intra-node, or if you enable "am-only" for inter-node runs.

raffenet commented 10 months ago

@aruhela just to confirm, is the message sent between processes on the same node? In a standard configuration, MPICH can only generate events for unexpected messages arriving via the shared memory transport. Network unexpected queues are unfortunately not visible to MPICH.

As an alternative to a progress thread, you could call MPI_Barrier before posting the MPI_Irecv.
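A minimal receiver-side sketch of that ordering (hedged: it assumes the peer has already issued its MPI_Isend before reaching the barrier, and the function name, buffer size, and tag are illustrative):

```c
#include <mpi.h>

/* Receiver-side sketch: the barrier drives MPI progress, so a message the
 * peer has already sent lands in the unexpected queue first (firing
 * unexp_message_enqueued). The MPI_Irecv posted afterwards then matches
 * and dequeues it (firing unexp_message_dequeued). */
static void recv_after_barrier(int peer, char *buf, int count)
{
    MPI_Request req;

    MPI_Barrier(MPI_COMM_WORLD);   /* progress happens here */
    MPI_Irecv(buf, count, MPI_CHAR, peer, 0, MPI_COMM_WORLD, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}
```

Without the barrier (as in the original program), the receive is already posted when progress first runs, so the message never passes through the unexpected queue and the events never fire.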