pmodels / mpich

Official MPICH Repository
http://www.mpich.org
Other
535 stars 280 forks source link

External algorithm registration for collective selection #1825

Closed mpichbot closed 7 years ago

mpichbot commented 7 years ago

Originally by blocksom on 2013-04-30 13:37:43 -0500


This ticket is to continue the discussion, and record the resolution, of a commit recently contributed by IBM that was not accepted into the master branch of the mpich.org/mpich.git git repository.

Why was this commit was not accepted, and what could be changed that would make it acceptable?

Note that this commit is quite large and most of the changes are in the pamid adi.

mpichbot commented 7 years ago

Originally by blocksom on 2013-04-30 13:38:38 -0500


See branch ticket-1825 in the mpich-ibm.git repository.

mpichbot commented 7 years ago

Originally by balaji on 2013-04-30 22:03:41 -0500


What exactly is this patch trying to do? Allow external users to register a different algorithm? Or is it just so other mpich-internal files can access the algorithm?

mpichbot commented 7 years ago

Originally by blocksom on 2013-05-01 09:09:04 -0500


I added Sameh Shakawi to this ticket - he was the original commit author.

This is a (quite large) new feature for pamid that allows enables collective profiling/selection based on a new pami library extension.

Sameh, could you give a high-level overview of this work, and describe why the changes to common mpich code was necessary?

See also ticket #1825.

mpichbot commented 7 years ago

Originally by sssharka on 2013-05-03 11:42:02 -0500


I will try to explain briefly what this patch is doing. But first I need to explain the design of our collective selection mechanism.

Collective Selection: 1 - A user or sys admin will need to run a tool (pami_tune) that we provide which basically benchmarks all pami collectives. Basically benchmark different collectives for different geometry sizes and different message sizes (somehow similar to IMB except that we do this for every collective algorithm).

2 - Once pami_tune is done, a user turns on a switch using an env variable to enable automatic collective selection in pami. This essentially tells pami choose the best algorithm for a specific collective for some message size and geometry size. So pami goes and checks the output from pami_tune and chooses the best algorithm for this collective call.

The patch: In some cases (certain geometry sizes and certain message sizes) MPICH2 collectives are faster than pami collectives. So we designed a mechanism where pami_tune can call MPICH2 collectives during benchmarking phase and also during runtime if pami decided that MPICH2 is faster than pami.

In addition, pami_tune since it changes geometry sizes when benchmarking, we needed pami_tune to be able to ask MPICH2 to create new communicators. Hence, we register function pointers with pami which pami can use to call back into MPICH2 either to run a collective or to create a communicator.

Thanks Sameh

mpichbot commented 7 years ago

Originally by balaji on 2013-05-03 13:17:16 -0500


Thanks. Why not use MPIR_Comm_create_intra/inter directly? What's the benefit of adding the additional MPIR_Comm_create_intra_ext (which seems to be just called MPIR_Comm_create_intra anyway)?

mpichbot commented 7 years ago

Originally by sssharka on 2013-05-06 09:48:40 -0500


The reason I created the MPIR_Comm_create_intra_ext is that MPIR_Comm_create_intra is not externed so I can't use directly. Instead of externing them I added the new extern function.

Thanks Sameh

mpichbot commented 7 years ago

Originally by balaji on 2013-05-06 10:15:10 -0500


Not sure what you mean here. This is similar to MPIR_Bcast_intra, which you already use in pamid. These functions are all declared in mpiimpl.h, which is included in the device.

mpichbot commented 7 years ago

Originally by sssharka on 2013-05-06 10:35:43 -0500


Actually MPIR_Comm_create_intra/inter are not declared in mpiimpl.h. This is why I created the _ext one and added it to mpiimpl.h

Thanks Sameh

mpichbot commented 7 years ago

Originally by balaji on 2013-05-06 10:39:30 -0500


The intra one is:

http://git.mpich.org/mpich.git/blob/HEAD:/src/include/mpiimpl.h#l4092

The inter one is not, but we could add it if needed.

mpichbot commented 7 years ago

Originally by sssharka on 2013-05-06 10:44:30 -0500


That is great.. This is what I needed. I guess in our repo we didn't have the MPIR_Comm_create_intra in the mpiimpl.h. Probably we will need the MPIR_Comm_create_inter as well for dynamic tasking.

Thanks Sameh

mpichbot commented 7 years ago

Originally by balaji on 2013-05-06 11:25:23 -0500


Feel free to add MPIR_Comm_create_inter to mpiimpl.h. Will you be providing an alternate patch?

mpichbot commented 7 years ago

Originally by sssharka on 2013-05-06 13:53:55 -0500


I just created an alternate patch. Mike should be pushing it shortly.

Thanks Sameh

mpichbot commented 7 years ago

Originally by blocksom on 2013-05-07 10:18:16 -0500


I took the fix for ticket #1839, rebased Sameh's "alternate patch" on to it, and pushed everything to the branch mpich-ibm/ticket-1839-alt.

Included on this branch is the commit from #1827 that depends on this fix. I think #1827 can be closed as a duplicate of this ticket.

I also included several other "collective selection" commits that were in the pipeline in our internal git repository. This way the mpich-ibm/ticket-1825-alt branch contains the entire collective selection code.

mpichbot commented 7 years ago

Originally by balaji on 2013-05-07 13:44:34 -0500


All commits in the mpich-ibm/ticket-1825-alt branch have been pushed to mpich/master.

mpichbot commented 7 years ago

Originally by blocksom on 2013-05-07 13:51:29 -0500


I deleted these branches from the mpich-ibm git repository:

mpichbot commented 7 years ago

Originally by balaji on 2013-05-07 13:59:35 -0500


Relevant commit: [f4f0de65].