sstsimulator / sst-macro

SST Macro Element Library
http://sst-simulator.org/
Other
34 stars 41 forks source link

Algorithm implementation for MPI_Reduce and MPI_Allreduce? #625

Closed afranques closed 2 years ago

afranques commented 3 years ago

Dear All,

I have been studying the SST-Macro implementation of the MPI_Reduce (reduce.cc) and MPI_Allreduce (allreduce.cc) for a bit (by looking at the code and using default and custom debug flags) and I'm still unsure what kind of algorithm they consist of. Some questions and thoughts:

  1. I got a hint from the code that seems to imply a gather is taking place. Also, the name of the class is WilkeHalvingReduce, which I assume implies there's also a halving process involved. Could it be that it's some sort of reduce-scatter and then gather?
  2. I added some printfs after each SendAction and ReceiveAction and I saw that there is indeed a halving process in which the same rank sends half the number of elements to a different rank at each iteration. However it gets to a point where ranks invoke SendAction and ReceiveAction with 0 elements (nelems=0). Why does it send a message with no elements in it?
  3. In the code there's also a notion of mapping "real" to "virtual" ranks (stored in VirtualRankMap), why do we need such mapping?
  4. I was not able to find the place in the code where the reduction operation is applied; the reduce code takes argument reduce_fxn fxn but I cannot see it being used afterwards.

Thanks to anyone who may shed some light into this. -Antonio

jpkenny commented 3 years ago

@jjwilke can you provide any background?

JiquanLi commented 3 years ago

Hi, I got same question as above about the implementation of MPI_Reduce, is there any guide or document?

@jjwilke can you provide any background?

jpkenny commented 3 years ago

There isn't anyone on the project at this point that has direct knowledge of the algorithm implementations and I don't believe there is any separate documentation.

I studied up on the code a bit and it sure looks like it's based on recursive halving as described in https://www.mcs.anl.gov/~thakur/papers/ijhpca-coll.pdf I suspect that paper was used to guide the implementation and it should be pretty useful in understanding the algorithm.