sstsimulator / sst-elements

SST Architectural Simulation Components and Libraries
http://www.sst-simulator.org
Other
89 stars 116 forks source link

Documentation unclear for routing algorithms used with Dragonfly topology #1697

Open tommasobo opened 3 years ago

tommasobo commented 3 years ago

We are currently using SST to run some benchmarks mostly using Dragonfly networks.

We are using all the various routing algorithms (MIN, UGAL, adaptive_local, MIN_A, valiant) and we are having some troubles understanding what was used as reference to implement each algorithm.

This is our current understanding of each one, please correct me if we made some mistake or even better would be to get a source/reference for what you used.

I tried digging into the code to get an answer and while it helped it would be great to get a certain answer from the development team. I also think it would be helpful to document this somewhere in the code or on the SST website.

Thank you!

feldergast commented 3 years ago

Here's a high level of each of the algorithms:

min - Each packet takes a minimal path. Where there are multiple minimal paths, the actual path taken is based on a hash (off the top of my head, I think it's based off of src and dest node).

min_a - Each packet takes a minimal path, but the path taken is adaptive based on load

Valiant - Packets that route between groups route first to a random intermediate group (not router within that group). Packets that route only within a group route to a random router in that group before routing to dest.

Adaptive_local - this is essentially a UGAL-like algorithm, but uses buffer credits instead of buffer occupancy to make decisions about adaptive routing (which is why we didn't call it UGAL). This was really just an experiment and the UGAL algorithm is likely the one you actually want. This algorithm also only looks at 2 minimal and 2 adaptive routes, chosen using a hash. It does not look at all possible routes.

UGAL - Typical ugal algorithm that choses between routing first to a random intermediate group (not router within that group) and routing minimally at the source router (this does not use any non-local information). This uses estimated buffer occupancy to make routing decisions. It will consider all possible ports to the intermediate and destination group when making decisions. It would be possible to add router to intermediate router, but it does not currently.

Hope that helps.

tommasobo commented 2 years ago

@feldergast

Thank you for your answer.

Just one quick follow-up: does UGAL use only local information? So is it basically a UGAL-L type of algorithm? Because initially I thought it was more like UGAL-G but now I am not so sure of that rechecking the code.

Also you said that for adaptive_local you use a hash to determine the two routes but where does that actually happen in the code? I can't seem to find it anywhere.

feldergast commented 2 years ago

@tommasobo You are correct, it is essentially UGAL-L. I'm not sure we could do a UGAL-G oracle-type implementation on SST, since it was designed from the ground up to not let you cheat (i.e. send data instantly anyplace you'd like). I have been thinking about algorithms that would exchange information between routers in a group to get a better view of the load on global links, but I have not yet had time to flesh out the idea and implement it.