Open tommasobo opened 3 years ago
Here's a high level of each of the algorithms:
min - Each packet takes a minimal path. Where there are multiple minimal paths, the actual path taken is based on a hash (off the top of my head, I think it's based off of src and dest node).
min_a - Each packet takes a minimal path, but the path taken is adaptive based on load
Valiant - Packets that route between groups route first to a random intermediate group (not router within that group). Packets that route only within a group route to a random router in that group before routing to dest.
Adaptive_local - this is essentially a UGAL-like algorithm, but uses buffer credits instead of buffer occupancy to make decisions about adaptive routing (which is why we didn't call it UGAL). This was really just an experiment and the UGAL algorithm is likely the one you actually want. This algorithm also only looks at 2 minimal and 2 adaptive routes, chosen using a hash. It does not look at all possible routes.
UGAL - Typical ugal algorithm that choses between routing first to a random intermediate group (not router within that group) and routing minimally at the source router (this does not use any non-local information). This uses estimated buffer occupancy to make routing decisions. It will consider all possible ports to the intermediate and destination group when making decisions. It would be possible to add router to intermediate router, but it does not currently.
Hope that helps.
@feldergast
Thank you for your answer.
Just one quick follow-up: does UGAL use only local information? So is it basically a UGAL-L type of algorithm? Because initially I thought it was more like UGAL-G but now I am not so sure of that rechecking the code.
Also you said that for adaptive_local you use a hash to determine the two routes but where does that actually happen in the code? I can't seem to find it anywhere.
@tommasobo You are correct, it is essentially UGAL-L. I'm not sure we could do a UGAL-G oracle-type implementation on SST, since it was designed from the ground up to not let you cheat (i.e. send data instantly anyplace you'd like). I have been thinking about algorithms that would exchange information between routers in a group to get a better view of the load on global links, but I have not yet had time to flesh out the idea and implement it.
We are currently using SST to run some benchmarks mostly using Dragonfly networks.
We are using all the various routing algorithms (MIN, UGAL, adaptive_local, MIN_A, valiant) and we are having some troubles understanding what was used as reference to implement each algorithm.
This is our current understanding of each one, please correct me if we made some mistake or even better would be to get a source/reference for what you used.
MIN -> Should be pretty straightforward minimal algorithm where the same minimal path is used.
Valiant -> I expected this to choose a random intermediate switch that is not in the source and destination groups and then MIN for the other paths. But in the code it looks like you also choose a random switch within the same group (line 863 of dragonfly.cc https://github.com/sstsimulator/sst-elements/blob/e3610a13eac8644903cc56c69552471cf158ac86/src/sst/elements/merlin/topology/dragonfly.cc#L863 )
UGAL -> We choose what algo to use based on the occupancy of packet queues of the network. But here do you use just the information from the source switch or do you send packets to gather that information from all (or some) of the switch in the network?
Adaptive Local -> I expect this to be the same as UGAL but with only local info. Is that correct?
MIN_A -> Not really sure about this one but it would be great to get some info about it :D
I tried digging into the code to get an answer and while it helped it would be great to get a certain answer from the development team. I also think it would be helpful to document this somewhere in the code or on the SST website.
Thank you!