Run aggregation experiments on swan

thomasrolinger commented 2 years ago

We need to run our aggregation experiments on swan to determine what buffer size to use. I believe we are using 128 on sherlock now; I suspect the best value will be different on swan.

thomasrolinger commented 2 years ago

For BFS and Kcore, a buffer size of 128 or 256 is good.
For SSSP, 128 is good, but 64 is better.
For Graph Construction, the atomic phase is slower (see below). The assignment phase seems best at 256.
For Histogram, I can't get aggregation to be faster than the baseline; something weird with atomics?
For Transpose, I see the same thing as above: the portion where we aggregate atomics is getting worse, but the loop that aggregates assignments is getting better.

We need to see what is happening on swan with the atomic operations.

thomasrolinger commented 2 years ago

We're going with a buffer size of 128 on swan for the graph apps; it seems to be good enough.

The issue with atomics on swan is that it supports network atomics. Right now, Chapel does not map its atomics to network atomics for Infiniband/GASNet.

What happens is that network atomics add overhead when they are performed on the local node (they still have to go to the NIC for coherence reasons). If we instead use processor atomics (which do not currently have a user-facing interface; you have to use chpl__processorAtomicType), then we see speed-ups.
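For reference, a minimal sketch of what declaring processor atomics looks like. This is not the code from our experiments: the array size, the `n` constant, and the loop are made up for illustration, and chpl__processorAtomicType is an internal name that could change between Chapel releases.

```chapel
// Illustration only: `counts` and `n` are hypothetical names.
config const n = 1000;

// Internal type function; forces processor (CPU) atomics even when the
// comm layer would otherwise map `atomic int` to network atomics.
var counts: [0..#8] chpl__processorAtomicType(int);

// Concurrent local updates that stay on the processor.
forall i in 0..#n do
  counts[i % 8].add(1);

// Sum the buckets to check that no updates were lost.
var total = 0;
for c in counts do
  total += c.read();
writeln(total);
```

The element type is the only thing that changes relative to a normal `atomic int` array, which is why this is hard to apply automatically.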

The issue with that is that an automatic optimization can't just change the type of an array, especially if remote network atomics are needed elsewhere. If you use processor atomics in that case, you'll see slow-downs that are just as bad as before, but in a different part of the code.

I think the easiest way forward is to not use aggregation for atomics if CHPL_COMM=ugni. We'll assume that this is known at compile time (it is, but could users toggle between comms when they run the program?).
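A rough sketch of that kind of check, assuming the standard ChplConfig module, which exposes CHPL_COMM as a compile-time `param` string. The helper name `useAtomicAggregation` is hypothetical and is not the function in the actual change.

```chapel
use ChplConfig;  // provides the CHPL_COMM param

// Hypothetical helper: resolved at compile time because CHPL_COMM is a param.
proc useAtomicAggregation() param {
  return CHPL_COMM != "ugni";
}

// The branch not taken is folded away by the compiler.
if useAtomicAggregation() then
  writeln("aggregating atomics");
else
  writeln("skipping atomic aggregation, CHPL_COMM=", CHPL_COMM);
```

Since the value is baked into the binary as a param, switching comm layers should require recompiling, so a compile-time check seems safe here.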

This commit adds the check when we create the aggregator: https://github.com/thomasrolinger/chapel/commit/709205d08203d82478d6ee6caef35818c638805f

thomasrolinger / chapel

Run aggregation experiments on swan #38