Closed thomasrolinger closed 2 years ago
We need to see what is happening on swan with the atomic operations.
We're going with a buffer size of 128 on swan for the graph apps; it seems to be good enough.
The issue with atomics on swan is that it supports network atomics. Right now, Chapel does not map its atomics to network atomics for Infiniband/GASNet.
What happens is that network atomics add overhead even when performed on the local node (they still have to go to the NIC for coherence reasons). If we instead use processor atomics (which do not currently have a user-facing interface; you have to use `chpl__processorAtomicType`), then we can see speed-ups.
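For reference, a minimal sketch of what switching the array's element type looks like. The array names and size are made up for illustration, and `chpl__processorAtomicType` is an internal helper rather than a supported interface, so this may change between Chapel versions:

```chapel
// Sketch only: array names and n are hypothetical.
config const n = 1000;

// Default atomics: may map to network atomics when the comm layer
// supports them (e.g. CHPL_COMM=ugni).
var netCounts: [0..#n] atomic int;

// Internal, non-user-facing helper that forces processor atomics.
var cpuCounts: [0..#n] chpl__processorAtomicType(int);

forall i in 0..#n {
  netCounts[i].add(1);   // may go through the NIC even for local elements
  cpuCounts[i].add(1);   // stays on the CPU
}

writeln(netCounts[0].read(), " ", cpuCounts[0].read());
```

Both arrays expose the same `add`/`read` methods; only the declared element type differs, which is exactly the part an automatic optimization cannot easily change.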
The issue with that is that an automatic optimization can't expect to just change the element type of an array, especially if remote network atomics are needed elsewhere. If you use processor atomics in that case, you'll see slow-downs just as bad as before, but in a different part of the code.
I think the easiest way forward is to not use aggregation for atomics if `CHPL_COMM=ugni`. We'll assume that this is something known at compile time (it is, but could users toggle between comms when they run the program?).
This commit adds the check when we create the aggregator: https://github.com/thomasrolinger/chapel/commit/709205d08203d82478d6ee6caef35818c638805f
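The check in the commit is not reproduced here. As a rough sketch of the idea only, assuming a Chapel version that provides the `ChplConfig` module, and using a hypothetical flag name `aggregateAtomics`:

```chapel
// Sketch only: aggregateAtomics and counts are hypothetical names,
// not taken from the commit.
use ChplConfig;  // assumed to provide CHPL_COMM as a param string

// Folded at compile time, since CHPL_COMM is a param.
param aggregateAtomics = CHPL_COMM != "ugni";

config const n = 1000;
var counts: [0..#n] atomic int;

if aggregateAtomics {
  // This is where the aggregator for atomics would be created.
  writeln("CHPL_COMM=", CHPL_COMM, ": aggregating atomic updates");
} else {
  // With ugni, skip aggregation and issue the atomic ops directly.
  forall i in 0..#n do counts[i].add(1);
  writeln("CHPL_COMM=ugni: using direct atomic ops");
}
```

Because the condition is a `param`, only one branch is compiled in, so the choice is fixed when the program is built.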
We need to run our aggregation experiments on swan and see what buffer size we should use. I believe we are using 128 on sherlock now. I suspect the best value will be different on swan.