thomasrolinger / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

Run aggregation experiments on swan #38

thomasrolinger closed 2 years ago

thomasrolinger commented 2 years ago

We need to run our aggregation experiments on swan to determine which buffer size we should use. I believe we are currently using 128 on sherlock, and I suspect the best value will be different on swan.

thomasrolinger commented 2 years ago

We need to see what is happening on swan with the atomic operations.

thomasrolinger commented 2 years ago

We're going with a buffer size of 128 on swan for the graph apps; it seems to be good enough.

The issue with atomics on swan is that it supports network atomics. Right now, Chapel does not map its atomics to network atomics for InfiniBand/GASNet.

Network atomics add overhead when the operation targets the local node, because they still have to go through the NIC for coherence reasons. If we instead use processor atomics (which currently have no user-facing interface; you have to use chpl__processorAtomicType), we see speed-ups.
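For reference, a minimal sketch of the difference between the two declarations. The variable names are made up for illustration, and chpl__processorAtomicType is an internal helper rather than a supported interface, so its behavior may change between Chapel versions:

```chapel
// Default atomic: may be implemented with network atomics when the
// comm layer supports them (e.g. CHPL_COMM=ugni).
var netCounter: atomic int;

// Processor atomic, obtained through the internal (non-user-facing)
// helper mentioned above. Hypothetical variable name.
var cpuCounter: chpl__processorAtomicType(int);

// Both types expose the same add/read methods; only the underlying
// implementation differs.
forall i in 1..1000 {
  netCounter.add(1);
  cpuCounter.add(1);
}

writeln(netCounter.read(), " ", cpuCounter.read());
```

Both counters end up with the same value; the point is only that switching between them means changing the declared type, which is what makes this awkward to do automatically.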

The problem is that an automatic optimization can't simply change the type of an array, especially if remote network atomics are needed elsewhere. If you use processor atomics in that case, you'll see slow-downs just as bad as before, only in a different part of the code.

I think the easiest way forward is to not use aggregation for atomics if CHPL_COMM=ugni. We'll assume this is known at compile time (it is, but could users toggle between comm layers when they run the program?).

This commit adds the check when we create the aggregator: https://github.com/thomasrolinger/chapel/commit/709205d08203d82478d6ee6caef35818c638805f
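The snippet below is not the code from that commit, just a rough sketch of what a compile-time check on the comm layer could look like. It assumes CHPL_COMM is read as a param from the ChplConfig module; the names aggregateAtomics and describeAtomicPath are hypothetical:

```chapel
use ChplConfig;

// Hypothetical flag: aggregate atomic updates unless the program was
// built for ugni. CHPL_COMM is a param, so this is resolved at compile time.
param aggregateAtomics = CHPL_COMM != "ugni";

// Hypothetical helper standing in for the place where the aggregator
// would be created; the param conditional is folded by the compiler.
proc describeAtomicPath() {
  if aggregateAtomics then
    return "aggregated";
  else
    return "direct";
}

writeln("CHPL_COMM=", CHPL_COMM, " -> ", describeAtomicPath(), " atomic updates");
```

Because the flag is a param, the unused branch is dropped when the program is compiled, so the check adds no run-time cost.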