uwsampa / grappa

Grappa: scaling irregular applications on commodity clusters
grappa.io

New Reduce #125

Open bholt opened 11 years ago

bholt commented 11 years ago

I know we've rewritten "reduce" so many times now, but this idea is cool.

Proposing to extend the symmetric address reduce (1181e16571f) with something that's integrated into parallel loops.

auto sum = symmetric_global_alloc<long>();

auto total = on_all_cores( reduce<add>(sum), [](long* sum) {
  *sum += foo();
});

// or same could be done with `forall`:
auto array = global_alloc<long>(N);
auto total = forall(array, N, reduce<add>(sum), [](long& v, long* sum) {
  *sum += foo(v);
});
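
For reference, a minimal standalone sketch (plain C++, no Grappa calls) of the semantics the proposed reduce<add>(sum) loop option implies: each "core" owns a private copy of the accumulator, the loop body only touches its local copy, and the combine happens once when the loop completes. NUM_CORES and the modulo "scheduling" here are purely illustrative stand-ins.

#include <vector>
#include <numeric>

long add(const long& a, const long& b) { return a + b; }

int main() {
  const int NUM_CORES = 4;                  // stand-in for the cores in the cluster
  std::vector<long> partials(NUM_CORES, 0); // stand-in for the symmetric allocation

  // stand-in for forall(array, N, reduce<add>(sum), ...): each iteration
  // updates only the partial owned by the core it happens to run on
  const long N = 1000;
  for (long i = 0; i < N; i++) {
    partials[i % NUM_CORES] += i;           // body: *sum += foo(v)
  }

  // the combine step the loop option would perform automatically at completion
  long total = std::accumulate(partials.begin(), partials.end(), 0L, add);
  return total == N * (N - 1) / 2 ? 0 : 1;
}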

Another thing to consider is making the reduction part of the loop's sync object. If we supported arbitrary GCEs for loops, you could just make a Reduce sync object and have the 'return' value from the loop lambda be the thing to reduce:

auto sinc = GlobalCompletionEvent<Reduce<add>>::create();
auto total = forall(array, N, sinc, [](long& v){
  return foo(v);
});
LOG(INFO) << "total = " << sinc.get();
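
To make the Reduce sync object idea concrete, here's a rough standalone sketch of the shape it could take: it counts outstanding iterations like a completion event, but also folds in the value each iteration returns. This is not the GlobalCompletionEvent API; enroll/complete/get are assumed names, and a real version would block in get() and combine across cores.

template< typename T, T (*ReduceOp)(const T&, const T&) >
class ReduceSync {
  T acc{};
  long outstanding = 0;
public:
  void enroll(long n) { outstanding += n; }

  // called once per iteration with the value the loop lambda returned
  void complete(const T& val) {
    acc = ReduceOp(acc, val);
    outstanding--;
  }

  // a real version would block until outstanding == 0 (and combine across cores)
  T get() const { return acc; }
};

long add(const long& a, const long& b) { return a + b; }

// usage corresponding to the snippet above:
//   ReduceSync<long, add> sinc;
//   sinc.enroll(N);
//   for each element v: sinc.complete(foo(v));
//   LOG(INFO) << "total = " << sinc.get();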

This could be done as part of #119.

bholt commented 10 years ago

Another thought is to go the other way and associate reduction objects with the GCE.

auto total = Reducer<int,add,&joiner>::create();
forall<&joiner>(array, N, [total](int& e){
  total += e;
}); // joiner.wait forces sync to be called on reducer?
VLOG(0) << (int)total; // coercion to get value could also force a sync
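
A standalone sketch of the shape this Reducer could take, assuming the coercion-forces-sync semantics described above: += only touches a core-local partial, and reading the value triggers the cross-core combine. The sync_with_other_cores callback is a placeholder for whatever the associated joiner/GCE would actually do.

#include <functional>

template< typename T, T (*ReduceOp)(const T&, const T&) >
class Reducer {
  T local_partial{};
  std::function<T(const T&)> sync_with_other_cores; // placeholder for the joiner-driven combine
public:
  explicit Reducer(std::function<T(const T&)> sync)
    : sync_with_other_cores(sync) {}

  // accumulate into the core-local partial only; no communication here
  Reducer& operator+=(const T& v) {
    local_partial = ReduceOp(local_partial, v);
    return *this;
  }

  // coercion to the value type forces the global combine, as in the VLOG line above
  operator T() const { return sync_with_other_cores(local_partial); }
};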