uwsampa / grappa

Grappa: scaling irregular applications on commodity clusters
grappa.io

New Reduce #125

Open bholt opened 11 years ago

bholt commented 11 years ago

I know we've rewritten "reduce" so many times now, but this idea is cool.

Proposing to extend the symmetric address reduce (1181e16571f) with something that's integrated into parallel loops.

auto sum = symmetric_global_alloc<long>();

auto total = on_all_cores( reduce<add>(sum), [](long* sum) {
  *sum += foo();
});

// or same could be done with `forall`:
auto array = global_alloc<long>(N);
auto total = forall(array, N, reduce<add>(sum), [](long& v, long* sum) {
  *sum += foo(v);
});
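
For reference, a minimal standalone sketch (plain C++, no Grappa calls) of the semantics the proposed reduce<add>(sum) loop option implies: each "core" owns a private copy of the accumulator, the loop body only touches its local copy, and the combine happens once when the loop completes. NUM_CORES and the modulo "scheduling" here are purely illustrative stand-ins.

#include <vector>
#include <numeric>

long add(const long& a, const long& b) { return a + b; }

int main() {
  const int NUM_CORES = 4;                  // stand-in for the cores in the cluster
  std::vector<long> partials(NUM_CORES, 0); // stand-in for the symmetric allocation

  // stand-in for forall(array, N, reduce<add>(sum), ...): each iteration
  // updates only the partial owned by the core it happens to run on
  const long N = 1000;
  for (long i = 0; i < N; i++) {
    partials[i % NUM_CORES] += i;           // body: *sum += foo(v)
  }

  // the combine step the loop option would perform automatically at completion
  long total = std::accumulate(partials.begin(), partials.end(), 0L, add);
  return total == N * (N - 1) / 2 ? 0 : 1;
}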

Another thing to consider is making the reduction part of the loop's sync object. If we supported arbitrary GCEs for loops, you could just make a Reduce sync object and have the 'return' value from the loop lambda be the thing to reduce:

auto sinc = GlobalCompletionEvent<Reduce<add>>::create();
auto total = forall(array, N, sinc, [](long& v){
  return foo(v);
});
LOG(INFO) << "total = " << sinc.get();
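
To make the Reduce sync object idea concrete, here's a rough standalone sketch of the shape it could take: it counts outstanding iterations like a completion event, but also folds in the value each iteration returns. This is not the GlobalCompletionEvent API; enroll/complete/get are assumed names, and a real version would block in get() and combine across cores.

template< typename T, T (*ReduceOp)(const T&, const T&) >
class ReduceSync {
  T acc{};
  long outstanding = 0;
public:
  void enroll(long n) { outstanding += n; }

  // called once per iteration with the value the loop lambda returned
  void complete(const T& val) {
    acc = ReduceOp(acc, val);
    outstanding--;
  }

  // a real version would block until outstanding == 0 (and combine across cores)
  T get() const { return acc; }
};

long add(const long& a, const long& b) { return a + b; }

// usage corresponding to the snippet above:
//   ReduceSync<long, add> sinc;
//   sinc.enroll(N);
//   for each element v: sinc.complete(foo(v));
//   LOG(INFO) << "total = " << sinc.get();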

This could be done as part of #119.

bholt commented 10 years ago

Another thought is to go the other way and associate reduction objects with the GCE.

auto total = Reducer<int,add,&joiner>::create();
forall<&joiner>(array, N, [total](int& e){
  total += e;
}); // joiner.wait forces sync to be called on reducer?
VLOG(0) << (int)total; // coercion to get value could also force a sync
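
A standalone sketch of the shape this Reducer could take, assuming the coercion-forces-sync semantics described above: += only touches a core-local partial, and reading the value triggers the cross-core combine. The sync_with_other_cores callback is a placeholder for whatever the associated joiner/GCE would actually do.

#include <functional>

template< typename T, T (*ReduceOp)(const T&, const T&) >
class Reducer {
  T local_partial{};
  std::function<T(const T&)> sync_with_other_cores; // placeholder for the joiner-driven combine
public:
  explicit Reducer(std::function<T(const T&)> sync)
    : sync_with_other_cores(sync) {}

  // accumulate into the core-local partial only; no communication here
  Reducer& operator+=(const T& v) {
    local_partial = ReduceOp(local_partial, v);
    return *this;
  }

  // coercion to the value type forces the global combine, as in the VLOG line above
  operator T() const { return sync_with_other_cores(local_partial); }
};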