parallel-runtimes / lomp

Little OpenMP Library
Apache License 2.0

Support tree reductions #1

Open JimCownie opened 3 years ago

JimCownie commented 3 years ago

The runtime currently supports atomic reductions and reductions that use a critical section, but both are linear in the number of threads, since every thread contends for access to the same reduction target buffer. The compiler already generates code that should allow a reduction to be performed up the tree at a tree barrier. That lets reduction operations proceed concurrently in separate sub-trees, and should therefore perform better for large reductions.

We should add code to support this.

The main complexity here is likely to be understanding the compiler interface! This will probably also need some small changes in the barrier implementation(s), since the reduction needs to happen in non-leaf threads at the point where they see that a child thread has checked in, but before they pass the "we're all here" message up the tree. (Ideally, each thread's contribution can be accumulated as it arrives.)
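
To make the ordering concrete, here is a rough sketch of the check-in phase of a binary tree barrier with the reduction folded in. It is not LOMP code and does not use the compiler interface: `Node`, `treeReduceSum`, the `2*me+1` / `2*me+2` tree layout, and the hard-coded integer sum are all assumptions made for illustration. A real implementation would presumably call the compiler-supplied combine function where the sketch does `+=`, and would also need the release phase of the barrier, which is omitted here.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical per-thread slot (not a LOMP data structure). A real runtime
// would probably pad this to a cache line to avoid false sharing.
struct Node {
  std::atomic<bool> arrived{false}; // set once this sub-tree has checked in
  long partial = 0;                 // reduction value for this sub-tree
};

// Runs numThreads threads through the check-in phase only and returns the
// value accumulated at the root. Thread t contributes t + 1.
long treeReduceSum(int numThreads) {
  std::vector<Node> nodes(numThreads);

  auto body = [&nodes, numThreads](int me) {
    Node &mine = nodes[me];
    mine.partial = me + 1; // this thread's own contribution

    // Non-leaf threads: wait for each child to check in, then fold the
    // child's partial result into our own.
    for (int child = 2 * me + 1; child <= 2 * me + 2 && child < numThreads;
         ++child) {
      while (!nodes[child].arrived.load(std::memory_order_acquire))
        std::this_thread::yield();
      mine.partial += nodes[child].partial; // assumed combine operation
    }

    // Only now pass the "we're all here" message up the tree. The release
    // store makes our accumulated partial visible to the parent.
    mine.arrived.store(true, std::memory_order_release);
  };

  std::vector<std::thread> threads;
  for (int t = 0; t < numThreads; ++t)
    threads.emplace_back(body, t);
  for (auto &th : threads)
    th.join();

  return nodes[0].partial; // the root holds the full reduction
}
```

Calling `treeReduceSum(8)` should sum the contributions 1 through 8 at the root. Sibling sub-trees combine independently, so the depth of the combine chain grows with the height of the tree rather than with the thread count. Note that this sketch waits for the children in a fixed order; accumulating each contribution as it arrives would need a different check-in scheme.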