ianhinder / Kranc

A Mathematica package for generating code for solving time dependent partial differential equations
http://kranccode.org
GNU General Public License v2.0

Support DG operators efficiently #128

Open eschnett opened 9 years ago

eschnett commented 9 years ago

Kranc currently supports DG operators, but not efficiently. I am planning to add a feature that makes this efficient. The basic plan is to loop over a grid function in two layers: an outermost layer that loops over elements (or "tiles" in general), and an innermost layer that loops over all collocation points within an element.

OpenMP parallelization is applied only to the outermost layer. Vectorization is only relevant for the innermost layer. Derivatives etc. are applied "en bloc" to a whole element.
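A minimal sketch of this two-layer structure in C, for a 1D grid function. This is not the code from the gist and not Kranc output; the function name `element_derivative`, the flat row-major layout of the differentiation matrix `D`, and the contiguous storage of elements are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: apply a per-element differentiation matrix D
   (npoints x npoints, row-major) to the grid function u, writing du.
   Elements are assumed to be stored contiguously, npoints values each. */
void element_derivative(ptrdiff_t nelems, int npoints, const double *D,
                        const double *u, double *du)
{
  assert(npoints > 0);
  assert(u != du); /* the operator reads the whole element, so no aliasing */

  /* Outermost layer: loop over elements, distributed over OpenMP threads. */
  #pragma omp parallel for
  for (ptrdiff_t e = 0; e < nelems; ++e) {
    const double *ue = u + e * npoints;
    double *due = du + e * npoints;

    /* Innermost layer: loop over the collocation points of this element.
       This is the loop that is a candidate for vectorization. */
    #pragma omp simd
    for (int i = 0; i < npoints; ++i) {
      double s = 0.0;
      for (int j = 0; j < npoints; ++j)
        s += D[i * npoints + j] * ue[j];
      due[i] = s;
    }
  }
}
```

Without OpenMP support the pragmas are ignored and the loops run serially with the same result.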

This should also generalize to other numerical methods such as finite differencing: replace the term "element" by "tile", and "collocation point" by "grid point". Unlike in DG, where the tile is fixed to the element size, the tile size can then be chosen freely. This optimization has proven beneficial in Chemora for FD on GPUs, so I assume it would also lead to a performance benefit on CPUs.
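For finite differencing, the same structure could look roughly like the sketch below: a 2D five-point Laplacian with unit grid spacing, where the tile sizes `ti` and `tj` are free parameters and the last tile in each direction is clipped to the interior. The function name `tiled_laplacian` and the index layout `i + ni*j` are assumptions for illustration, not Kranc or LoopControl conventions.

```c
#include <assert.h>
#include <stddef.h>

static ptrdiff_t min_pd(ptrdiff_t a, ptrdiff_t b) { return a < b ? a : b; }

/* Hypothetical helper: five-point Laplacian (unit spacing) on the interior
   of an ni x nj grid, traversed in tiles of at most ti x tj points. */
void tiled_laplacian(ptrdiff_t ni, ptrdiff_t nj, ptrdiff_t ti, ptrdiff_t tj,
                     const double *u, double *lap)
{
  assert(ti > 0 && tj > 0);

  /* Outermost layer: loop over tiles, distributed over OpenMP threads. */
  #pragma omp parallel for collapse(2)
  for (ptrdiff_t j0 = 1; j0 < nj - 1; j0 += tj) {
    for (ptrdiff_t i0 = 1; i0 < ni - 1; i0 += ti) {
      /* Clip the tile so that it never leaves the interior. */
      const ptrdiff_t j1 = min_pd(j0 + tj, nj - 1);
      const ptrdiff_t i1 = min_pd(i0 + ti, ni - 1);

      /* Innermost layer: grid points within the tile, vectorized along i. */
      for (ptrdiff_t j = j0; j < j1; ++j) {
        #pragma omp simd
        for (ptrdiff_t i = i0; i < i1; ++i) {
          const ptrdiff_t c = i + ni * j;
          lap[c] = u[c - 1] + u[c + 1] + u[c - ni] + u[c + ni] - 4.0 * u[c];
        }
      }
    }
  }
}
```

The tile size does not need to divide the grid size; a 7 x 5 grid with 4 x 3 tiles gives the same result as an untiled loop.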

I attach a sample of what the generated code (sans Kranc-typical boilerplate) could look like, following the structure described above.

eschnett commented 9 years ago

See https://gist.github.com/eschnett/c79ec34af0b8f80c71db.

ianhinder commented 9 years ago

Additionally, I recently found in tests that tiling improved performance significantly on Intel MICs, possibly because it gives OpenMP a larger number of small work units to distribute. Tiling is currently implemented in LoopControl. Why is it better to do tiling in Kranc than in LoopControl or cctk_Loop.h?

eschnett commented 9 years ago

I want to add other features such as pre-calculating derivatives per tile instead of calculating them per grid point or per grid function. This is not easily possible otherwise. See the attached gist, which shows the structure of the generated code I want to have.
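To illustrate what pre-calculating derivatives per tile could mean, here is a rough 1D sketch. It is not the code from the gist: the function name `tiled_rhs`, the fixed tile size `TILE`, and the toy right-hand side (the product of two centered first derivatives) are assumptions chosen only to show the two stages.

```c
#include <assert.h>
#include <stddef.h>

enum { TILE = 8 }; /* hypothetical tile size, chosen only for illustration */

/* Hypothetical helper: rhs = (du/dx) * (dv/dx) on the interior of an
   n-point grid with spacing h, using centered differences. */
void tiled_rhs(ptrdiff_t n, double h, const double *u, const double *v,
               double *rhs)
{
  assert(h > 0.0);
  const double inv2h = 1.0 / (2.0 * h);

  /* Outermost layer: loop over tiles, distributed over OpenMP threads. */
  #pragma omp parallel for
  for (ptrdiff_t i0 = 1; i0 < n - 1; i0 += TILE) {
    const ptrdiff_t i1 = i0 + TILE < n - 1 ? i0 + TILE : n - 1;
    const ptrdiff_t len = i1 - i0;
    double du[TILE], dv[TILE]; /* tile-local derivative buffers */

    /* Stage 1: pre-calculate all derivatives for the whole tile. */
    #pragma omp simd
    for (ptrdiff_t k = 0; k < len; ++k) {
      du[k] = (u[i0 + k + 1] - u[i0 + k - 1]) * inv2h;
      dv[k] = (v[i0 + k + 1] - v[i0 + k - 1]) * inv2h;
    }

    /* Stage 2: evaluate the pointwise expression from the buffers. */
    #pragma omp simd
    for (ptrdiff_t k = 0; k < len; ++k)
      rhs[i0 + k] = du[k] * dv[k];
  }
}
```

Because the derivative buffers are local to the tile, this structure has to be produced by the code generator, which knows which derivatives a calculation needs; a generic loop macro only sees the loop body.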