Random Ray Inner Loop Optimization + Vectorization

Performance of the random ray solver in OpenMC is highly sensitive to the flux attenuation kernel that forms the inner loop of the simulation. This inner loop performs the attenuation (and source accumulation + attenuation) of the angular flux for a ray crossing a single flat source region, for all energy groups. The loop is written in a SIMD style over the energy groups, which allows for potential vectorization gains on most architectures (particularly important when the energy group count is high, less so for, e.g., 7-group problems). To realize those gains, some compiler hints (e.g., #pragma omp simd), loop reorganization, manual memory alignment intrinsics, and/or inlining of the exponential evaluation may be required.
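To make the discussion concrete, here is a minimal sketch of what a vectorization-friendly version of such a loop could look like. It is not OpenMC's actual kernel: the function name `attenuate_flux_sketch`, the raw-pointer arguments, and the single-precision layout are all assumptions made for illustration. It only assumes the usual flat source update, where the change in angular flux over a segment is (psi - source) * (1 - exp(-sigma_t * distance)).

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical sketch, NOT OpenMC's API: attenuates the angular flux of one
// ray across one flat source region, for all energy groups.
//   distance     : track length through the region
//   sigma_t      : total cross section per group (assumed contiguous)
//   source       : flat source per group, already divided by sigma_t (assumed)
//   angular_flux : ray angular flux per group, updated in place
//   delta_psi    : output, change in angular flux per group
inline void attenuate_flux_sketch(float distance, std::size_t n_groups,
  const float* sigma_t, const float* source, float* angular_flux,
  float* delta_psi)
{
  // Hint that iterations are independent. Without OpenMP (or OpenMP SIMD)
  // support enabled, compilers simply ignore this pragma.
#pragma omp simd
  for (std::size_t g = 0; g < n_groups; ++g) {
    const float tau = sigma_t[g] * distance;
    // 1 - exp(-tau); whether this call vectorizes depends on the compiler
    // and math library, which is why inlining a custom exponential is
    // one of the options mentioned above.
    const float attenuation = 1.0f - std::exp(-tau);
    const float change = (angular_flux[g] - source[g]) * attenuation;
    delta_psi[g] = change;
    angular_flux[g] -= change;
  }
}
```

Keeping the loop body free of branches and function calls other than the exponential is what gives the compiler a chance to vectorize it; whether it actually does should be checked with the compiler's vectorization report.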

@gridley has also proposed other potential optimizations for this kernel, e.g., templating the function to reduce branching and moving the locking operation.
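As a rough illustration of the templating idea, the sketch below moves the "attenuate only" versus "accumulate and attenuate" decision out of the group loop and into a template parameter, so the branch is taken once per segment rather than once per group. This is an assumption about what such a refactor might look like, not the proposed implementation: the names `attenuate_segment_sketch` and `attenuate_dispatch_sketch` and the `is_active` flag are hypothetical, and locking of the shared tally is deliberately left out.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical sketch, NOT OpenMC's API. Accumulate is a compile-time
// constant, so the compiler can drop the tally update entirely from the
// <false> instantiation instead of testing a flag inside the loop.
template<bool Accumulate>
inline void attenuate_segment_sketch(float distance, std::size_t n_groups,
  const float* sigma_t, const float* source, float* angular_flux,
  float* scalar_flux_tally)
{
#pragma omp simd
  for (std::size_t g = 0; g < n_groups; ++g) {
    const float tau = sigma_t[g] * distance;
    const float change =
      (angular_flux[g] - source[g]) * (1.0f - std::exp(-tau));
    if (Accumulate) {
      // NOTE: a real multithreaded solver must protect this update
      // (e.g., with a lock held by the caller); omitted in this sketch.
      scalar_flux_tally[g] += change;
    }
    angular_flux[g] -= change;
  }
}

// Single runtime branch per segment, selecting the instantiation.
inline void attenuate_dispatch_sketch(bool is_active, float distance,
  std::size_t n_groups, const float* sigma_t, const float* source,
  float* angular_flux, float* scalar_flux_tally)
{
  if (is_active) {
    attenuate_segment_sketch<true>(
      distance, n_groups, sigma_t, source, angular_flux, scalar_flux_tally);
  } else {
    attenuate_segment_sketch<false>(
      distance, n_groups, sigma_t, source, angular_flux, scalar_flux_tally);
  }
}
```

With this shape, where the lock is acquired becomes a choice made by the caller around the dispatch call, which is one place the locking operation could be moved to.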

Random Ray Inner Loop Optimization + Vectorization #2844