Open jtramm opened 9 months ago
Another idea proposed by @gridley is to experiment with a structure-of-arrays (SoA) versus array-of-structures (AoS) layout for the main source region/source element data structures. While SoA would likely slow down the iteration update functions, it may improve cache efficiency in the flux attenuation kernel.
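To make the layout trade-off concrete, here is a minimal sketch of the two options. All type, field, and function names (`SourceElementAoS`, `SourceRegionsSoA`, `sum_source_aos`, `sum_source_soa`) are hypothetical and are not taken from the OpenMC source. The flat `region * n_groups + g` indexing is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; these are not OpenMC's actual types.

// AoS: every field of one (region, group) element is stored together, so a
// loop that reads only `source` strides over the unused fields.
struct SourceElementAoS {
  float scalar_flux_new;
  float scalar_flux_old;
  float source;
};

// SoA: each field lives in its own contiguous array, indexed by
// region * n_groups + g (assumed flat indexing).
struct SourceRegionsSoA {
  std::size_t n_groups;
  std::vector<float> scalar_flux_new;
  std::vector<float> scalar_flux_old;
  std::vector<float> source;

  SourceRegionsSoA(std::size_t n_regions, std::size_t n_groups_)
    : n_groups(n_groups_),
      scalar_flux_new(n_regions * n_groups_, 0.0f),
      scalar_flux_old(n_regions * n_groups_, 0.0f),
      source(n_regions * n_groups_, 0.0f)
  {}
};

// Sum the source over all groups of one region, AoS layout.
float sum_source_aos(const std::vector<SourceElementAoS>& elems,
                     std::size_t region, std::size_t n_groups)
{
  assert((region + 1) * n_groups <= elems.size());
  float total = 0.0f;
  for (std::size_t g = 0; g < n_groups; ++g) {
    total += elems[region * n_groups + g].source; // stride = sizeof(struct)
  }
  return total;
}

// Same reduction, SoA layout: the group loop reads unit-stride floats.
float sum_source_soa(const SourceRegionsSoA& s, std::size_t region)
{
  assert((region + 1) * s.n_groups <= s.source.size());
  const float* src = s.source.data() + region * s.n_groups;
  float total = 0.0f;
  for (std::size_t g = 0; g < s.n_groups; ++g) {
    total += src[g];
  }
  return total;
}
```

In the SoA version a kernel that reads only one or two fields per group pulls in no unused data, which is the potential cache benefit for the attenuation kernel. A routine that touches every field of an element now reads from several separate arrays, which is one reason the iteration update functions might slow down. Only measurement on the real data structures would settle this.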
Performance of the random ray solver in OpenMC is highly sensitive to the performance of the flux attenuation kernel that forms the inner loop of the simulation. This inner loop performs the attenuation (and source accumulation + attenuation) of the angular flux for a ray crossing a single flat source region, for all energy groups. The loop is written in a SIMD fashion over all energy groups, allowing for potential vectorization gains on most architectures (particularly important when the energy group count is high, less important for e.g., 7 group problems). Some compiler hints (e.g., `#pragma omp simd`), loop re-organization, manual memory alignment intrinsics, and/or inlining of the exponential evaluation may be required. There are other potential optimizations in this kernel as well that @gridley has proposed, e.g., templating the function to reduce branching, and movement of the locking operation.
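For reference, a minimal sketch of what a group-wise attenuation loop with a `#pragma omp simd` hint could look like. This is not the actual OpenMC kernel: the function name `attenuate_flux`, its signature, and the use of `std::exp` in place of whatever exponential evaluation the solver uses are all assumptions. It applies the standard flat source update `delta_psi = (psi - q) * (1 - exp(-sigma_t * distance))`, and leaves accumulation into the source region (and any locking) to the caller.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not OpenMC's kernel. Attenuates the angular flux of one
// ray across one flat source region for all energy groups, and writes the
// per-group flux change into delta_psi for the caller to accumulate.
void attenuate_flux(float distance,
                    const std::vector<float>& sigma_t,   // total cross section
                    const std::vector<float>& source,    // flat source q
                    std::vector<float>& angular_flux,    // psi, updated in place
                    std::vector<float>& delta_psi)       // output
{
  const std::size_t n_groups = sigma_t.size();
  assert(source.size() == n_groups);
  assert(angular_flux.size() == n_groups);
  assert(delta_psi.size() == n_groups);

  // Raw pointers keep the loop body simple for the vectorizer.
  const float* st = sigma_t.data();
  const float* q = source.data();
  float* psi = angular_flux.data();
  float* dpsi = delta_psi.data();

  // Hint that iterations are independent. The pragma is ignored unless the
  // compiler is run with OpenMP SIMD support (e.g. -fopenmp-simd on GCC/Clang).
  #pragma omp simd
  for (std::size_t g = 0; g < n_groups; ++g) {
    const float tau = st[g] * distance;              // optical thickness
    const float exponential = 1.0f - std::exp(-tau); // assumed: plain std::exp
    const float d = (psi[g] - q[g]) * exponential;
    dpsi[g] = d;
    psi[g] -= d;
  }
}
```

The loop body has no branches and no cross-iteration dependence, which is what lets the pragma be applied safely. Whether the compiler actually vectorizes the `std::exp` call depends on the toolchain and math library, which is one motivation for inlining the exponential evaluation.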