Unrolling the small loops in advPosHalf and calcEnergy improves the cycles-per-instruction-retired rate in these loops and hence gives slightly faster compute times.
On an i9-12900, running test/sedovbig/sedovbig.pnt, I'm seeing improvements in the hydro cycle run time of:
2 threads: 0.2%, 4 threads: 0.5%, 8 threads: 1.6%, 16 threads: 2.1%
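
For anyone unfamiliar with the technique, here is a rough sketch of what unrolling a small fixed-trip-count loop looks like. It is not the code from this change: Vec2, advanceHalfRolled and advanceHalfUnrolled are hypothetical names, and the half-step update is only assumed to resemble what advPosHalf does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 2D vector type; not the project's own type.
struct Vec2 {
    double c[2];
};

// Rolled form: a two-iteration inner loop over the components of each point.
void advanceHalfRolled(std::vector<Vec2>& pos, const std::vector<Vec2>& vel,
                       double dt) {
    assert(pos.size() == vel.size());
    for (std::size_t i = 0; i < pos.size(); ++i) {
        for (int d = 0; d < 2; ++d) {
            pos[i].c[d] += 0.5 * dt * vel[i].c[d];
        }
    }
}

// Unrolled form: the inner loop is written out by hand, so there is no
// inner loop counter or branch, and the half time step is computed once.
void advanceHalfUnrolled(std::vector<Vec2>& pos, const std::vector<Vec2>& vel,
                         double dt) {
    assert(pos.size() == vel.size());
    const double hdt = 0.5 * dt;
    for (std::size_t i = 0; i < pos.size(); ++i) {
        pos[i].c[0] += hdt * vel[i].c[0];
        pos[i].c[1] += hdt * vel[i].c[1];
    }
}
```

Both forms compute the same values; whether the unrolled one is actually faster depends on the compiler and optimization flags, so it is worth measuring rather than assuming.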