lanl / PENNANT

Unstructured mesh hydrodynamics for advanced architectures

Add gcc unroll optimizations in advPosHalf and calcEnergy #13

Open ColinIanKing opened 1 year ago

ColinIanKing commented 1 year ago

Unrolling the small loops in advPosHalf and calcEnergy improves the cycles-per-instruction-retired rate in these loops and hence gives slightly faster compute times.
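
For readers unfamiliar with the technique, here is a minimal sketch of one way to request loop unrolling from GCC, using `#pragma GCC unroll` (available in GCC 8 and later). It is an illustration only and may not match the mechanism this PR actually uses: the `Vec2` type, the `advanceHalfStep` function, and the unroll factor of 4 are all assumptions, and the loop body is only a simplified stand-in for a half-step position update.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 2-D point/velocity type; not PENNANT's own type.
struct Vec2 {
    double x;
    double y;
};

// Illustrative half-step position update: xp[i] = x0[i] + u0[i] * dt.
// The name and signature are assumptions, not PENNANT's advPosHalf.
std::vector<Vec2> advanceHalfStep(const std::vector<Vec2>& x0,
                                  const std::vector<Vec2>& u0,
                                  double dt) {
    assert(x0.size() == u0.size());
    std::vector<Vec2> xp(x0.size());

    // Ask GCC to unroll the loop that follows 4 times (factor chosen
    // arbitrarily here). The guard skips the pragma on other compilers.
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC unroll 4
#endif
    for (std::size_t i = 0; i < x0.size(); ++i) {
        xp[i].x = x0[i].x + u0[i].x * dt;
        xp[i].y = x0[i].y + u0[i].y * dt;
    }
    return xp;
}
```

The pragma changes only the generated code, not the results, so any gain has to be confirmed by measurement on the target CPU and compiler version.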

On an i9-12900, running test/sedovbig/sedovbig.pnt, I'm seeing the following improvements in hydro cycle run time:

2 threads: 0.2%
4 threads: 0.5%
8 threads: 1.6%
16 threads: 2.1%