Closed: manodeep closed 9 months ago
@lgarrison Didn't realise I hadn't requested a review -- oops!
Ohh forgot to mention that I ran the INTEGRATION_TESTS for this change and the exhaustive tests passed
Didn't actually do a line-by-line timing comparison. Will attempt to do that on my laptop; plus, I will also check that the runtime is not adversely affected on our local Linux supercomputer (Skylake and AMD EPYC).
Timed the tests on master and this branch on Skylake CPUs: essentially no difference for theory, but the mocks, specifically DDtheta, are faster.
Timed the tests on master vs this branch on EPYC CPUs: same as above, i.e. essentially no difference for theory, but slightly faster with the simplified kernels (smaller improvements than on SKX). In general, both branches run faster on EPYC than on SKX.
Totally forgot to merge this PR!
Commenting to add the link to the original #296 that spurred this work
Reduced the amount of code in the fallback kernels. At least on my M2 laptop, it runs faster: slightly (5-10%) for DD-type (i.e. small number-density) and significantly (~20-25%) for RR-type calculations.
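For anyone wanting to reproduce this kind of branch-vs-branch comparison, here is a minimal sketch of how a "percent faster" figure could be measured. It is not the script used for the numbers above; `best_of` and `percent_faster` are hypothetical helper names, and the choice of taking the minimum over a few repeats is an assumption.

```python
import time


def best_of(func, repeats=5):
    """Return the minimum wall-clock time (seconds) over `repeats` calls of func()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    # The minimum is less sensitive to background noise than the mean.
    return min(timings)


def percent_faster(t_old, t_new):
    """Percentage reduction in runtime going from t_old to t_new."""
    return 100.0 * (t_old - t_new) / t_old
```

In practice one would build each branch separately, wrap the same pair-counting call in a zero-argument function, time it with `best_of` under each build, and pass the two timings to `percent_faster`.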