Simplifying fallback kernels

manodeep / Corrfunc

⚡️⚡️⚡️Blazing fast correlation functions on the CPU.

https://corrfunc.readthedocs.io

MIT License

Simplifying fallback kernels #303

Closed manodeep closed 9 months ago

manodeep commented 10 months ago

Reduced the amount of code in the fallback kernels. At least on my M2 laptop, the simplified code runs faster: slightly faster (5-10%) for DD-type (i.e. small number-density) calculations and significantly faster (~20-25%) for RR-type calculations.

manodeep commented 10 months ago

@lgarrison Didn't realise I hadn't requested a review -- oops!

manodeep commented 10 months ago

Oh, I forgot to mention that I ran the INTEGRATION_TESTS for this change, and the exhaustive tests passed.

manodeep commented 10 months ago

I didn't actually do a line-by-line timing comparison. I will attempt that on my laptop; I will also check that the runtime is not adversely affected on our local Linux supercomputer (Skylake and AMD EPYC).
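For illustration, a coarse timing comparison of this kind could be sketched as below. This is a minimal, self-contained stand-in and not Corrfunc code: `count_pairs` is a hypothetical brute-force scalar pair counter, `best_of` is a hypothetical helper, and the point count and bin edges are arbitrary assumptions.

```python
import random
import time


def count_pairs(points, rbins):
    """Brute-force scalar pair counter (hypothetical stand-in, not Corrfunc's kernel)."""
    sq_edges = [r * r for r in rbins]  # compare squared distances, no sqrt needed
    counts = [0] * (len(rbins) - 1)
    n = len(points)
    for i in range(n):
        xi, yi, zi = points[i]
        for j in range(i + 1, n):
            xj, yj, zj = points[j]
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            if d2 < sq_edges[0] or d2 >= sq_edges[-1]:
                continue  # pair falls outside the binned range
            # Walk the bins from the outermost inwards and stop at the first match.
            for k in range(len(counts) - 1, -1, -1):
                if d2 >= sq_edges[k]:
                    counts[k] += 1
                    break
    return counts


def best_of(func, *args, repeats=3):
    """Return (best wall-clock time in seconds, last result) over `repeats` runs."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        t0 = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - t0)
    return best, result


rng = random.Random(42)  # fixed seed so every run times the same workload
points = [(rng.random(), rng.random(), rng.random()) for _ in range(300)]
rbins = [0.05, 0.1, 0.2, 0.4]  # arbitrary bin edges for the sketch

elapsed, counts = best_of(count_pairs, points, rbins)
print(f"best of 3: {elapsed:.4f} s, counts = {counts}")
```

Running the same script against two variants of a kernel, on the same seeded input, gives comparable best-of-N wall-clock times; checking that both variants return identical counts guards against a speed-up that comes from a changed result.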

manodeep commented 10 months ago

Timed the tests on master and this branch on Skylake CPUs: essentially no difference for theory, but the mocks, specifically DDtheta, are faster.

Timed the tests on master vs this branch on EPYC CPUs: same as above, i.e. essentially no difference in theory, but slightly faster with the simplified kernels (though the improvements are smaller than on SKX). In general, both branches run faster on EPYC than on SKX.

manodeep commented 9 months ago

Totally forgot to merge this PR!

manodeep commented 2 months ago

Commenting to add the link to the original #296 that spurred this work