hpcgarage / spatter

Benchmark for measuring the performance of sparse and irregular memory access.

CPU performance of new/old Spatter varies #189

Open jyoung3131 opened 2 months ago

jyoung3131 commented 2 months ago

We have noticed that the Spatter refactor (#165) shows inconsistent CPU performance with the serial and OpenMP backends.

With the addition of #165, we will have rough performance parity for Gather and MultiGather.

We need to ensure performance parity with v1.1 on CPU for the following kernels:

To replicate:

plavin commented 2 months ago

It turns out this issue was caused by compiling with CMAKE_BUILD_TYPE set to Debug instead of Release.
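CMake records the chosen build type in the build directory's CMakeCache.txt as an entry of the form `CMAKE_BUILD_TYPE:STRING=Release`. As a hedged sketch, a benchmarking script could check that entry before trusting any timings. The helper names below are hypothetical and not part of Spatter. Treating Release, RelWithDebInfo and MinSizeRel as "optimized" assumes CMake's standard configurations and their default flags; an empty build type adds no optimization flags by default.

```python
from pathlib import Path
from typing import Optional

# Assumption: CMake's standard configurations that add optimization flags by default.
OPTIMIZED_BUILD_TYPES = {"release", "relwithdebinfo", "minsizerel"}


def cache_build_type(cache_text: str) -> Optional[str]:
    """Return the CMAKE_BUILD_TYPE value found in CMakeCache.txt text, or None if absent."""
    for line in cache_text.splitlines():
        line = line.strip()
        # Cache entries look like NAME:TYPE=VALUE; comment lines start with '#' or '//'.
        if line.startswith("CMAKE_BUILD_TYPE:") and "=" in line:
            return line.split("=", 1)[1]
    return None


def is_optimized(build_type: Optional[str]) -> bool:
    """True only for build types that enable compiler optimization by default."""
    return build_type is not None and build_type.lower() in OPTIMIZED_BUILD_TYPES


def check_build_dir(build_dir: str) -> bool:
    """Hypothetical pre-benchmark check: warn and return False for unoptimized builds."""
    text = Path(build_dir, "CMakeCache.txt").read_text()
    build_type = cache_build_type(text)
    if not is_optimized(build_type):
        print(f"warning: CMAKE_BUILD_TYPE={build_type!r}; timings are not from an optimized build")
        return False
    return True
```

For example, `check_build_dir("build")` (assuming the build directory is named `build`) would have flagged the Debug configuration before any runs were collected.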

Performance on Skylake with 24 threads, using the input -pUNIFORM:8:1 -l$((2**24)), summarized over 10 runs:

Current:
Max: 92774.52, Mean: 88883.00, Stddev: 4764.36

Refactor:
Max: 92982.30, Mean: 84435.54, Stddev: 6467.54

The refactor does, however, seem to have a consistently lower mean and a higher standard deviation across runs.
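To put numbers on that observation, here is a small sketch that compares the two summaries above. It assumes the reported Stddev is the sample standard deviation over the 10 runs; the function names are illustrative, and Welch's t statistic is a standard two-sample comparison chosen here, not something the thread used.

```python
import math
import statistics
from typing import Dict, Sequence


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    """Max, mean and sample (n-1) standard deviation of per-run results."""
    return {
        "max": max(samples),
        "mean": statistics.mean(samples),
        "stddev": statistics.stdev(samples),
    }


def relative_change(new: float, old: float) -> float:
    """Fractional change from old to new, e.g. -0.05 for a 5% drop."""
    return (new - old) / old


def coefficient_of_variation(summary: Dict[str, float]) -> float:
    """Run-to-run spread relative to the mean."""
    return summary["stddev"] / summary["mean"]


def welch_t(a: Dict[str, float], b: Dict[str, float], n_a: int, n_b: int) -> float:
    """Welch's t statistic for mean(a) - mean(b), computed from summary values only."""
    se = math.sqrt(a["stddev"] ** 2 / n_a + b["stddev"] ** 2 / n_b)
    return (a["mean"] - b["mean"]) / se


# Summary values copied from the Skylake results reported above (10 runs each).
CURRENT = {"max": 92774.52, "mean": 88883.00, "stddev": 4764.36}
REFACTOR = {"max": 92982.30, "mean": 84435.54, "stddev": 6467.54}

if __name__ == "__main__":
    for key in ("max", "mean", "stddev"):
        print(f"{key}: {relative_change(REFACTOR[key], CURRENT[key]):+.1%}")
    print(f"CV current={coefficient_of_variation(CURRENT):.1%} "
          f"refactor={coefficient_of_variation(REFACTOR):.1%}")
    print(f"Welch t (current - refactor) = {welch_t(CURRENT, REFACTOR, 10, 10):.2f}")
```

On the reported numbers this works out to roughly a 5.0% lower mean, a 0.2% higher max, a 35.7% higher standard deviation, and a Welch t of about 1.75. That points the same way as the observation above, but with only about 16 to 17 effective degrees of freedom it sits below the usual two-sided 5% threshold (about 2.1), so more runs would help confirm the gap.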

Once I have pushed my branch to the refactor branch, we can close this issue. I just need to make sure I have properly implemented the "multiple target buffers" feature for all of the kernels.

jyoung3131 commented 1 month ago

Per #165, note that we still need to check the Scatter, MultiScatter, and Gather-Scatter tests.
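To make the kernels under discussion concrete, here is a simplified serial model. It assumes the indexing described in Spatter's README: Gather reads `sparse[pattern[j] + delta*i]`, Scatter writes to that location, MultiScatter routes each target through a second index lookup, and Gather-Scatter copies sparse to sparse. It also assumes `UNIFORM:N:STRIDE` expands to N indices spaced STRIDE apart and that elements are 8 bytes. All function names are hypothetical; the real serial and OpenMP backends add parallelism, target-buffer handling and timing, none of which is modeled here.

```python
from typing import List, Sequence


def uniform_pattern(length: int, stride: int) -> List[int]:
    """Assumed expansion of UNIFORM:<length>:<stride> to [0, stride, ..., (length-1)*stride]."""
    return [j * stride for j in range(length)]


def gather(sparse: Sequence[float], pattern: Sequence[int], delta: int, count: int) -> List[float]:
    """Iteration i copies sparse[pattern[j] + delta*i] into dense[j]; dense is overwritten each time."""
    dense = [0.0] * len(pattern)
    for i in range(count):
        base = delta * i
        for j, idx in enumerate(pattern):
            dense[j] = sparse[idx + base]
    return dense


def scatter(sparse: List[float], dense: Sequence[float], pattern: Sequence[int],
            delta: int, count: int) -> None:
    """Iteration i writes dense[j] to sparse[pattern[j] + delta*i]."""
    for i in range(count):
        base = delta * i
        for j, idx in enumerate(pattern):
            sparse[idx + base] = dense[j]


def multi_scatter(sparse: List[float], dense: Sequence[float], pattern: Sequence[int],
                  pattern_scatter: Sequence[int], delta: int, count: int) -> None:
    """Like scatter, but the target index is looked up twice: pattern[pattern_scatter[j]]."""
    for i in range(count):
        base = delta * i
        for j, k in enumerate(pattern_scatter):
            sparse[pattern[k] + base] = dense[j]


def gather_scatter(sparse_scatter: List[float], sparse_gather: Sequence[float],
                   pattern_scatter: Sequence[int], pattern_gather: Sequence[int],
                   delta_scatter: int, delta_gather: int, count: int) -> None:
    """Sparse-to-sparse copy: read through pattern_gather, write through pattern_scatter."""
    for i in range(count):
        for j in range(len(pattern_gather)):
            src = pattern_gather[j] + delta_gather * i
            dst = pattern_scatter[j] + delta_scatter * i
            sparse_scatter[dst] = sparse_gather[src]


def payload_bytes(pattern_len: int, count: int, elem_bytes: int = 8) -> int:
    """Sparse-data bytes touched: one element per pattern entry per iteration (index traffic ignored)."""
    return pattern_len * count * elem_bytes
```

Under these assumptions, the replication input from this thread (-pUNIFORM:8:1 -l$((2**24))) touches `payload_bytes(8, 2**24)`, i.e. 2**30 bytes (1 GiB), of sparse data per run.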

jyoung3131 commented 2 weeks ago

Hi @radelja, can you please test Scatter? We suspect some overhead and a slightly different instruction mix, possibly coming from compiler optimization.