uasal / Gaussian-Beamlets


benchmarking exp() #1

Open douglase opened 4 years ago

douglase commented 4 years ago

The exponential function is the slowest part of the beamlet propagation. There are several dimensions we could optimize over: propagations per second, propagations per watt, development time, etc. For now I'll focus on run time.

douglase commented 4 years ago

For a 512x512x2000 complex128 test case, the default numexpr and numpy run times for the exp function are ~8 sec and 180 sec, respectively, on our test machine (AMD EPYC 7642 48-Core Processor).
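For anyone who wants to reproduce this kind of comparison, here is a minimal sketch, not the script behind the numbers above. The function name `time_exp` and the small default shape are assumptions chosen for illustration; the full 512x512x2000 complex128 array holds 524,288,000 elements at 16 bytes each, about 8.4 GB, so scale the shape up only if the machine has the memory for the input and the output.

```python
import time

import numpy as np

try:
    import numexpr as ne  # optional here; the NumPy timing still runs without it
except ImportError:
    ne = None


def time_exp(shape=(64, 64, 20), seed=0):
    """Time exp() on a random complex128 array (hypothetical helper).

    Returns a dict mapping backend name to wall-clock seconds for one call.
    """
    rng = np.random.default_rng(seed)
    # float64 real and imaginary parts combine into a complex128 array
    a = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

    timings = {}

    start = time.perf_counter()
    reference = np.exp(a)
    timings["numpy"] = time.perf_counter() - start

    if ne is not None:
        start = time.perf_counter()
        result = ne.evaluate("exp(a)", local_dict={"a": a})
        timings["numexpr"] = time.perf_counter() - start
        # Sanity check: both backends should agree to floating-point tolerance
        assert np.allclose(result, reference)

    return timings


if __name__ == "__main__":
    for backend, seconds in time_exp().items():
        print(f"{backend}: {seconds:.4f} s")
```

A single call per backend is noisy at small sizes; repeating the measurement and taking the minimum would give steadier numbers.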

douglase commented 4 years ago

Using Numba to compile the exp function for a GPU (a V100) is faster, at about 1.5 sec.

douglase commented 4 years ago

Numexpr, however, does not default to large thread counts.

If we increase the thread count to closer to the number of cores available on our machine (90 threads in this test case), we gain approximately another factor of 10 in run time, bringing the function to just over 0.2 sec, or almost 10^3 times faster than numpy on the same machine.
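A minimal sketch of raising the thread count for a single call, assuming numexpr is installed. The helper name `exp_with_threads` is hypothetical; `numexpr.set_num_threads` returns the previous setting, which makes it easy to restore afterwards. As far as I know, the size of the thread pool is capped by the `NUMEXPR_MAX_THREADS` environment variable, which has to be set before numexpr is imported, so a request for 90 threads may need that variable raised first.

```python
import numexpr as ne
import numpy as np


def exp_with_threads(a, nthreads):
    """Evaluate exp(a) with numexpr using `nthreads` threads (hypothetical helper).

    The previous thread count is restored even if evaluation fails.
    """
    previous = ne.set_num_threads(nthreads)  # returns the prior setting
    try:
        return ne.evaluate("exp(a)", local_dict={"a": a})
    finally:
        ne.set_num_threads(previous)


if __name__ == "__main__":
    a = np.linspace(0.0, 1.0, 1_000_000) * 1j  # complex128 input
    out = exp_with_threads(a, nthreads=2)
    print(out.dtype, out.shape)
```

The best thread count is machine dependent, so it is worth sweeping a few values rather than assuming more threads always help.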

douglase commented 4 years ago

Looks like the next low-hanging fruit is the calls to np.reduce().
