oseiskar / simdkalman

Python Kalman filters vectorized as Single Instruction, Multiple Data
https://simdkalman.readthedocs.io/
MIT License
176 stars 36 forks source link

Code speedup due to faster einsum replacement #23

Closed winedarksea closed 12 months ago

winedarksea commented 1 year ago

I use your simdkalman in my AutoTS project, thanks for the work, it is excellent code. While running the scalene profiler on my code base it identified a possible performance improvement and suggested a speedup. The credit for this really goes to GPT v4.

I've tested the speedup and it is faster and functionally identical for my use case. I haven't necessarily tested this on your entire code bases range of inputs and outputs so there may be some hidden catch that I am unaware of.

simple speed comparison:

%timeit np.einsum("...ij,...kj->...ik", A, B)
31 ms ± 967 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit np.matmul(A, np.swapaxes(B, -1, -2))
7.61 ms ± 328 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

it's hard to quantify the bigger picture but it looks like it makes the function about 10% faster overall, in my use case

oseiskar commented 1 year ago

Thank you! This seems like a great performance tweak. I do not immediately see any reason to use einsum instead of swapaxes. Originally the design goal was simply ensuring that all relevant methods use vectorized Numpy operations, but not exploring very thoroughly, which Numpy operations would be the fastest after the first working one was found.

Speculating about the performance: It can be critical which BLAS/LAPACK operation this eventually translates to under the hood of Numpy - and if the answer is "none" that's bad too. The swapaxes version probably translates directly to a sequence of very efficient matrix multiplications where the LHS is stored in row-major and RHS in column-major order, while einsum could fall back to a generic C loop.

I will runs some tests and merge this if there are no issues. May take a week or two since I'm currently a bit swamped.

oseiskar commented 12 months ago

Worked fine. Merged and released in 1.0.3. Thanks again!