rreusser / summation-algorithms

Naive, pairwise recursive, pairwise non-recursive, and Kahan summation algorithms, compared
MIT License
7 stars 0 forks source link

use a larger base case for pairwise summation #3

Open stevengj opened 6 years ago

stevengj commented 6 years ago

The right way to implement pairwise summation, from a performance perspective, is simply to use a larger base case. Then you can still use explicit recursion, and the cost should be very similar to naive summation.

Around N=100–200 is usually sufficient to amortize the overhead of recursion, in my experience. For a sufficiently large array (a few thousand elements or more, IIRC), the error almost indistinguishable from that of a base case of N=1.

This is what was done in NumPy (numpy/numpy#3685) and Julia (JuliaLang/julia#4039) to achieve the accuracy of pairwise summation without a performance penalty.

("Coarsening" the base case is a standard technique to improve the performance of any recursive algorithm.)

rreusser commented 6 years ago

Thanks for the feedback! Yeah, I was poking around in the dark a bit here when I should have just been reading more code. That makes perfect sense. (I need to apply a base case to my gpu fft code… it's been eating away at me in the back of my head, but hard to find the time 😞) The other goal here was to really nail down empirically the numerical stability of the different approaches, though I didn't do a good job there either. 😞

TBH the real takeaway for a lot of this is that it's really difficult to write good numerical software in your spare time. If you're interested in people doing this right and not just cross-compiling into js, stdlib is the place to look. @kgryte is doing an impressive job of digging into existing implementations and tackling this like the large undertaking that it actually is. I believe he's currently rounding the corner on an ndarray implementation and slowly chipping away at blas (native bindings in node; wasm + idiomatic native js implementation for client-side).

rreusser commented 6 years ago

Ah, here's a better link: @stdlib/blas/base. Not sure plain summation is there yet, but this would definitely apply.