Open stevengj opened 7 years ago
Thanks for the feedback! Yeah, I was poking around in the dark a bit here when I should have just been reading more code. That makes perfect sense. (I need to apply a base case to my gpu fft codeβ¦ it's been eating away at me in the back of my head, but hard to find the time π) The other goal here was to really nail down empirically the numerical stability of the different approaches, though I didn't do a good job there either. π
TBH the real takeaway for a lot of this is that it's really difficult to write good numerical software in your spare time. If you're interested in people doing this right and not just cross-compiling into js, stdlib is the place to look. @kgryte is doing an impressive job of digging into existing implementations and tackling this like the large undertaking that it actually is. I believe he's currently rounding the corner on an ndarray implementation and slowly chipping away at blas (native bindings in node; wasm + idiomatic native js implementation for client-side).
Ah, here's a better link: @stdlib/blas/base. Not sure plain summation is there yet, but this would definitely apply.
The right way to implement pairwise summation, from a performance perspective, is simply to use a larger base case. Then you can still use explicit recursion, and the cost should be very similar to naive summation.
Around N=100β200 is usually sufficient to amortize the overhead of recursion, in my experience. For a sufficiently large array (a few thousand elements or more, IIRC), the error almost indistinguishable from that of a base case of N=1.
This is what was done in NumPy (numpy/numpy#3685) and Julia (JuliaLang/julia#4039) to achieve the accuracy of pairwise summation without a performance penalty.
("Coarsening" the base case is a standard technique to improve the performance of any recursive algorithm.)