Closed ivan-pi closed 1 year ago
Wow, this is awesome, thank you for sharing it!
I will merge that immediately. I wonder what will happen if we then replace the loop with an ad-hoc implementation for each polynomial degree like: (((((a5*x+a4)*x+a3)*x+a2)*x+a1)*x+a0)
Yesterday I didn't bother to check, but it would be good to look at the output of -fopt-info
and -fopt-info-missed
to see what optimizations have been applied/missed.
Compared to the function call, an inline Horner statement has the advantage it makes the job of an optimizing compiler easier. For high polynomial degrees, it's sometimes better to split the polynomial into multiple parts which can be evaluated in parallel by a SIMD unit (see https://en.wikipedia.org/wiki/Horner%27s_method#Parallel_evaluation), but I've never tested this.
With this modification, the new approximation appears to be faster than the intrinsic and equally precise.
I've also fixed an error introduced in #1 on L216 of
bessel_constants.f90
.