Closed mcabbott closed 1 year ago
Okay, wow. How can it be that Julia compiled is now even faster than C/C++/Nim? :sweat_smile:
Thank you for your contribution @mcabbott :+1: Just stunned about the performance of Julia AOT compiled now. :laughing: But that shouldn't hold back merging, since it is all possible with the language.
I suspect the new code is easier to SIMD. Yes, other languages should be able to do the same.
Thanks, the numbers iook great!
Indeed, my interpretation is that SIMD is fine with accumulation pi +=
, but changing a second variable in the loop x = -x
blocks it.
This can also be done by just writing x = (-1)^iseven(i)
inside the loop, I wasn't sure whether that would count as cheating. I do think the vectorized R code arguably did something similar, by looking up one of two values for x
using mod(i, 2)
.
Side note
julia> function f(rounds)
pi = 0.0
@simd for i in (-2 * rounds + 1):4:(2 * rounds)
pi += 4 / i
end
return pi
end
f (generic function with 1 method)
is closer to what ~#45~ #49 did, and has same performance as this PR, without having to define a custom struct (but nothing wrong with that in principle)
Edit: or just
f(n) = sum(4 / i for i in (-2 * n + 1):4:(2 * n))
but this version doesn't SIMD very well.
I'm open to PRs. :slightly_smiling_face: I'm not the Julia expert here, so I have to rely on you all :sweat_smile:
Ah this is what R #49 did, not #45.
I'm a little surprised that works, as it adds all the negative numbers then all the positive numbers. At rounds = 10^9
the extrema of the positive are 2.000000002e-9, 2.0
, and eps(2.0) ≈ 4e-16
, yet the result has error 1e-9
.
Edit: as @Moelf points out, the R version need not suffer from this. Allocating an array means that sum
can do something smarter than just work left to right. But the loop above does not.
If I understand right, #45 changed the R implementation to store a vector for
x
, and access this at each iteration. Which is idiomatic R, instead of a loop.This does something similar for Julia, making a lazy vector for the
x
values, and accessing this within the loop. Julia has many such lazy array types, in fact the loop already readsi
from aUnitRange
, which is what:
creates:The
@simd
macro is also fairly standard on tight inner loops. Often the compiler can infer that the loop is safe, but for this particular function it does not, in local testing. The new one is both faster and very slightly more accurate, due to floating point differences: