niklas-heer / speed-comparison

A repo which compares the speed of different programming languages.
https://niklas-heer.github.io/speed-comparison
MIT License
475 stars 76 forks source link

Use a lazy vector for Julia #52

Closed mcabbott closed 1 year ago

mcabbott commented 1 year ago

If I understand right, #45 changed the R implementation to store a vector for x, and access this at each iteration. Which is idiomatic R, instead of a loop.

This does something similar for Julia, making a lazy vector for the x values, and accessing this within the loop. Julia has many such lazy array types, in fact the loop already reads i from a UnitRange, which is what : creates:

julia> 2:(range + 2) isa AbstractVector
true

julia> 2:(range + 2) |> dump
UnitRange{Int64}
  start: Int64 2
  stop: Int64 1000002

julia> SignVector(range + 2) isa AbstractVector
true

julia> SignVector(3)
3-element SignVector:
  1.0
 -1.0
  1.0

The @simd macro is also fairly standard on tight inner loops. Often the compiler can infer that the loop is safe, but for this particular function it does not, in local testing. The new one is both faster and very slightly more accurate, due to floating point differences:

julia> using BenchmarkTools

julia> @btime f_prev(10^6) - pi
  1.552 ms (0 allocations: 0 bytes)
-9.999980186137236e-7

julia> @btime f(10^6) - pi
  531.375 μs (0 allocations: 0 bytes)
-9.999979067032427e-7
niklas-heer commented 1 year ago

Okay, wow. How can it be that Julia compiled is now even faster than C/C++/Nim? :sweat_smile: combined_results

niklas-heer commented 1 year ago

Thank you for your contribution @mcabbott :+1: Just stunned about the performance of Julia AOT compiled now. :laughing: But that shouldn't hold back merging, since it is all possible with the language.

giordano commented 1 year ago

I suspect the new code is easier to SIMD. Yes, other languages should be able to do the same.

mcabbott commented 1 year ago

Thanks, the numbers iook great!

Indeed, my interpretation is that SIMD is fine with accumulation pi +=, but changing a second variable in the loop x = -x blocks it.

This can also be done by just writing x = (-1)^iseven(i) inside the loop, I wasn't sure whether that would count as cheating. I do think the vectorized R code arguably did something similar, by looking up one of two values for x using mod(i, 2).

giordano commented 1 year ago

Side note

julia> function f(rounds)
           pi = 0.0

           @simd for i in (-2 * rounds + 1):4:(2 * rounds)
               pi += 4 / i
           end

           return pi
       end
f (generic function with 1 method)

is closer to what ~#45~ #49 did, and has same performance as this PR, without having to define a custom struct (but nothing wrong with that in principle)

Edit: or just

f(n) = sum(4 / i for i in (-2 * n + 1):4:(2 * n))

but this version doesn't SIMD very well.

niklas-heer commented 1 year ago

I'm open to PRs. :slightly_smiling_face: I'm not the Julia expert here, so I have to rely on you all :sweat_smile:

mcabbott commented 1 year ago

Ah this is what R #49 did, not #45.

I'm a little surprised that works, as it adds all the negative numbers then all the positive numbers. At rounds = 10^9 the extrema of the positive are 2.000000002e-9, 2.0, and eps(2.0) ≈ 4e-16, yet the result has error 1e-9.

Edit: as @Moelf points out, the R version need not suffer from this. Allocating an array means that sum can do something smarter than just work left to right. But the loop above does not.