I just learned that using the @simd macro is necessary in some loops to achieve optimal performance. Consider the following example:
julia> function mysum_basic(a::Vector)
           total = zero(eltype(a))
           for x in a
               total += x
           end
           return total
       end
mysum_basic (generic function with 1 method)
julia> function mysum_simd(a::Vector)
           total = zero(eltype(a))
           @simd for x in a
               total += x
           end
           return total
       end
mysum_simd (generic function with 1 method)
julia> using BenchmarkTools
julia> rand_array_1D = rand(1_000_000);
julia> begin
           @btime mysum_basic($rand_array_1D)
           @btime mysum_simd($rand_array_1D)
       end
814.200 μs (0 allocations: 0 bytes)
118.500 μs (0 allocations: 0 bytes)
So while the Julia compiler can emit some SIMD instructions on its own, it is still worth trying @simd on hot loops, where reordering the iterations is safe, to use the hardware's full potential.
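As far as I understand, the plain loop is not vectorized on its own because floating-point addition is not associative, so the compiler may not reorder the accumulation into `total` unless it is told that reordering is acceptable, which is what @simd does. Here is a minimal sketch of the effect. Both functions are redefined so the snippet runs on its own, and the array size is just the one from the benchmark:

```julia
# Floating-point addition is not associative: the grouping changes the rounding.
left  = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
println(left == right)   # false

function mysum_basic(a::Vector)
    total = zero(eltype(a))
    for x in a
        total += x
    end
    return total
end

function mysum_simd(a::Vector)
    total = zero(eltype(a))
    # @simd allows the compiler to reorder the additions into `total`
    @simd for x in a
        total += x
    end
    return total
end

a = rand(1_000_000)
s_basic = mysum_basic(a)
s_simd  = mysum_simd(a)

# The two sums agree up to rounding, but need not be bitwise identical.
println(s_basic ≈ s_simd)
println(s_basic - s_simd)
```

So @simd trades a strictly left-to-right summation order for speed. For a plain sum that is usually fine, but it is a choice the compiler will not make for you.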
In the benchmark above, the @simd version is roughly 7x faster. The native code of the two versions can be compared with @code_native.
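Instead of reading the full assembly, one rough way to check whether a loop was vectorized is to search the optimized LLVM IR for vector types such as `<4 x double>`. The sketch below assumes that this pattern is a good enough indicator. `uses_vector_doubles` is a hypothetical helper name of my own, not part of any library, and the result depends on the CPU and the Julia version:

```julia
using InteractiveUtils  # provides code_llvm (loaded automatically in the REPL)

function mysum_basic(a::Vector)
    total = zero(eltype(a))
    for x in a
        total += x
    end
    return total
end

function mysum_simd(a::Vector)
    total = zero(eltype(a))
    @simd for x in a
        total += x
    end
    return total
end

# Hypothetical helper: returns true if the optimized LLVM IR for `f` called
# with `argtypes` mentions a vector-of-Float64 type like `<4 x double>`.
function uses_vector_doubles(f, argtypes)
    ir = sprint(code_llvm, f, argtypes)
    return occursin(r"<\d+ x double>", ir)
end

println("basic: ", uses_vector_doubles(mysum_basic, (Vector{Float64},)))
println("simd:  ", uses_vector_doubles(mysum_simd, (Vector{Float64},)))
```

On a machine with SIMD support I would expect only the @simd version to report vector operations for Float64 input, but it is worth running this on your own hardware rather than taking my word for it.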