lindahua / Devectorize.jl

A Julia framework for delayed expression evaluation
MIT License

Timing `axpy` #37

Open dpo opened 9 years ago

dpo commented 9 years ago

I wrote a simple function to compare the runtime of several ways to compute `x = x + α * y` in Julia. I expected `@devec` to produce a timing close to that of the explicit loop, or perhaps even to recognize that a BLAS call is applicable, but it turns out to be even slower than the plain `x = x + α * y`. Am I doing something wrong?

Here is the script: https://gist.github.com/8c5da6abd585f7f07da6

And here is the output:

julia> time_axpy(10000000, 3)
n = 10000000, nloops = 3
Explicit loop:
elapsed time: 0.03996825 seconds (0 bytes allocated)
x = x + αy
elapsed time: 0.139159466 seconds (480000648 bytes allocated, 19.77% gc time)
x += αy
elapsed time: 0.248638223 seconds (480000648 bytes allocated, 49.02% gc time)
x[:] += αy
elapsed time: 0.293941227 seconds (720000792 bytes allocated, 39.00% gc time)
x .+= αy
elapsed time: 0.173867671 seconds (480001680 bytes allocated, 32.65% gc time)
x[:] .+= αy
elapsed time: 0.372861997 seconds (720001824 bytes allocated, 8.15% gc time)
BLAS call
elapsed time: 0.035336643 seconds (0 bytes allocated)
@devec x = x + α * y
elapsed time: 0.190846651 seconds (240000744 bytes allocated)
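
For readers without access to the gist, the kinds of variants timed above might be sketched roughly as follows. This is not the linked script: the function names (`axpy_loop!`, `axpy_alloc`, `axpy_dot!`, `axpy_blas!`) and the problem size are hypothetical, and the sketch assumes a current Julia 1.x release rather than the version that produced the timings above. It leaves out `@devec`, and adds a fused in-place broadcast, which only exists in later Julia versions.

```julia
using LinearAlgebra  # provides BLAS.axpy! on Julia 0.7 and later

# Explicit loop: updates x in place, element by element.
function axpy_loop!(x, α, y)
    @inbounds for i in eachindex(x, y)
        x[i] += α * y[i]
    end
    return x
end

# Vectorized form: α * y and the sum each allocate a new array.
axpy_alloc(x, α, y) = x + α * y

# Fused in-place broadcast (Julia 0.6 and later): no temporary arrays.
axpy_dot!(x, α, y) = (x .+= α .* y)

# BLAS call: overwrites x with α * y + x and returns x.
axpy_blas!(x, α, y) = BLAS.axpy!(α, y, x)

n = 1_000          # hypothetical size, much smaller than in the runs above
α = 2.0
y = rand(n)
x0 = rand(n)
expected = x0 + α * y

for f in (axpy_loop!, axpy_alloc, axpy_dot!, axpy_blas!)
    f(copy(x0), α, y)   # warm-up call so compilation is not timed
    x = copy(x0)
    println(f)
    @time f(x, α, y)
end
```

The warm-up call keeps compilation out of the measurement, and wrapping each variant in a function avoids working directly on global variables, which can itself affect both timing and allocation counts.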

I also tried placing the `@devec` test in a separate script. For some reason the timing is better, but a substantial fraction of it is spent in garbage collection:

julia> time_axpy(10000000, 3)
n = 10000000, nloops = 3
@devec x = x + α * y
elapsed time: 0.114310391 seconds (240000744 bytes allocated, 22.96% gc time)

Thanks.