Open CuriousCat-7 opened 2 years ago
You can fuse sqrt, cos and sin into a single pass over the data:
import nimpy
import times
import arraymancer

var
  tic, toc: float

# for math
let np = pyImport("numpy")

# NumPy called through nimpy
tic = epochTime()
for i in 0..<200:
  discard np.sqrt(np.cos(np.sin(np.linspace(0, 10, 1000))))
toc = epochTime()
echo "np time: ", toc - tic

# Arraymancer, unfused: each function makes its own pass over the tensor
tic = epochTime()
for i in 0..<200:
  discard sqrt(cos(sin(arraymancer.linspace(0, 10, 1000))))
toc = epochTime()
echo "arraymancer time: ", toc - tic

# Arraymancer, fused: sin, cos and sqrt applied in place in a single pass
tic = epochTime()
for i in 0..<200:
  var t = arraymancer.linspace(0, 10, 1000)
  t.apply_inline():
    x.sin().cos().sqrt()
toc = epochTime()
echo "arraymancer fused time: ", toc - tic
$ nim c -d:danger --hints:off --warnings:off -r --outdir:build build/speedtest.nim
np time: 0.009390830993652344
arraymancer time: 0.005604982376098633
arraymancer fused time: 0.004479646682739258
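To make the idea of fusion concrete without depending on Arraymancer, here is a minimal sketch in plain Nim. The helper names `unfused` and `fusedInPlace` are hypothetical and only illustrate the difference between three passes with intermediate buffers and one in-place pass; this is not how Arraymancer implements `apply_inline` internally.

```nim
import std/math

# Hypothetical helper: three separate passes, each writing a new buffer,
# roughly what sqrt(cos(sin(t))) does when applied to whole arrays.
proc unfused(xs: seq[float]): seq[float] =
  var a = newSeq[float](xs.len)
  for i, x in xs: a[i] = sin(x)
  var b = newSeq[float](xs.len)
  for i, x in a: b[i] = cos(x)
  result = newSeq[float](xs.len)
  for i, x in b: result[i] = sqrt(x)

# Hypothetical helper: one pass, no intermediate buffers, updated in place.
proc fusedInPlace(xs: var seq[float]) =
  for x in xs.mitems:
    x = sqrt(cos(sin(x)))

# 1000 evenly spaced points in [0, 10], like linspace(0, 10, 1000).
var data = newSeq[float](1000)
for i in 0 ..< data.len:
  data[i] = 10.0 * float(i) / float(data.len - 1)

let expected = unfused(data)
fusedInPlace(data)
```

Both versions compute the same values; the fused one touches memory once per element instead of three times and allocates nothing extra, which is where the gain would come from on small arrays like this one.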
Depending on how many cores you have, compiling with -d:openmp might also speed things up. Unfortunately I have 36 cores, and OpenMP doesn't handle contention well with the unfused code (there is not enough work per item).
$ nim c -d:openmp --hints:off --warnings:off -d:danger -r --outdir:build build/speedtest.nim
np time: 0.009420156478881836
arraymancer time: 0.04207587242126465
arraymancer fused time: 0.005712270736694336
Note: when benchmarking, CPU time (as opposed to wall-clock time) can give you misleading figures for parallel code that runs on multiple CPUs, because it adds up the time spent on every core.
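As a rough illustration of the difference, the sketch below times the same loop with both a wall clock (`getMonoTime` from `std/monotimes`) and `cpuTime` from `std/times`. The workload and variable names are made up for the example. In single-threaded code the two figures should be close; with OpenMP or threads, the CPU figure can be several times larger than the wall-clock one.

```nim
import std/[math, monotimes, times]

var acc = 0.0                    # checksum so the loop is not optimized away
let wallStart = getMonoTime()    # monotonic wall clock
let cpuStart = cpuTime()         # CPU time consumed by the process

for _ in 0 ..< 200:
  for i in 0 ..< 1000:
    acc += sqrt(cos(sin(float(i) * 0.01)))

let cpuSeconds = cpuTime() - cpuStart
let wallSeconds = float((getMonoTime() - wallStart).inNanoseconds) / 1e9

echo "wall: ", wallSeconds, " s, cpu: ", cpuSeconds, " s, checksum: ", acc
```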
Shell and output:
If it is compiled with release
I get time:
Could I improve the speed further?