francescoalemanno closed this pull request 1 year ago
@francescoalemanno thank you for your contribution! :)
@niklas-heer actually, it was a bit too rushed to close this pull request before checking that it actually worked... On my laptop it made the code run about 3 times as fast:
(base) ➜ hyperfine ./leibniz_orig
Benchmark 1: ./leibniz_orig
Time (mean ± σ): 95.9 ms ± 0.7 ms [User: 95.0 ms, System: 0.6 ms]
Range (min … max): 95.4 ms … 98.9 ms 30 runs
(base) ➜ hyperfine ./leibniz
Benchmark 1: ./leibniz
Time (mean ± σ): 32.7 ms ± 0.4 ms [User: 32.1 ms, System: 0.3 ms]
Range (min … max): 32.3 ms … 34.7 ms 82 runs
But this does not show up in the CI runs... I wanted to investigate why...
Okay, I guess it still evaluates all the terms one by one, but it does so in a way that should let the compiler emit vectorized (SIMD) assembly code.
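To illustrate the idea of evaluating every term while still giving the compiler room to vectorize, here is a minimal sketch in C. It is not the code from this pull request: the function names `leibniz_scalar` and `leibniz_lanes` and the choice of four accumulators are assumptions made for the example. Whether a given compiler actually emits SIMD instructions for it depends on the compiler, flags, and target CPU.

```c
#include <stddef.h>

/* Straightforward version: one accumulator and a sign that flips every
   iteration. Each iteration depends on the previous one through `sum`
   and `sign`, which tends to block vectorization under strict
   floating-point rules. */
double leibniz_scalar(size_t n_terms) {
    double sum = 0.0, sign = 1.0;
    for (size_t k = 0; k < n_terms; k++) {
        sum += sign / (2.0 * (double)k + 1.0);
        sign = -sign;
    }
    return 4.0 * sum;
}

/* Hypothetical "lane" version: the same terms, one by one, but spread
   over four independent accumulators with fixed signs. The inner loop
   has no dependency between lanes, so a compiler may map it onto
   vector registers. */
double leibniz_lanes(size_t n_terms) {
    double acc[4] = {0.0, 0.0, 0.0, 0.0};
    const double sign[4] = {1.0, -1.0, 1.0, -1.0};
    size_t k = 0;

    /* k is always a multiple of 4 here, so term k + j has sign[j]. */
    for (; k + 4 <= n_terms; k += 4) {
        for (int j = 0; j < 4; j++) {
            acc[j] += sign[j] / (2.0 * (double)(k + j) + 1.0);
        }
    }

    double sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);

    /* Handle the remaining 0 to 3 terms with the plain formula. */
    for (; k < n_terms; k++) {
        sum += ((k % 2 == 0) ? 1.0 : -1.0) / (2.0 * (double)k + 1.0);
    }
    return 4.0 * sum;
}
```

Both functions add up exactly the same terms, so their results should agree up to floating-point rounding. Any speed difference would come only from how the compiler schedules the work, which could also be one reason a local benchmark and a CI run disagree if they use different compilers, flags, or hardware.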