judofyr / spice

Fine-grained parallelism with sub-nanosecond overhead in Zig
BSD Zero Clause License
748 stars 13 forks source link

Some improvements to the Rust version #2

Closed chancancode closed 3 months ago

chancancode commented 3 months ago

I did make some (lousy) effort to make sure none of these obviously caused any regression (it may be slightly faster?), but I was just running it on my laptop with a ton of browser tabs open, etc, etc. So probably should double check that in your benchmark environment.

judofyr commented 3 months ago

Oh nice! I was actually thinking of running it through clippy, but forgot about it. These changes look sensible to me. I’m a bit busy this week, but hopefully in a few days I’ll be able to re-do the benchmark in the same environment and verify that the results are similar.

judofyr commented 3 months ago

I ran this again now and then I get the following diff:

# Rerun benchmark
(cd examples/rust-parallel-example && cargo bench)

# Regenerate CSV
rm bench/rayon-*.csv && make bench-rayon

# Diff
git diff
diff --git a/bench/rayon-tree-sum-1000.csv b/bench/rayon-tree-sum-1000.csv
index d9c615a..410a04e 100644
--- a/bench/rayon-tree-sum-1000.csv
+++ b/bench/rayon-tree-sum-1000.csv
@@ -1,7 +1,7 @@
-Baseline,1.5597891883229127
-Rayon 1 thread,23.51176644639185
-Rayon 2 threads,16.82203207978065
-Rayon 4 threads,14.939625885334602
-Rayon 8 threads,18.66874537774096
-Rayon 16 threads,24.183746144849145
-Rayon 32 threads,105.44453796411226
+Baseline,1.3574152155604338
+Rayon 1 thread,20.265803978522356
+Rayon 2 threads,15.612293062453892
+Rayon 4 threads,14.348052230794815
+Rayon 8 threads,16.42391594539397
+Rayon 16 threads,21.837912138899224
+Rayon 32 threads,96.13910653253916
diff --git a/bench/rayon-tree-sum-100M.csv b/bench/rayon-tree-sum-100M.csv
index b94de87..ef25279 100644
--- a/bench/rayon-tree-sum-100M.csv
+++ b/bench/rayon-tree-sum-100M.csv
@@ -1,7 +1,7 @@
-Baseline,7.4849190632000004
-Rayon 1 thread,22.9933083354
-Rayon 2 threads,11.7585642422
-Rayon 4 threads,5.898856567
-Rayon 8 threads,2.9989583844
-Rayon 16 threads,1.6433929402
-Rayon 32 threads,1.651134205
+Baseline,7.1566549926
+Rayon 1 thread,20.2334262024
+Rayon 2 threads,10.2145197118
+Rayon 4 threads,5.2211526282
+Rayon 8 threads,2.6398534263999998
+Rayon 16 threads,1.4336398069999998
+Rayon 32 threads,1.444313771

The machine appears to be slightly faster today (wonders of the cloud, maybe?), but the overall shape of the benchmark results still apply in this PR. Let's merge this!