lf-lang / benchmarks-lingua-franca

MathMul Savina Benchmark Validation Failed #63

Open tanneberger opened 7 months ago

tanneberger commented 7 months ago

C Target Savina Benchmark: ./C/Savina/src/parallism/MathMul.lf: Validation failed for (.., ..) ..

---- Start execution at time Sun Feb 25 15:03:29 2024
---- plus 779977198 nanoseconds
Environment 0: ---- Spawning 16 workers.
Benchmark: MatrixMultiplicationLFCBenchmark
System information
O/S Name: Linux
Validation failed for (i,j)=(1, 140) with (141400.000000, 143360.000000)
Iteration 1 - 161.031 ms
Validation failed for (i,j)=(1, 1) with (927.000000, 1024.000000)
Iteration 2 - 154.292 ms
Validation failed for (i,j)=(1, 1) with (1005.000000, 1024.000000)
Iteration 3 - 152.316 ms
Validation failed for (i,j)=(1, 1) with (909.000000, 1024.000000)
Iteration 4 - 151.190 ms
Validation failed for (i,j)=(1, 1) with (915.000000, 1024.000000)
Iteration 5 - 150.551 ms
Validation failed for (i,j)=(1, 10) with (9660.000000, 10240.000000)
Iteration 6 - 147.472 ms
Validation failed for (i,j)=(1, 8) with (7864.000000, 8192.000000)
Iteration 7 - 148.174 ms
Validation failed for (i,j)=(1, 1) with (896.000000, 1024.000000)
Iteration 8 - 150.427 ms
Validation failed for (i,j)=(1, 128) with (114688.000000, 131072.000000)
Iteration 9 - 151.051 ms
Validation failed for (i,j)=(1, 1) with (1012.000000, 1024.000000)
Iteration 10 - 150.549 ms
Validation failed for (i,j)=(1, 1) with (979.000000, 1024.000000)
Iteration 11 - 148.991 ms
Validation failed for (i,j)=(1, 77) with (77847.000000, 78848.000000)
Iteration 12 - 145.281 ms
Execution - Summary:
Best Time:  145.281 msec
Worst Time: 161.031 msec
Median Time:    150.488 msec
---- Terminating environment 0
---- Elapsed logical time (in nsec): 0
---- Elapsed physical time (in nsec): 1,812,513,944

cmnrd commented 6 months ago

That is actually the expected behavior :sob:. The original Savina implementation in Akka has the same issue. The benchmark performs concurrent matrix multiplication with concurrent writes to shared state, so race conditions are inherent and the result matrix ends up with wrong numbers. We could fix it, but we prioritized modeling the same workloads, to get comparable results, over fixing the benchmarks.
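
To illustrate the kind of race described here, the following is a minimal sketch of a lost update, written in Python rather than in the benchmark's LF/C code. The function name `lost_update_demo` and the increments are hypothetical. It replays one fixed interleaving in which every worker reads a shared cell before any worker writes it back. With non-negative contributions, a lost update can only make the sum smaller, which is consistent with the log above, where every observed value is below the expected one (for example 927 instead of 1024).

```python
def lost_update_demo(initial, increments):
    """Replay a fixed interleaving: all workers read first, then all write."""
    cell = initial
    # Step 1: every worker reads the shared cell into a private temporary.
    snapshots = [cell for _ in increments]
    # Step 2: every worker writes back its stale snapshot plus its own
    # contribution, overwriting whatever the other workers wrote.
    for snapshot, increment in zip(snapshots, increments):
        cell = snapshot + increment
    return cell


expected = 0 + 3 + 4                     # what a correct accumulation yields
observed = lost_update_demo(0, [3, 4])   # the first worker's update is lost
print(expected, observed)                # 7 4
```

In a real run the interleaving is up to the scheduler, so the number of lost updates varies between iterations, as the differing values in the log suggest.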

I think the Rust version of this benchmark is actually deterministic and avoids the race condition (because Rust forces you to), but it also runs considerably slower. I guess now that comparing against actor frameworks isn't such a priority anymore, we could consider fixing the benchmarks and adapting them to our own needs.
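
One common way to make a parallel matrix multiplication deterministic is to give each worker its own disjoint set of result rows, so that no two workers ever write the same cell. The sketch below shows that structure in Python. It is not the code of the Rust, C, or Akka versions, and the names `multiply_rows` and `parallel_matmul` are hypothetical. The test matrices are an assumption as well: `a[i][k] = i` and `b[k][j] = j` give `c[i][j] = n * i * j`, which matches the pattern of expected values in the log (1024 * j for row 1).

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_rows(a, b, row_range):
    """Compute the result rows in row_range; no other worker owns these rows."""
    n_inner = len(b)
    n_cols = len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(n_inner)) for j in range(n_cols)]
        for i in row_range
    ]


def parallel_matmul(a, b, workers=4):
    """Multiply a by b with each worker producing a disjoint block of rows."""
    n_rows = len(a)
    # Contiguous row chunks, one per worker (at least one row per chunk).
    chunk = max(1, (n_rows + workers - 1) // workers)
    ranges = [range(s, min(s + chunk, n_rows)) for s in range(0, n_rows, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the rows stay in order.
        parts = pool.map(lambda r: multiply_rows(a, b, r), ranges)
        return [row for part in parts for row in part]


n = 8  # small stand-in for the benchmark's matrix size
a = [[float(i)] * n for i in range(n)]                   # a[i][k] = i
b = [[float(j) for j in range(n)] for _ in range(n)]     # b[k][j] = j
c = parallel_matmul(a, b)
print(c[1][3])  # 24.0, i.e. n * 1 * 3
```

Because each worker returns its own rows instead of accumulating into shared cells, the result does not depend on scheduling. The sketch only shows the ownership structure; Python threads will not give a speedup for this workload, so it says nothing about the performance cost mentioned above.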