mratsim / weave

A state-of-the-art multithreading runtime: message-passing based, fast, scalable, ultra-low overhead

sync bug in matmul test #115

Open mratsim opened 4 years ago

mratsim commented 4 years ago

It seems that roughly once every couple hundred runs, on the size 700x2 * 2x37 -> 700x37, the result matrix is not properly updated.


mratsim commented 4 years ago

It also shows up in CI:

https://dev.azure.com/numforge/Weave/_build/results?buildId=496&view=logs&jobId=5336da5d-2772-5db2-cb26-68743e5bcd30&j=5336da5d-2772-5db2-cb26-68743e5bcd30&t=c111748d-834b-55bf-d025-f190bfb75559

Test [129x37] * [37x37] -> [129x37]
  Mean Relative Error of Weave vs reference: 5.453971851920869e-08
  Mean Relative Error of Weave (nestable) vs reference: 5.453971851920869e-08
Test [129x37] * [37x129] -> [129x129]
  Mean Relative Error of Weave vs reference: 5.962600191367073e-09
  Mean Relative Error of Weave (nestable) vs reference: 5.962600191367073e-09
Test [129x37] * [37x700] -> [129x700]
  Mean Relative Error of Weave vs reference: 1.0
fatal.nim(49)            sysFatal
Error: unhandled exception: test_gemm_output.nim(72, 12) `weaveError <= 0.0001'f32` 1.0 [AssertionError]
Error: execution of an external program failed: '/Users/runner/runners/2.166.4/work/1/s/build/test_gemm_output '
stack trace: (most recent call last)
/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/nimblecache/nimscriptapi.nim(165, 16)
/Users/runner/runners/2.166.4/work/1/s/weave_2411.nims(94, 12) testTask
/Users/runner/runners/2.166.4/work/1/s/weave_2411.nims(32, 8) test
/Users/runner/runners/2.166.4/work/1/s/NimBinaries/nim-devel/lib/system/nimscript.nim(260, 7) exec
/Users/runner/runners/2.166.4/work/1/s/NimBinaries/nim-devel/lib/system/nimscript.nim(260, 7) Error: unhandled exception: FAILED: nim c -d:danger --verbosity:0 --hints:off --warnings:off --threads:on -d:release --outdir:build -r benchmarks/matmul_gemm_blas/test_gemm_output.nim [OSError]
       Tip: 1 messages have been suppressed, use --verbose to show them.
     Error: Exception raised during nimble script execution

https://dev.azure.com/numforge/Weave/_build/results?buildId=496&view=logs&jobId=5336da5d-2772-5db2-cb26-68743e5bcd30&j=0795a493-e5ca-56c8-45b7-d83e4a06a826&t=471fadfe-288e-54e3-3706-1fa6bf209b80

========================================================================================
Running [ c -d:WV_LazyFlowvar ] benchmarks/matmul_gemm_blas/gemm_pure_nim/gemm_weave.nim
========================================================================================
fatal.nim(49)            sysFatal
Error: unhandled exception: gemm_weave.nim(294, 14) `res_ab == ab` [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]] [AssertionError]
Error: execution of an external program failed: '/home/vsts/work/1/s/build/gemm_weave '
stack trace: (most recent call last)
/tmp/nimblecache/nimscriptapi.nim(165, 16)
/home/vsts/work/1/s/weave_10763.nims(84, 12) testTask
/home/vsts/work/1/s/weave_10763.nims(32, 8) test
/home/vsts/work/1/s/NimBinaries/nim-devel/lib/system/nimscript.nim(260, 7) exec
/home/vsts/work/1/s/NimBinaries/nim-devel/lib/system/nimscript.nim(260, 7) Error: unhandled exception: FAILED: nim c -d:WV_LazyFlowvar --verbosity:0 --hints:off --warnings:off --threads:on -d:release --outdir:build -r benchmarks/matmul_gemm_blas/gemm_pure_nim/gemm_weave.nim [OSError]
       Tip: 1 messages have been suppressed, use --verbose to show them.
     Error: Exception raised during nimble script execution
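Both failures point the same way. The first log reports a mean relative error of exactly 1.0, and the second shows `res_ab` as all zeros. Both are what you would expect if the output matrix was never written at all. The short Python sketch below is not the Nim test code. It assumes the metric is the mean of elementwise |C - ref| / |ref| over a reference with no zero entries, which is an assumption because the actual formula in `test_gemm_output.nim` is not shown here. Under that assumption, every untouched zero entry contributes exactly 1. `mean_relative_error` is a hypothetical name.

```python
# Hypothetical re-creation of the "Mean Relative Error" metric; the real
# formula in test_gemm_output.nim may differ. Assumes no zero entries in
# the reference (otherwise the division below fails).
def mean_relative_error(result, reference):
    """Mean of |result - reference| / |reference| over all elements."""
    total = 0.0
    count = 0
    for row_res, row_ref in zip(result, reference):
        for r, ref in zip(row_res, row_ref):
            total += abs(r - ref) / abs(ref)
            count += 1
    return total / count


reference = [[1.5, -2.0, 3.25], [0.5, 4.0, -1.0]]
untouched = [[0.0] * 3 for _ in range(2)]  # output buffer that was never written

print(mean_relative_error(reference, reference))  # 0.0: correct result
print(mean_relative_error(untouched, reference))  # 1.0: each zero entry contributes |0 - x| / |x| = 1
```

A value of exactly 1.0 therefore suggests a result that was missing entirely, not a numerically inaccurate one.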
mratsim commented 4 years ago

I suspect it's #97, since it happens with the non-nestable syncRoot.
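A root-level sync has to guarantee one thing: it must not return until every spawned task has finished writing its part of the output. If it returns early, the caller reads a partially or entirely zero matrix, which matches the symptom above. The sketch below shows that invariant generically. It is not Weave's implementation and not the code from #97. Python threads stand in for Weave tasks, and `CountdownLatch` and `parallel_matmul` are hypothetical names. The sketch registers every pending task before starting any of them. Incrementing the counter inside each task instead is a classic way for a waiter to observe zero too early.

```python
import threading


class CountdownLatch:
    """Hypothetical stand-in for a root sync: wait() blocks until every
    registered task has called done()."""

    def __init__(self):
        self._pending = 0
        self._cond = threading.Condition()

    def add(self, n=1):
        with self._cond:
            self._pending += n

    def done(self):
        with self._cond:
            self._pending -= 1
            if self._pending == 0:
                self._cond.notify_all()

    def wait(self):
        with self._cond:
            # Loop guards against spurious wakeups.
            while self._pending > 0:
                self._cond.wait()


def parallel_matmul(a, b, latch):
    """Computes a (m x k) * b (k x n), one thread per output row."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]

    def compute_row(i):
        for j in range(n):
            c[i][j] = sum(a[i][p] * b[p][j] for p in range(k))
        latch.done()  # signal only after the whole row is written

    # Register all tasks up front so the pending count cannot reach zero early.
    latch.add(m)
    for i in range(m):
        threading.Thread(target=compute_row, args=(i,)).start()
    latch.wait()  # must not return before every row of c is written
    return c


a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]
print(parallel_matmul(a, b, CountdownLatch()))
```

The 3x2 * 2x3 shapes only loosely mirror the failing 700x2 * 2x37 case, with a tall A and a short inner dimension. Because the race is timing dependent, a single passing run proves little. The CI failures above only appeared once every few hundred runs.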