Closed by viv-eth 5 months ago
With what parameters do you observe these errors?
It's because of this:
https://github.com/pulp-platform/snitch_cluster/blob/d325300c8c2f6dd6b81b6612428e787adf133464/sw/blas/gemm/src/gemm.h#L1381-L1387
Matrix B needs to be transposed for the SIMD kernels. Right now it just returns immediately for all FP32 GEMM kernels. It would be nice to have proper C assertions there instead of `return -1`. I tried that once, but ran into some issues with newlib.
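For illustration, here is a minimal sketch of what a more explicit check could look like without pulling in `<assert.h>`. All names here (`gemm_status_t`, `GEMM_CHECK`, `gemm_check_failed`, `gemm_fp32_simd_check`, the `transb` flag) are hypothetical and are not taken from `gemm.h`. Whether a hand-rolled macro like this actually avoids the newlib issues mentioned above is an assumption and has not been tested on the target.

```c
#include <stdint.h>

/* Hypothetical status codes; not part of the snitch_cluster sources. */
typedef enum {
    GEMM_OK = 0,
    GEMM_ERR_B_NOT_TRANSPOSED = -1
} gemm_status_t;

/* Records the text of the last failed condition (NULL if none failed). */
static const char *gemm_last_failure = 0;

/* Hypothetical failure hook. On a bare-metal target this could print a
 * message or trap; in this sketch it only records the failed condition. */
static void gemm_check_failed(const char *cond_text) {
    gemm_last_failure = cond_text;
}

/* Assertion-like macro with no libc dependency: on failure it reports the
 * condition through the hook and returns the given status to the caller. */
#define GEMM_CHECK(cond, status)          \
    do {                                  \
        if (!(cond)) {                    \
            gemm_check_failed(#cond);     \
            return (status);              \
        }                                 \
    } while (0)

/* Hypothetical argument check for an FP32 SIMD kernel that expects B to be
 * transposed. A non-zero transb means "B is stored transposed". */
static gemm_status_t gemm_fp32_simd_check(uint32_t transb) {
    GEMM_CHECK(transb != 0, GEMM_ERR_B_NOT_TRANSPOSED);
    return GEMM_OK;
}
```

Compared to a bare `return -1`, the caller still gets an error code, but the failed condition is also surfaced, which would have made the silent early return much easier to spot.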
When used in the FlashAttention-2 layer, the GEMM FP32 kernel yields results that are wrong by orders of magnitude