pulp-platform / snitch_cluster

An energy-efficient RISC-V floating-point compute cluster.
https://pulp-platform.github.io/snitch_cluster/
Apache License 2.0

Bug in GEMM FP32 baseline kernel #105

Closed viv-eth closed 5 months ago

viv-eth commented 8 months ago

When used in the FlashAttention-2 layer, the GEMM FP32 kernel yields results that are wrong by orders of magnitude.

colluca commented 8 months ago

With what parameters do you observe these errors?

fischeti commented 8 months ago

It's because of this: https://github.com/pulp-platform/snitch_cluster/blob/d325300c8c2f6dd6b81b6612428e787adf133464/sw/blas/gemm/src/gemm.h#L1381-L1387 Matrix B needs to be transposed for SIMD kernels. Right now the code just returns immediately for all FP32 GEMM kernels. It would be nice to have proper C assertions there instead of return -1. I tried that once, but ran into some issues with newlib.
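
To illustrate the idea, here is a minimal sketch of such a check, not the actual code in gemm.h. The names gemm_layout_ok and gemm_dispatch_checked and the uses_simd and transb flags are hypothetical, and the sketch assumes a toolchain where assert.h works, which may not hold given the newlib issues mentioned above. The transposed-B requirement is applied only to SIMD kernels, so a non-SIMD baseline kernel is not rejected, and a violated requirement fails loudly instead of silently returning -1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate: SIMD kernels need B transposed,
 * non-SIMD (baseline) kernels accept either layout. */
static bool gemm_layout_ok(bool uses_simd, bool transb) {
    return !uses_simd || transb;
}

/* Hypothetical dispatch wrapper: asserts on the layout requirement
 * rather than returning -1 without any diagnostic. */
static int gemm_dispatch_checked(bool uses_simd, bool transb) {
    assert(gemm_layout_ok(uses_simd, transb) &&
           "SIMD GEMM kernels expect a transposed B matrix");
    /* The selected kernel would be called here. */
    return 0;
}
```

With assertions compiled out (NDEBUG defined), the check disappears, so a build that needs the guard in release mode would still want an explicit error path.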