Bug in GEMM FP32 baseline kernel

pulp-platform / snitch_cluster

An energy-efficient RISC-V floating-point compute cluster.

Apache License 2.0


Closed: viv-eth closed this issue 5 months ago

viv-eth commented 8 months ago

When used in the FlashAttention-2 layer, the GEMM FP32 kernel yields results that are wrong by orders of magnitude.

colluca commented 8 months ago

With what parameters do you observe these errors?

fischeti commented 8 months ago

It's because of this: https://github.com/pulp-platform/snitch_cluster/blob/d325300c8c2f6dd6b81b6612428e787adf133464/sw/blas/gemm/src/gemm.h#L1381-L1387

Matrix B needs to be transposed for the SIMD kernels. Right now, that check just returns immediately for all FP32 GEMM kernels. It would be nice to have proper C assertions there instead of return -1. I tried that once, but ran into some issues with newlib.
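
To make the failure mode concrete, here is a minimal sketch of such a guard. It is not the code in gemm.h: the names `kernel_kind_t`, `gemm_fp32_sketch` and the argument layout are hypothetical and chosen for illustration only. The sketch assumes that only the SIMD variant requires a transposed B, and that a caller ignoring the -1 return code is left with an unmodified C buffer, which would be one way to end up with badly wrong results downstream.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical kernel selector; the real gemm.h may dispatch differently. */
typedef enum { KERNEL_BASELINE, KERNEL_SIMD } kernel_kind_t;

/*
 * Computes C = A * B for row-major FP32 matrices (A: m x k, C: m x n).
 * If transb is non-zero, b holds B transposed (n x k), otherwise B (k x n).
 * Returns 0 on success, -1 if the selected kernel cannot handle the layout.
 */
int gemm_fp32_sketch(kernel_kind_t kind, int transb,
                     uint32_t m, uint32_t n, uint32_t k,
                     const float *a, const float *b, float *c) {
    /* Assumption: only the SIMD variant needs B transposed, so the guard
     * is restricted to that variant instead of covering every FP32 kernel. */
    if (kind == KERNEL_SIMD && !transb) {
        return -1; /* C is left untouched in this case */
    }

    for (uint32_t i = 0; i < m; i++) {
        for (uint32_t j = 0; j < n; j++) {
            float acc = 0.0f;
            for (uint32_t kk = 0; kk < k; kk++) {
                float b_val = transb ? b[j * k + kk] : b[kk * n + j];
                acc += a[i * k + kk] * b_val;
            }
            c[i * n + j] = acc;
        }
    }
    return 0;
}

/* Wrapper that turns the silent error code into a hard failure. */
void gemm_fp32_checked_sketch(kernel_kind_t kind, int transb,
                              uint32_t m, uint32_t n, uint32_t k,
                              const float *a, const float *b, float *c) {
    int rc = gemm_fp32_sketch(kind, transb, m, n, k, a, b, c);
    assert(rc == 0 && "SIMD GEMM kernel requires a transposed B");
    (void)rc; /* silences unused-variable warnings when NDEBUG is defined */
}
```

With the guard limited to the SIMD variant, the baseline kernel accepts a non-transposed B, while the checked wrapper aborts instead of returning silently. Whether `assert` is usable on the target depends on the C library setup, which is the newlib issue mentioned above.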