google / gemma.cpp

Lightweight, standalone C++ inference engine for Google's Gemma models.
Apache License 2.0

Simplify matmul: only 2 overloads #304

Closed: copybara-service[bot] closed this pull request 3 months ago.

copybara-service[bot] commented 3 months ago

Simplify matmul: only 2 overloads
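The PR does not say which two overloads remain. One common way to reach exactly two is to keep a single templated core and expose thin overloads with and without a bias vector. The sketch below shows that pattern under that assumption. `MatMulImpl`, this `MatMul` signature, and the row-major layout are hypothetical and are not gemma.cpp's actual API.

```cpp
#include <cstddef>

// Hypothetical core kernel: kAdd selects at compile time whether a bias
// vector `add` (one value per output column) is added to each result.
template <bool kAdd>
void MatMulImpl(const float* a, const float* b, const float* add, float* c,
                size_t rows, size_t inner, size_t cols) {
  for (size_t r = 0; r < rows; ++r) {
    for (size_t col = 0; col < cols; ++col) {
      float sum = 0.0f;
      for (size_t k = 0; k < inner; ++k) {
        // A is rows x inner, B is inner x cols, both row-major.
        sum += a[r * inner + k] * b[k * cols + col];
      }
      // Resolved at compile time, so the no-bias path carries no branch.
      if constexpr (kAdd) sum += add[col];
      c[r * cols + col] = sum;
    }
  }
}

// Overload 1 (hypothetical): C = A * B.
inline void MatMul(const float* a, const float* b, float* c, size_t rows,
                   size_t inner, size_t cols) {
  MatMulImpl<false>(a, b, nullptr, c, rows, inner, cols);
}

// Overload 2 (hypothetical): C = A * B + add, with add broadcast over rows.
inline void MatMul(const float* a, const float* b, const float* add, float* c,
                   size_t rows, size_t inner, size_t cols) {
  MatMulImpl<true>(a, b, add, c, rows, inner, cols);
}
```

The two overloads differ in argument count, so calls never resolve ambiguously. All variation (bias or no bias) lives in one template parameter instead of a family of near-duplicate functions.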

Also add a StoreHorizontalSumsMaybeAdd wrapper function, and move MatMulSlowBatch into the test.
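In a SIMD matmul tile, each output element is typically accumulated in a vector register whose lanes hold partial dot products. Those lanes are then summed ("horizontal sum") before the scalar is written to C. Judging only by its name, StoreHorizontalSumsMaybeAdd combines that reduction and store with an optional bias add. The sketch below assumes that reading. It emulates vectors with `std::array`, and its signature is hypothetical rather than the one in gemma.cpp.

```cpp
#include <array>
#include <cstddef>

// Emulated SIMD vector: a fixed number of float lanes (assumption: 4).
constexpr size_t kLanes = 4;
using Vec = std::array<float, kLanes>;

// Horizontal sum: add up all lanes of one accumulator vector.
inline float ReduceSum(const Vec& v) {
  float sum = 0.0f;
  for (float lane : v) sum += lane;
  return sum;
}

// Hypothetical wrapper: for a kRows x kCols tile of accumulators, reduce each
// one to a scalar, add add[col] when kAdd is true, and store it into C
// (row-major with row stride stride_c).
template <bool kAdd, size_t kRows, size_t kCols>
void StoreHorizontalSumsMaybeAdd(
    const std::array<std::array<Vec, kCols>, kRows>& acc, const float* add,
    float* c, size_t stride_c) {
  for (size_t r = 0; r < kRows; ++r) {
    for (size_t col = 0; col < kCols; ++col) {
      float sum = ReduceSum(acc[r][col]);
      if constexpr (kAdd) sum += add[col];
      c[r * stride_c + col] = sum;
    }
  }
}
```

Folding the optional add into the store step lets one tile kernel serve both the plain and the bias variant, which fits the PR's goal of fewer overloads.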

1.02-1.06x speedup.
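The PR does not name the benchmark behind this figure. Assuming the usual definition of speedup $s = T_\text{old} / T_\text{new}$, the range converts to a runtime reduction of $1 - 1/s$:

$$
1 - \frac{1}{1.02} \approx 0.0196, \qquad 1 - \frac{1}{1.06} \approx 0.0566
$$

That is, roughly 2% to 5.7% less wall time on whatever workload was measured.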