Simplify FFW by using MatMul_4x4_Batch_Add.

google / gemma.cpp

lightweight, standalone C++ inference engine for Google's Gemma models.

Apache License 2.0

5.9k stars 499 forks

Closed by copybara-service[bot] 1 month ago

copybara-service[bot] commented 1 month ago

Simplify the feed-forward (FFW) layer by using MatMul_4x4_Batch_Add. This affects only the Griffin model, where prefill throughput in tokens per second (TPS) improves by about 70%.