1.15x 7b sfp prefill speedup: Matmul in attention

google / gemma.cpp

lightweight, standalone C++ inference engine for Google's Gemma models.

Apache License 2.0


Closed copybara-service[bot] closed 2 weeks ago

copybara-service[bot] commented 3 weeks ago

1.15x 7b sfp prefill speedup: Matmul in attention

2b bf16: prefill 114.456 -> 115.222, decode 16.8847 -> 16.9987

7b sfp: prefill 18.8575 -> 21.7325, decode 5.68428 -> 5.79791
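
For reference, the headline factor can be re-derived from the before/after figures above. This is a minimal sketch: `Speedup` is a hypothetical helper, not a function in gemma.cpp, and it assumes the figures are throughput values where higher is better, which the description does not state.

```cpp
#include <cassert>

// Hypothetical helper, not part of gemma.cpp.
// Returns the ratio of the new measurement to the old one.
// Assumption: the figures are throughput (higher is better), so a
// result above 1.0 means the change made things faster.
inline double Speedup(double before, double after) {
  assert(before > 0.0);  // guard against division by zero
  return after / before;
}
```

With the 7b sfp prefill figures, `Speedup(18.8575, 21.7325)` comes out to roughly 1.15, matching the title. The 2b bf16 figures and both decode figures change by about 2% or less.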