google / gemma.cpp

Lightweight, standalone C++ inference engine for Google's Gemma models.
Apache License 2.0
5.9k stars 499 forks

Further 1.02x prefill speedup from batch 64->512 #308

Closed by copybara-service[bot] 1 month ago

copybara-service[bot] commented 1 month ago

Further 1.02x prefill speedup from batch 64->512

Measured on SKX (Intel Skylake with AVX-512). A larger speedup is expected on Zen4 and SPR (Intel Sapphire Rapids).