google / gemma.cpp

Lightweight, standalone C++ inference engine for Google's Gemma models.
Apache License 2.0

2x speedup of SFP decode (1.4x overall) on AVX3_DL+. #178

Closed: copybara-service[bot] closed this 5 months ago

copybara-service[bot] commented 5 months ago

2x speedup of SFP decode (1.4x overall) on AVX3_DL+. Thanks @nzmichaelh for suggesting table lookups!