google / gemma.cpp

lightweight, standalone C++ inference engine for Google's Gemma models.
Apache License 2.0

Fix KV cache size calculation error #266

Closed ufownl closed 2 weeks ago

ufownl commented 2 weeks ago

After the KV cache size calculation was refactored, the wrong functor was used to compute the size of the KV cache. As a result, the allocated buffer is too small and writes to the KV cache overflow it.
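To illustrate the class of bug described above, here is a minimal sketch, not the actual gemma.cpp code. All names in it (`ToyConfig`, `KVCachePosSize`, `KVCacheSize`, `ComputeSize`, `FitsAllPositions`) are hypothetical and chosen only for this example. It assumes two size functors with the same call shape, one returning the elements needed per position and one returning the elements needed for the whole sequence, so that passing the wrong one to a generic dispatcher still compiles but yields an undersized buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model configuration; field names are illustrative only.
struct ToyConfig {
  static constexpr std::size_t kSeqLen = 8;
  static constexpr std::size_t kLayers = 2;
  static constexpr std::size_t kKVHeads = 1;
  static constexpr std::size_t kQKVDim = 4;
};

// Elements stored for ONE position: K and V for every layer and KV head.
template <class Config>
struct KVCachePosSize {
  constexpr std::size_t operator()() const {
    return Config::kLayers * Config::kKVHeads * Config::kQKVDim * 2;
  }
};

// Elements stored for the WHOLE sequence.
template <class Config>
struct KVCacheSize {
  constexpr std::size_t operator()() const {
    return Config::kSeqLen * KVCachePosSize<Config>()();
  }
};

// Generic dispatcher: both functors fit here, so a mix-up is not a compile error.
template <template <class> class SizeFunctor, class Config>
constexpr std::size_t ComputeSize() {
  return SizeFunctor<Config>()();
}

// True if a buffer of `allocated` elements can hold every position's K/V data.
template <class Config>
constexpr bool FitsAllPositions(std::size_t allocated) {
  return allocated >= Config::kSeqLen * KVCachePosSize<Config>()();
}

// Allocates with the correct functor and bounds-checks a write at every position.
inline std::size_t FillCacheChecked() {
  const std::size_t per_pos = ComputeSize<KVCachePosSize, ToyConfig>();
  std::vector<float> cache(ComputeSize<KVCacheSize, ToyConfig>());
  for (std::size_t pos = 0; pos < ToyConfig::kSeqLen; ++pos) {
    for (std::size_t i = 0; i < per_pos; ++i) {
      const std::size_t idx = pos * per_pos + i;
      assert(idx < cache.size());  // would fail if sized with KVCachePosSize
      cache[idx] = 1.0f;
    }
  }
  return cache.size();
}
```

With this toy configuration, sizing the buffer with `KVCachePosSize` gives room for a single position only, so the first write for position 1 would already land past the end of the buffer. Sizing it with `KVCacheSize` covers all positions.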