google / gemma.cpp

lightweight, standalone C++ inference engine for Google's Gemma models.
Apache License 2.0
5.9k stars, 499 forks

Major Prefill/Generate cleanup, 1.3x Prefill speedup #315

Closed: copybara-service[bot] closed this 1 month ago

copybara-service[bot] commented 1 month ago

Major Prefill/Generate cleanup, 1.3x Prefill speedup

This fixes the time-to-first-token (TTFT) measurement, which previously did not include prefill time.
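
To illustrate the timing fix, here is a minimal sketch of a TTFT measurement whose clock starts before prefill rather than after it. The names `TtftResult`, `MeasureTtft`, `prefill`, and `generate_first_token` are hypothetical and are not taken from the gemma.cpp API; the two callables stand in for whatever the engine actually runs in each phase.

```cpp
#include <chrono>

// Hypothetical result type (not part of gemma.cpp).
struct TtftResult {
  double prefill_seconds;  // time spent in prefill only
  double ttft_seconds;     // time from request start to first token
};

// Hypothetical helper: times prefill and first-token generation with a
// single start point, so the reported TTFT includes prefill.
template <typename PrefillFn, typename FirstTokenFn>
TtftResult MeasureTtft(PrefillFn&& prefill,
                       FirstTokenFn&& generate_first_token) {
  using Clock = std::chrono::steady_clock;

  // Start the clock before prefill. Starting it after prefill would
  // under-report TTFT.
  const Clock::time_point start = Clock::now();
  prefill();
  const Clock::time_point after_prefill = Clock::now();

  generate_first_token();
  const Clock::time_point first_token = Clock::now();

  const std::chrono::duration<double> prefill_time = after_prefill - start;
  const std::chrono::duration<double> ttft = first_token - start;
  return {prefill_time.count(), ttft.count()};
}
```

Under these assumptions, `ttft_seconds` is always at least `prefill_seconds`, which is the property the old measurement lacked.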