google / gemma.cpp

lightweight, standalone C++ inference engine for Google's Gemma models.
Apache License 2.0
5.8k stars, 491 forks

Toward only using compressed weights: #214

Closed. copybara-service[bot] closed this 1 month ago

copybara-service[bot] commented 1 month ago

Toward only using compressed weights:

CompressedLayer should be all f32 when the weights are f32.
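
One way to picture the invariant stated above is the minimal sketch below. It is not gemma.cpp's actual code: `CompressedLayerSketch`, `Bf16Sketch`, `AllF32`, and the two tensor names are hypothetical names chosen for illustration. The sketch assumes a layer struct templated on the weight type, so that choosing `float` weights makes every tensor in the layer `float` as well.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a 16-bit compressed weight type (assumption,
// not a gemma.cpp type).
struct Bf16Sketch {
  std::uint16_t bits;
};

// Hypothetical layer: every tensor shares the weight element type, so a
// float instantiation contains only float tensors.
template <typename TWeight, std::size_t kDim>
struct CompressedLayerSketch {
  using Element = TWeight;
  std::array<Element, kDim * kDim> attn_w;  // illustrative tensor name
  std::array<Element, kDim> norm_scale;     // illustrative tensor name
};

// Compile-time check that each tensor in the layer stores float elements.
template <typename Layer>
constexpr bool AllF32() {
  return std::is_same<typename decltype(Layer::attn_w)::value_type,
                      float>::value &&
         std::is_same<typename decltype(Layer::norm_scale)::value_type,
                      float>::value;
}

static_assert(AllF32<CompressedLayerSketch<float, 4>>(),
              "f32 weights should give an all-f32 layer");
static_assert(!AllF32<CompressedLayerSketch<Bf16Sketch, 4>>(),
              "non-f32 weights should not be reported as f32");
```

Deriving every tensor type from a single template parameter means a mixed-precision layer cannot be instantiated by accident. Whether the real change takes this approach is not stated in the description above.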