Implemented in #1992. I attempted to quantize it to Q4_0 with limited success, so only the official GGUF download from Google is available right now.
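For anyone who wants to reproduce the Q4_0 attempt, here is a minimal sketch that shells out to llama.cpp's quantize tool from Python. The binary name and all paths are assumptions (older builds name the binary `quantize`, newer ones `llama-quantize`), and the input filename is a placeholder for Google's official GGUF download:

```python
import subprocess
from pathlib import Path

# Paths are hypothetical -- adjust to your local llama.cpp build and model download.
QUANTIZE_BIN = Path("llama.cpp/build/bin/llama-quantize")  # older builds: "quantize"
SRC = Path("models/gemma-7b-it.gguf")        # Google's official GGUF release
DST = Path("models/gemma-7b-it.Q4_0.gguf")   # quantized output

# llama.cpp's quantize tool takes: <input.gguf> <output.gguf> <type>
result = subprocess.run(
    [str(QUANTIZE_BIN), str(SRC), str(DST), "Q4_0"],
    capture_output=True,
    text=True,
)
print(result.stdout)
if result.returncode != 0:
    # Mirrors the "limited success" above: quantization can fail on some
    # Gemma GGUFs depending on how the tensors were exported.
    print(result.stderr)
```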
No success so far with any GGUF version of gemma-7b-it, whether converted from safetensors or quantized directly from Google's provided GGUF. Maybe someday there will be a usable instruct fine-tune.
I just tried loading the Gemma 2 models in gpt4all on Windows and had good results with both the Gemma 2 2B and Gemma 2 9B instruct/chat tunes. I don't recall which fine-tunes I used, but both GGUF files were from the first page of Google search results, and both worked pretty well, aside from `<end_of_turn>` showing up unnecessarily after the assistant turn. I'm guessing that's a GGUF metadata bug in the files I'm using.
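If anyone wants to check this outside the GUI, here is a minimal sketch using the gpt4all Python bindings. The GGUF filename and directory are placeholders for whichever Gemma 2 quant you downloaded:

```python
from gpt4all import GPT4All

# Filename and path are placeholders -- use whatever Gemma 2 GGUF you downloaded.
model = GPT4All(
    model_name="gemma-2-2b-it-Q4_0.gguf",
    model_path="C:/Users/me/models",
    allow_download=False,  # load the local file instead of fetching from the model list
)

with model.chat_session():
    reply = model.generate("Say hello in one sentence.", max_tokens=64)
    # If the chat template metadata in the GGUF is off, the stop token can
    # leak into the output -- the <end_of_turn> artifact described above.
    print(reply.replace("<end_of_turn>", "").strip())
```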
Feature Request
Add support for Gemma LLMs: https://huggingface.co/blog/gemma
Motivation
Gemma is a family of four new LLMs from Google based on Gemini. It comes in two sizes, 2B and 7B parameters, each with base (pretrained) and instruction-tuned versions. All variants can be run on various types of consumer hardware, even without quantization, and have a context length of 8K tokens.
Gemma 7B is a really strong model, with performance comparable to the best models in the 7B weight class, including Mistral 7B. Gemma 2B is an interesting model for its size, but it doesn't score as high on the leaderboard as the most capable models of a similar size, such as Phi-2.
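For reference, the Hugging Face blog linked above runs Gemma through transformers. A minimal sketch along those lines is below; note that `google/gemma-2b-it` is gated on the Hub, so this assumes you have accepted the license and logged in:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# google/gemma-2b-it is gated: accept the license on the Hub and run
# `huggingface-cli login` first.
model_id = "google/gemma-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Gemma's instruct variants use a chat template (<start_of_turn>/<end_of_turn>),
# so build the prompt via apply_chat_template rather than by hand.
chat = [{"role": "user", "content": "Why is the sky blue?"}]
inputs = tokenizer.apply_chat_template(
    chat, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```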