Implemented in #1992. I attempted to quantize it to Q4_0 with limited success, so only the official GGUF download from Google is available right now.
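For anyone who wants to reproduce the Q4_0 attempt, here is a minimal sketch that shells out to llama.cpp's quantize tool from Python. The binary name and all paths are assumptions (older builds name the binary `quantize`, newer ones `llama-quantize`), and the input filename is a placeholder for Google's official GGUF download:

```python
import subprocess
from pathlib import Path

# Paths are hypothetical -- adjust to your local llama.cpp build and model download.
QUANTIZE_BIN = Path("llama.cpp/build/bin/llama-quantize")  # older builds: "quantize"
SRC = Path("models/gemma-7b-it.gguf")        # Google's official GGUF release
DST = Path("models/gemma-7b-it.Q4_0.gguf")   # quantized output

# llama.cpp's quantize tool takes: <input.gguf> <output.gguf> <type>
result = subprocess.run(
    [str(QUANTIZE_BIN), str(SRC), str(DST), "Q4_0"],
    capture_output=True,
    text=True,
)
print(result.stdout)
if result.returncode != 0:
    # Mirrors the "limited success" above: quantization can fail on some
    # Gemma GGUFs depending on how the tensors were exported.
    print(result.stderr)
```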
No success so far with any GGUF version of gemma-7b-it, whether converted from safetensors or quantized directly from Google's provided GGUF. Maybe someday there will be a usable instruct fine-tune.
I just tried loading the Gemma 2 models in gpt4all on Windows and had good results with both the Gemma 2 2B and Gemma 2 9B instruct/chat tunes. I don't recall which fine-tunes I used, but both GGUF files were from the first page of Google search results, and both worked pretty well, aside from `<end_of_turn>` showing up unnecessarily after the assistant turn. I'm guessing that's a GGUF metadata bug in the files I'm using.
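If anyone wants to check this outside the GUI, here is a minimal sketch using the gpt4all Python bindings. The GGUF filename and directory are placeholders for whichever Gemma 2 quant you downloaded:

```python
from gpt4all import GPT4All

# Filename and path are placeholders -- use whatever Gemma 2 GGUF you downloaded.
model = GPT4All(
    model_name="gemma-2-2b-it-Q4_0.gguf",
    model_path="C:/Users/me/models",
    allow_download=False,  # load the local file instead of fetching from the model list
)

with model.chat_session():
    reply = model.generate("Say hello in one sentence.", max_tokens=64)
    # If the chat template metadata in the GGUF is off, the stop token can
    # leak into the output -- the <end_of_turn> artifact described above.
    print(reply.replace("<end_of_turn>", "").strip())
```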
Feature Request
Add support for Gemma LLMs: https://huggingface.co/blog/gemma
Motivation
Gemma is a family of four new LLMs from Google based on Gemini. It comes in two sizes, 2B and 7B parameters, each with base (pretrained) and instruction-tuned versions. All variants can be run on various types of consumer hardware, even without quantization, and have a context length of 8K tokens.
Gemma 7B is a really strong model, with performance comparable to the best models in the 7B weight class, including Mistral 7B. Gemma 2B is an interesting model for its size, but it doesn't score as high on the leaderboard as the most capable models of a similar size, such as Phi-2.
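For reference, the Hugging Face blog linked above runs Gemma through transformers. A minimal sketch along those lines is below; note that `google/gemma-2b-it` is gated on the Hub, so this assumes you have accepted the license and logged in:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# google/gemma-2b-it is gated: accept the license on the Hub and run
# `huggingface-cli login` first.
model_id = "google/gemma-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Gemma's instruct variants use a chat template (<start_of_turn>/<end_of_turn>),
# so build the prompt via apply_chat_template rather than by hand.
chat = [{"role": "user", "content": "Why is the sky blue?"}]
inputs = tokenizer.apply_chat_template(
    chat, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```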