oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

Add support for Google Gemma Model #5562

Closed shreyanshsaha closed 3 months ago

shreyanshsaha commented 7 months ago

Description

There is a new text-generation LLM by Google called Gemma, which is based on Gemini: https://ai.google.dev/gemma

The models are available on Hugging Face: https://huggingface.co/google/gemma-7b-it/tree/main

It would be nice if the tool could be updated to support this new model.
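
In the meantime, the instruction-tuned checkpoint can be run outside the web UI with a recent transformers release (Gemma support was added in transformers 4.38). A minimal sketch, assuming GPU access to the gated repo; only the repo id comes from the link above, the prompt and generation settings are illustrative:

```python
# Minimal sketch: run google/gemma-7b-it directly with transformers.
# Only the repo id comes from this thread; dtype, device placement and the
# prompt are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-7b-it"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes a GPU with bf16 support
    device_map="auto",
)

# The "-it" checkpoint is instruction-tuned, so wrap the prompt in Gemma's
# chat turns via the tokenizer's bundled chat template.
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Summarize what Gemma is in two sentences."}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=80)
print(tok.decode(out[0], skip_special_tokens=True))
```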

rombodawg commented 7 months ago

I second this. We now have EXL2 quants and GGUF quants, so we should have support for both the llama-cpp-python and ExLlama loaders.

https://huggingface.co/models?sort=trending&search=LoneStriker+%2F+gemma
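
For the GGUF side, loading a quant with llama-cpp-python directly is essentially what the webui's llama.cpp loader does (the `Llama(**params)` call visible in the tracebacks below). A rough sketch; the file path and parameters are placeholders, and it only works once the installed wheel is new enough to know the `gemma` architecture:

```python
# Rough sketch of loading a Gemma GGUF quant with llama-cpp-python.
# The path and parameters are placeholders, not webui defaults.
from llama_cpp import Llama

llm = Llama(
    model_path="models/gemma-7b-it.Q4_K_M.gguf",  # hypothetical local quant
    n_ctx=4096,
    n_gpu_layers=-1,  # offload all layers if the wheel was built with CUDA
)

out = llm("Question: What is Gemma?\nAnswer:", max_tokens=64)
print(out["choices"][0]["text"])
```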

mclassen commented 7 months ago

I'm still getting errors with GGUF quants of Gemma.

DerRehberg commented 7 months ago

Same

AndreyRGW commented 7 months ago
╭───────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────╮
│ F:\WBC\textwebui\server.py:241 in <module>                                                                           │
│                                                                                                                      │
│   240         # Load the model                                                                                       │
│ ❱ 241         shared.model, shared.tokenizer = load_model(model_name)                                                │
│   242         if shared.args.lora:                                                                                   │
│                                                                                                                      │
│ F:\WBC\textwebui\modules\models.py:87 in load_model                                                                  │
│                                                                                                                      │
│    86     shared.args.loader = loader                                                                                │
│ ❱  87     output = load_func_map[loader](model_name)                                                                 │
│    88     if type(output) is tuple:                                                                                  │
│                                                                                                                      │
│ F:\WBC\textwebui\modules\models.py:250 in llamacpp_loader                                                            │
│                                                                                                                      │
│   249     logger.info(f"llama.cpp weights detected: \"{model_file}\"")                                               │
│ ❱ 250     model, tokenizer = LlamaCppModel.from_pretrained(model_file)                                               │
│   251     return model, tokenizer                                                                                    │
│                                                                                                                      │
│ F:\WBC\textwebui\modules\llamacpp_model.py:102 in from_pretrained                                                    │
│                                                                                                                      │
│   101                                                                                                                │
│ ❱ 102         result.model = Llama(**params)                                                                         │
│   103         if cache_capacity > 0:                                                                                 │
│                                                                                                                      │
│ F:\WBC\textwebui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama.py:300 in __init__                       │
│                                                                                                                      │
│    299                                                                                                               │
│ ❱  300         self._model = _LlamaModel(                                                                            │
│    301             path_model=self.model_path, params=self.model_params, verbose=self.verbose                        │
│                                                                                                                      │
│ F:\WBC\textwebui\installer_files\env\Lib\site-packages\llama_cpp_cuda\_internals.py:50 in __init__                   │
│                                                                                                                      │
│    49                                                                                                                │
│ ❱  50         self.model = llama_cpp.llama_load_model_from_file(                                                     │
│    51             self.path_model.encode("utf-8"), self.params                                                       │
│                                                                                                                      │
│ F:\WBC\textwebui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama_cpp.py:728 in llama_load_model_from_file │
│                                                                                                                      │
│    727 ) -> llama_model_p:                                                                                           │
│ ❱  728     return _lib.llama_load_model_from_file(path_model, params)                                                │
│    729                                                                                                               │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
OSError: exception: access violation reading 0x0000000000000000
Exception ignored in: <function LlamaCppModel.__del__ at 0x0000000032D5E7A0>
Traceback (most recent call last):
  File "F:\WBC\textwebui\modules\llamacpp_model.py", line 58, in __del__
    del self.model
        ^^^^^^^^^^
AttributeError: 'LlamaCppModel' object has no attribute 'model'

LoneStriker/gemma-2b-GGUF

Update: it works on the dev branch.
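
Since the fix is a newer bundled llama-cpp-python (Gemma support was merged into llama.cpp in February 2024), a quick way to see which wheels the webui environment actually has is to check their versions from that environment. A sketch; the module names match the CUDA variants visible in the traceback above:

```python
# Print the version of whichever llama-cpp-python variants are installed in
# the webui environment (run with that environment's python).
for name in ("llama_cpp", "llama_cpp_cuda", "llama_cpp_cuda_tensorcores"):
    try:
        mod = __import__(name)
        print(name, mod.__version__)
    except ImportError:
        print(name, "not installed")
```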

rombodawg commented 6 months ago

Now we just need support for fine-tuned/merged Gemma models, which aren't working. Follow the linked threads, and check out my model for debugging.

Thread links:
https://github.com/lmstudio-ai/configs/issues/21
https://github.com/ggerganov/llama.cpp/issues/5706
https://github.com/arcee-ai/mergekit/issues/181

Model: https://huggingface.co/rombodawg/Gemme-Merge-Test-7b

safadfadf commented 6 months ago

Same

wangfeng35 commented 6 months ago

We also need support for Qwen1.5 models.

shaktisd commented 6 months ago

Any updates on this?

TheOneTrueNiz commented 6 months ago

Wondering the same for Gemma-7B

ruizcrp commented 5 months ago

CodeGemma is just out. Has anyone tried it yet?

github-actions[bot] commented 3 months ago

This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

Mark-Tomlinson commented 1 month ago

Gemma 2 2B IT is out, and I'd love to try it. Any support for Gemma 2 yet?

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma2'
llama_load_model_from_file: failed to load model
14:34:21-436521 ERROR    Failed to load the model.
Traceback (most recent call last):
  File "D:\text-generation-webui\modules\ui_model_menu.py", line 231, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\modules\models.py", line 93, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\modules\models.py", line 274, in llamacpp_loader
    model, tokenizer = LlamaCppModel.from_pretrained(model_file)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\modules\llamacpp_model.py", line 85, in from_pretrained
    result.model = Llama(**params)
                   ^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\llama.py", line 323, in __init__
    self._model = _LlamaModel(
                  ^^^^^^^^^^^^
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\_internals.py", line 55, in __init__
    raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: models\gemma-2-2b-it-GGUF\gemma-2-2b-it-Q8_0.gguf

Exception ignored in: <function LlamaCppModel.__del__ at 0x000001C898492340>
Traceback (most recent call last):
  File "D:\text-generation-webui\modules\llamacpp_model.py", line 33, in __del__
    del self.model
        ^^^^^^^^^^
AttributeError: 'LlamaCppModel' object has no attribute 'model'
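
The `unknown model architecture: 'gemma2'` line means the bundled llama.cpp build predates Gemma 2 support, not that the file is broken. To confirm the GGUF really declares `gemma2`, the architecture string can be read straight out of the header. A sketch, assuming a GGUF v2/v3 file whose converter wrote `general.architecture` as the first metadata key (files produced by llama.cpp's own conversion scripts do):

```python
# Read the architecture string from a GGUF header (v2/v3 layout).
# Assumes 'general.architecture' is the first metadata key, which files
# produced by llama.cpp's conversion scripts satisfy.
import struct

def gguf_architecture(path):
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        version, = struct.unpack("<I", f.read(4))
        tensor_count, kv_count = struct.unpack("<QQ", f.read(16))  # uint64 counts in v2/v3
        key_len, = struct.unpack("<Q", f.read(8))
        key = f.read(key_len).decode("utf-8")
        value_type, = struct.unpack("<I", f.read(4))
        if key != "general.architecture" or value_type != 8:  # 8 = string
            return None
        val_len, = struct.unpack("<Q", f.read(8))
        return f.read(val_len).decode("utf-8")

print(gguf_architecture(r"models\gemma-2-2b-it-GGUF\gemma-2-2b-it-Q8_0.gguf"))
# -> 'gemma2', which older bundled llama.cpp builds don't recognize
```
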
TheOneTrueNiz commented 1 month ago

Use the Transformers model loader. Gemma 2 27B loads and generates, just slowly; I'm running dual 4090s, and it takes roughly 90 seconds to generate and output a response. Also, previous Gemma models load with the ExLlamav2_HF loader, if anyone was curious.
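
For reference, a rough sketch of the multi-GPU placement the Transformers loader relies on here: `device_map="auto"` shards the Gemma 2 27B weights across both cards (this also needs a transformers release new enough to include the `gemma2` architecture). The per-GPU memory caps are illustrative, not measured values:

```python
# Sketch: shard google/gemma-2-27b-it across two GPUs with accelerate's
# automatic device map. Memory caps are illustrative for dual 24 GB cards.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-27b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",                    # place layers on cuda:0 / cuda:1 automatically
    max_memory={0: "22GiB", 1: "22GiB"},  # hypothetical per-GPU caps
)
print(model.hf_device_map)                # shows which layers landed on which GPU
```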