Closed: shreyanshsaha closed this issue 3 months ago.
I second this. We now have EXL2 and GGUF quants, so we should have support for both the llama-cpp-python and ExLlama loaders.
https://huggingface.co/models?sort=trending&search=LoneStriker+%2F+gemma
I'm still getting errors with GGUF quants of Gemma
Same
╭───────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────╮
│ F:\WBC\textwebui\server.py:241 in <module> │
│ │
│ 240 # Load the model │
│ ❱ 241 shared.model, shared.tokenizer = load_model(model_name) │
│ 242 if shared.args.lora: │
│ │
│ F:\WBC\textwebui\modules\models.py:87 in load_model │
│ │
│ 86 shared.args.loader = loader │
│ ❱ 87 output = load_func_map[loader](model_name) │
│ 88 if type(output) is tuple: │
│ │
│ F:\WBC\textwebui\modules\models.py:250 in llamacpp_loader │
│ │
│ 249 logger.info(f"llama.cpp weights detected: \"{model_file}\"") │
│ ❱ 250 model, tokenizer = LlamaCppModel.from_pretrained(model_file) │
│ 251 return model, tokenizer │
│ │
│ F:\WBC\textwebui\modules\llamacpp_model.py:102 in from_pretrained │
│ │
│ 101 │
│ ❱ 102 result.model = Llama(**params) │
│ 103 if cache_capacity > 0: │
│ │
│ F:\WBC\textwebui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama.py:300 in __init__ │
│ │
│ 299 │
│ ❱ 300 self._model = _LlamaModel( │
│ 301 path_model=self.model_path, params=self.model_params, verbose=self.verbose │
│ │
│ F:\WBC\textwebui\installer_files\env\Lib\site-packages\llama_cpp_cuda\_internals.py:50 in __init__ │
│ │
│ 49 │
│ ❱ 50 self.model = llama_cpp.llama_load_model_from_file( │
│ 51 self.path_model.encode("utf-8"), self.params │
│ │
│ F:\WBC\textwebui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama_cpp.py:728 in llama_load_model_from_file │
│ │
│ 727 ) -> llama_model_p: │
│ ❱ 728 return _lib.llama_load_model_from_file(path_model, params) │
│ 729 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
OSError: exception: access violation reading 0x0000000000000000
Exception ignored in: <function LlamaCppModel.__del__ at 0x0000000032D5E7A0>
Traceback (most recent call last):
File "F:\WBC\textwebui\modules\llamacpp_model.py", line 58, in __del__
del self.model
^^^^^^^^^^
AttributeError: 'LlamaCppModel' object has no attribute 'model'
LoneStriker/gemma-2b-GGUF
Update: working on the dev branch.
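For anyone still hitting the access violation, here is a minimal repro sketch that bypasses the webui and loads the GGUF with llama-cpp-python directly; the model path below is just an example, point it at your own file:

```python
# Minimal repro sketch: load the GGUF with llama-cpp-python directly to see
# whether the access violation comes from the webui glue or the library itself.
from llama_cpp import Llama

llm = Llama(
    model_path="models/gemma-2b.Q8_0.gguf",  # example path, use your own file
    n_ctx=2048,
)
out = llm("Why is the sky blue?", max_tokens=32)
print(out["choices"][0]["text"])
```

If this crashes the same way, the problem is in the underlying llama.cpp build rather than in the webui.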
Now we just need to add support for fine-tuned/merged Gemma models, which aren't working. Follow the linked threads below, and check out my model for debugging (there's a metadata-dump sketch after the links).
Thread links:
https://github.com/lmstudio-ai/configs/issues/21
https://github.com/ggerganov/llama.cpp/issues/5706
https://github.com/arcee-ai/mergekit/issues/181
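If it helps with debugging, here's a rough sketch that dumps a GGUF's metadata with the `gguf` package (from llama.cpp's gguf-py), so a merged model's architecture and tokenizer fields can be compared against a known-good Gemma GGUF; the path is hypothetical:

```python
# Debugging sketch: list a GGUF's metadata keys and print the declared
# architecture, to compare a broken merged model against a working one.
from gguf import GGUFReader

reader = GGUFReader("models/my-merged-gemma.gguf")  # hypothetical path

for key, field in reader.fields.items():
    print(key)

# Strings are stored as byte arrays; field.data holds the indices of the
# value parts within field.parts.
arch = reader.fields["general.architecture"]
print("architecture =", bytes(arch.parts[arch.data[0]]).decode("utf-8"))
```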
Same
Also needs support for Qwen1.5 models.
Any updates on this?
Wondering the same for Gemma-7B
CodeGemma is just out. Has anyone tried it yet?
This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
Gemma-2-2B-it is out, and I'd love to try it. Any support for Gemma 2 yet?
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma2'
llama_load_model_from_file: failed to load model
14:34:21-436521 ERROR Failed to load the model.
Traceback (most recent call last):
File "D:\text-generation-webui\modules\ui_model_menu.py", line 231, in load_model_wrapper
shared.model, shared.tokenizer = load_model(selected_model, loader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\text-generation-webui\modules\models.py", line 93, in load_model
output = load_func_map[loader](model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\text-generation-webui\modules\models.py", line 274, in llamacpp_loader
model, tokenizer = LlamaCppModel.from_pretrained(model_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\text-generation-webui\modules\llamacpp_model.py", line 85, in from_pretrained
result.model = Llama(**params)
^^^^^^^^^^^^^^^
File "D:\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\llama.py", line 323, in __init__
self._model = _LlamaModel(
^^^^^^^^^^^^
File "D:\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\_internals.py", line 55, in __init__
raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: models\gemma-2-2b-it-GGUF\gemma-2-2b-it-Q8_0.gguf
Exception ignored in: <function LlamaCppModel.__del__ at 0x000001C898492340>
Traceback (most recent call last):
File "D:\text-generation-webui\modules\llamacpp_model.py", line 33, in __del__
del self.model
^^^^^^^^^^
AttributeError: 'LlamaCppModel' object has no attribute 'model'
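That "unknown model architecture: 'gemma2'" error means the bundled llama.cpp build predates Gemma 2 support, which landed upstream in late June 2024. A quick check of the installed version (the minimum llama-cpp-python version here is my assumption, roughly 0.2.81+, not a documented requirement):

```python
# Version check sketch: in the webui's bundled environment the wheel is named
# llama_cpp_cuda or llama_cpp_cuda_tensorcores (see the traceback above);
# fall back to the plain package otherwise. The 0.2.81+ threshold for Gemma 2
# support is an assumption on my part.
try:
    import llama_cpp_cuda_tensorcores as llama_cpp
except ImportError:
    import llama_cpp

print(llama_cpp.__version__)
```

If the version is older than that, updating the webui (or its requirements) should pull in a build that knows the gemma2 architecture.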
Use the transformers model loader. Gemma 2 27B loads and generates, just slowly. I'm running dual 4090s; roughly 90 seconds to generate and output a response. Also, previous Gemma models load with the ExLlamav2_HF loader, if anyone was curious.
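For reference, a minimal sketch of loading Gemma 2 through transformers directly; the model id and settings are the usual Hub defaults, nothing webui-specific, and it assumes a transformers release new enough to include the gemma2 architecture (4.42+):

```python
# Minimal sketch: run Gemma 2 with plain transformers, assuming transformers
# >= 4.42 (the release that added the gemma2 architecture).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-27b-it"  # or a local path to downloaded weights

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma 2 weights are published in bfloat16
    device_map="auto",           # shard layers across available GPUs
)

inputs = tokenizer("Why is the sky blue?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```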
Description
Google has released a new text-generation LLM called Gemma, built from the same research and technology as its Gemini models: https://ai.google.dev/gemma
The models are available on Hugging Face: https://huggingface.co/google/gemma-7b-it/tree/main
It would be nice if the tool could be updated to support this new model.