Vadimluck closed this issue 1 year ago
The model is using an older version of GGML; there was an update, and only GGMLv3 works now. See #2264.
Thank you very much! My problem was solved as soon as I downloaded q5; before that I was using q8.
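If you're not sure which container format a file uses, you can peek at its header. A minimal sketch, assuming the little-endian magic/version layout of llama.cpp's legacy GGJT container (what "GGMLv3" refers to is GGJT version 3) and the newer GGUF format; the model path is a placeholder:

```python
import struct

MAGIC_GGJT = 0x67676A74  # 'ggjt', the versioned legacy GGML container
MAGIC_GGUF = 0x46554747  # 'GGUF', the successor format

def model_format(path):
    """Report the container format of a llama.cpp model file (sketch)."""
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
        if magic == MAGIC_GGUF:
            return "GGUF"
        if magic == MAGIC_GGJT:
            (version,) = struct.unpack("<I", f.read(4))
            return f"GGJT v{version}"  # v3 is 'GGMLv3'
        return f"unknown (magic 0x{magic:08X})"

print(model_format("model.bin"))  # hypothetical path
```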
More recently (March 2024), I've been seeing this error come up again. This time, it's due to being out of VRAM. Try sliding the n-gpu-layers slider down to 1 and reloading, and if that works, then see how far back up you can put it without the error message reappearing.
I think that something has started using more VRAM, as I can't set the n-gpu-layers value as high as I think I used to.
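The same knob the slider controls is exposed in llama-cpp-python, which text-generation-webui calls under the hood, so you can reproduce the behaviour in a few lines. A minimal sketch; the model path is a placeholder:

```python
from llama_cpp import Llama

# Start conservatively with one layer offloaded to the GPU, then raise
# n_gpu_layers until loading fails and back off a step.
llm = Llama(
    model_path="model.gguf",  # hypothetical path
    n_gpu_layers=1,
    n_ctx=4096,
)
```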
Having this issue with CohereForAI/c4ai-command-r-v01-4bit models on a 3090, with quantizations around 20 GB in file size. I can load other, larger models fine, and setting the GPU layers to 1 still results in this error. Not sure how to proceed.
Just received this error when attempting to load the IQ4XS quant from https://huggingface.co/MaziyarPanahi/WizardLM-2-8x22B-GGUF/tree/main. I have 3 P40s running concurrently, and I am using the latest release of text-gen-webui, updated via the built-in update script.
When sliding 'n-gpu-layers' to 1, the model loads successfully. Set to max layers, it fails. Set to 40 layers, it fails. Set to 30 layers, it succeeds. Set to 20 layers, it succeeds.
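That manual bisection can be automated. A sketch using llama-cpp-python directly; it assumes the failure surfaces as the ValueError shown in the log below (some out-of-memory conditions may instead abort the process, in which case this loop won't help):

```python
from llama_cpp import Llama

def max_gpu_layers(model_path, lo=0, hi=64, n_ctx=4096):
    """Bisect the largest n_gpu_layers that still loads (hypothetical helper)."""
    best = lo
    while lo <= hi:
        mid = (lo + hi) // 2
        try:
            llm = Llama(model_path=model_path, n_gpu_layers=mid,
                        n_ctx=n_ctx, verbose=False)
            del llm  # release VRAM before the next probe
            best, lo = mid, mid + 1
        except ValueError:  # "Failed to create llama_context"
            hi = mid - 1
    return best
```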
Error log:
Traceback (most recent call last):
File "/home/ml-user-1/LLMs/Runners/text-generation-webui/modules/ui_model_menu.py", line 249, in load_model_wrapper
shared.model, shared.tokenizer = load_model(selected_model, loader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ml-user-1/LLMs/Runners/text-generation-webui/modules/models.py", line 94, in load_model
output = load_func_map[loader](model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ml-user-1/LLMs/Runners/text-generation-webui/modules/models.py", line 272, in llamacpp_loader
model, tokenizer = LlamaCppModel.from_pretrained(model_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ml-user-1/LLMs/Runners/text-generation-webui/modules/llamacpp_model.py", line 103, in from_pretrained
result.model = Llama(**params)
^^^^^^^^^^^^^^^
File "/home/ml-user-1/LLMs/Runners/text-generation-webui/installer_files/env/lib/python3.11/site-packages/llama_cpp_cuda/llama.py", line 352, in __init__
self._ctx = _LlamaContext(
^^^^^^^^^^^^^^
File "/home/ml-user-1/LLMs/Runners/text-generation-webui/installer_files/env/lib/python3.11/site-packages/llama_cpp_cuda/_internals.py", line 267, in __init__
raise ValueError("Failed to create llama_context")
ValueError: Failed to create llama_context
Exception ignored in: <function LlamaCppModel.__del__ at 0x7f5aa5b00b80>
Traceback (most recent call last):
File "/home/ml-user-1/LLMs/Runners/text-generation-webui/modules/llamacpp_model.py", line 58, in __del__
del self.model
^^^^^^^^^^
AttributeError: 'LlamaCppModel' object has no attribute 'model'
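The trailing AttributeError is a harmless secondary failure: __del__ runs on an object whose __init__ never completed, so self.model was never assigned. A defensive sketch of what a destructor like the one in modules/llamacpp_model.py could do (not the project's actual fix):

```python
class LlamaCppModel:
    def __del__(self):
        # If Llama(**params) raised, self.model was never set; guard the
        # cleanup instead of assuming the attribute exists.
        if getattr(self, "model", None) is not None:
            del self.model
```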
System info:
NAME="Debian GNU/Linux"
VERSION="12 (bookworm)"
Linux ML-Host-1 6.1.0-21-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64 GNU/Linux
CPU: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
RAM: 192 GB
nvidia-smi (abridged):
| 0 Tesla P40 On | 00000000:03:00.0 Off | Off |
| 1 Tesla P40 On | 00000000:04:00.0 Off | Off |
| 2 Tesla P40 On | 00000000:A1:00.0 Off | Off |
Lower your context length ("n_ctx"); try 4096.
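Lowering n_ctx helps because the KV cache grows linearly with context length and has to fit in VRAM alongside the offloaded layers. A back-of-the-envelope estimate; the model dimensions below are placeholders, the real values live in the GGUF metadata:

```python
def kv_cache_bytes(n_layers, n_ctx, n_kv_heads, head_dim, bytes_per_elem=2):
    # One K and one V tensor per layer, fp16 by default (2 bytes/element).
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

# Placeholder dimensions for illustration only.
for n_ctx in (32768, 4096):
    gib = kv_cache_bytes(n_layers=56, n_ctx=n_ctx,
                         n_kv_heads=8, head_dim=128) / 2**30
    print(f"n_ctx={n_ctx}: ~{gib:.2f} GiB of KV cache")
```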
Thank you, this advice has helped me!
Describe the bug
Installed "text-generation-webui" and "vicuna-13b-cocktail" worked, but some others didn't want to work and I decided to (reinstall) uninstall and install from scratch "text-generation-webui". Now "vicuna-13b-cocktail" stopped working. An error appears in the console: