narikm opened 5 months ago
I have the same problem
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'deepseek2'
llama_load_model_from_file: failed to load model
17:11:31-127820 ERROR Failed to load the model.
Traceback (most recent call last):
File "D:\Programs\text-generation-webui\modules\ui_model_menu.py", line 244, in load_model_wrapper
shared.model, shared.tokenizer = load_model(selected_model, loader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Programs\text-generation-webui\modules\models.py", line 93, in load_model
output = load_func_map[loader](model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Programs\text-generation-webui\modules\models.py", line 271, in llamacpp_loader
model, tokenizer = LlamaCppModel.from_pretrained(model_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Programs\text-generation-webui\modules\llamacpp_model.py", line 103, in from_pretrained
result.model = Llama(**params)
^^^^^^^^^^^^^^^
File "D:\Programs\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama.py", line 323, in __init__
self._model = _LlamaModel(
^^^^^^^^^^^^
File "D:\Programs\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda\_internals.py", line 55, in __init__
raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: models\DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf
Exception ignored in: <function LlamaCppModel.__del__ at 0x0000014B33030CC0>
Traceback (most recent call last):
File "D:\Programs\text-generation-webui\modules\llamacpp_model.py", line 58, in __del__
del self.model
^^^^^^^^^^
AttributeError: 'LlamaCppModel' object has no attribute 'model'
First, update your llama_cpp_python from one of these wheels (e.g. pip install <wheel URL> inside the webui environment): https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.78+cu121-cp311-cp311-linux_x86_64.whl https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.2.78+cu121-cp311-cp311-linux_x86_64.whl https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.78+cpuavx2-cp311-cp311-linux_x86_64.whl
Second, check your error report. If it is a 'key not found' error, you may add
deepseek2.attention.q_lora_rank=int:1536 deepseek2.attention.kv_lora_rank=int:512 deepseek2.expert_shared_count=int:2 deepseek2.expert_feed_forward_length=int:1536 deepseek2.expert_weights_scale=float:16 deepseek2.leading_dense_block_count=int:1 deepseek2.rope.scaling.yarn_log_multiplier=float:0.0707
as kv_overrides entries in ./modules/llamacpp_model.py, in the params dict that LlamaCppModel.from_pretrained builds (see the sketch below).
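A minimal standalone sketch of that idea, assuming llama-cpp-python's Llama() accepts a kv_overrides mapping (present in recent versions, including 0.2.78); in the webui you would merge the same key into the params dict inside LlamaCppModel.from_pretrained. The model path is a placeholder:

```python
from llama_cpp import Llama

# Hedged sketch: supply the DeepSeek-V2 metadata that the loader reports
# as missing. Keys/values are the ones from the comment above; adjust to
# whatever your error report actually says is not found.
kv_overrides = {
    "deepseek2.attention.q_lora_rank": 1536,
    "deepseek2.attention.kv_lora_rank": 512,
    "deepseek2.expert_shared_count": 2,
    "deepseek2.expert_feed_forward_length": 1536,
    "deepseek2.expert_weights_scale": 16.0,
    "deepseek2.leading_dense_block_count": 1,
    "deepseek2.rope.scaling.yarn_log_multiplier": 0.0707,
}

llm = Llama(
    model_path="models/DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf",  # placeholder
    kv_overrides=kv_overrides,
)
```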
By the way, I have already run the deepseek-v2-chat-Q2_K model successfully.
For me, it gives this error when trying to load any of the GGUFs:
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'deepseek2'
edit: Oh, and I only tried the "Lite" variants. I'm not sure my machine can handle the full size version.
Update your llama_cpp_python from one of these wheels: https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.78+cu121-cp311-cp311-linux_x86_64.whl https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.2.78+cu121-cp311-cp311-linux_x86_64.whl https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.78+cpuavx2-cp311-cp311-linux_x86_64.whl
17:34:05-126060 ERROR Failed to load the model.
Traceback (most recent call last):
File "N:\text-generation-webui\modules\ui_model_menu.py", line 244, in load_model_wrapper
shared.model, shared.tokenizer = load_model(selected_model, loader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "N:\text-generation-webui\modules\models.py", line 93, in load_model
output = load_func_map[loader](model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "N:\text-generation-webui\modules\models.py", line 271, in llamacpp_loader
model, tokenizer = LlamaCppModel.from_pretrained(model_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "N:\text-generation-webui\modules\llamacpp_model.py", line 103, in from_pretrained
result.model = Llama(**params)
^^^^^^^^^^^^^^^
File "N:\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama.py", line 338, in __init__
self._model = _LlamaModel(
^^^^^^^^^^^^
File "N:\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda\_internals.py", line 57, in __init__
raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: models\DeepSeek-Coder-V2-Lite-Instruct-Q6_K.gguf
Exception ignored in: <function LlamaCppModel.__del__ at 0x0000023141898EA0>
Traceback (most recent call last):
File "N:\text-generation-webui\modules\llamacpp_model.py", line 58, in __del__
del self.model
^^^^^^^^^^
AttributeError: 'LlamaCppModel' object has no attribute 'model'
Why isn't that taken care of with the update script?
Yeah, hitting that error was more than a surprise after spending half a day just downloading nearly 200 GB of parts and joining them on external drives. I updated the launcher through its internal update mechanism and it still wasn't working. OK, time to reinstall again.
Update: a full reinstall helps (don't forget to save your chats archive). BTW, DeepSeek-V2 Base (236B) at Q2_0 quality uses ~87 GB of RAM on CPU and hallucinates a lot; for coding, a better-quality quant (and more RAM) is preferable. Reducing the GPU layers helps a lot with launching (weird bug: if there is not enough VRAM on the GPU, it declines to start even with plenty of free RAM).
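For reference, the webui's GPU-layers setting maps onto llama-cpp-python's n_gpu_layers option; a minimal sketch of a partial offload, with a placeholder path and an arbitrary example layer count:

```python
from llama_cpp import Llama

# Partial GPU offload: keep most layers in system RAM and push only a
# few to the GPU. n_gpu_layers=0 is CPU-only; raise it until you run
# out of VRAM, since asking for too many layers can abort the load.
llm = Llama(
    model_path="models/deepseek-v2-base-Q2_0.gguf",  # placeholder path
    n_gpu_layers=8,   # example value; tune to your VRAM
    n_ctx=4096,
)
```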
I hit exactly the same error when trying to run https://huggingface.co/bartowski/DeepSeek-V2.5-GGUF/tree/main/DeepSeek-V2.5-IQ4_XS
To make sure I have everything up to date, I tried ./update_wizard_linux.sh but it did not fix it.
I tested with llama.cpp directly (without text-generation-webui), and it worked without the error. Hopefully, this issue can be fixed in text-generation-webui, but until then using llama.cpp for the DeepSeek model is a possible workaround.
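If you want a similar sanity check without building llama.cpp yourself, a minimal load test with the standalone llama-cpp-python package (not the webui's bundled llama_cpp_cuda) can separate a bad model file from an outdated wheel; the path is a placeholder:

```python
from llama_cpp import Llama

# Smoke test: if this loads and generates, the GGUF itself is fine and
# the 'deepseek2' error comes from the webui's bundled wheel being too
# old to know the architecture.
llm = Llama(model_path="models/DeepSeek-V2.5-IQ4_XS.gguf", n_ctx=2048)
print(llm("def fib(n):", max_tokens=32)["choices"][0]["text"])
```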
Describe the bug
The software refuses to load the quant of DeepSeek-Coder-V2-Instruct.
Is there an existing issue for this?
Reproduction
Trying to load the model using the latest version.
Screenshot
No response
Logs
System Info