oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

Impossible to load DeepSeek-Coder-V2-Instruct.gguf #6144

Open narikm opened 2 months ago

narikm commented 2 months ago

Describe the bug

The software refuses to load the quantized GGUF of DeepSeek-Coder-V2-Instruct.

Is there an existing issue for this?

Reproduction

Trying to load the model using the latest version.

Screenshot

No response

Logs

File "G:\SD\text-generation-webui\modules\ui_model_menu.py", line 244, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\SD\text-generation-webui\modules\models.py", line 93, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\SD\text-generation-webui\modules\models.py", line 271, in llamacpp_loader
    model, tokenizer = LlamaCppModel.from_pretrained(model_file)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\SD\text-generation-webui\modules\llamacpp_model.py", line 103, in from_pretrained
    result.model = Llama(**params)
                   ^^^^^^^^^^^^^^^
  File "G:\SD\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp\llama.py", line 338, in __init__
    self._model = _LlamaModel(
                  ^^^^^^^^^^^^
  File "G:\SD\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp\_internals.py", line 57, in __init__
    raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: models\DeepSeek-Coder-V2-Instruct.i1-Q4_K_S.gguf

Exception ignored in: <function LlamaCppModel.__del__ at 0x0000018F47AD6AC0>
Traceback (most recent call last):
  File "G:\SD\text-generation-webui\modules\llamacpp_model.py", line 58, in __del__
    del self.model
        ^^^^^^^^^^
AttributeError: 'LlamaCppModel' object has no attribute 'model'

System Info

Windows 11.

gmikhail commented 2 months ago

I have the same problem

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'deepseek2'
llama_load_model_from_file: failed to load model
17:11:31-127820 ERROR    Failed to load the model.
Traceback (most recent call last):
  File "D:\Programs\text-generation-webui\modules\ui_model_menu.py", line 244, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Programs\text-generation-webui\modules\models.py", line 93, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Programs\text-generation-webui\modules\models.py", line 271, in llamacpp_loader
    model, tokenizer = LlamaCppModel.from_pretrained(model_file)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Programs\text-generation-webui\modules\llamacpp_model.py", line 103, in from_pretrained
    result.model = Llama(**params)
                   ^^^^^^^^^^^^^^^
  File "D:\Programs\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama.py", line 323, in __init__
    self._model = _LlamaModel(
                  ^^^^^^^^^^^^
  File "D:\Programs\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda\_internals.py", line 55, in __init__
    raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: models\DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf

Exception ignored in: <function LlamaCppModel.__del__ at 0x0000014B33030CC0>
Traceback (most recent call last):
  File "D:\Programs\text-generation-webui\modules\llamacpp_model.py", line 58, in __del__
    del self.model
        ^^^^^^^^^^
AttributeError: 'LlamaCppModel' object has no attribute 'model'

YK-hastur commented 2 months ago

First, update your llama_cpp_python from one of these wheels (pick the build matching your hardware, and install it with pip inside the webui's own Python environment):

https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.78+cu121-cp311-cp311-linux_x86_64.whl
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.2.78+cu121-cp311-cp311-linux_x86_64.whl
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.78+cpuavx2-cp311-cp311-linux_x86_64.whl

Second, check your error report. If it is a 'key not found' error, you may need to add the following metadata overrides:

deepseek2.attention.q_lora_rank=int:1536
deepseek2.attention.kv_lora_rank=int:512
deepseek2.expert_shared_count=int:2
deepseek2.expert_feed_forward_length=int:1536
deepseek2.expert_weights_scale=float:16
deepseek2.leading_dense_block_count=int:1
deepseek2.rope.scaling.yarn_log_multiplier=float:0.0707

as the kv_overrides parameter in ./modules/llamacpp_model.py, in the params dict built in LlamaCppModel.from_pretrained.
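
For reference, here is a minimal sketch (mine, not the webui's actual code) of what those overrides look like when passed directly to llama-cpp-python. kv_overrides is an existing Llama() parameter that takes a dict mapping GGUF metadata keys to values; the model path and n_ctx here are just example placeholders:

    from llama_cpp import Llama

    # The override values are the DeepSeek-V2 ones quoted above;
    # adjust model_path and n_ctx to your own setup.
    llm = Llama(
        model_path="models/DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf",
        n_ctx=4096,
        kv_overrides={
            "deepseek2.attention.q_lora_rank": 1536,
            "deepseek2.attention.kv_lora_rank": 512,
            "deepseek2.expert_shared_count": 2,
            "deepseek2.expert_feed_forward_length": 1536,
            "deepseek2.expert_weights_scale": 16.0,
            "deepseek2.leading_dense_block_count": 1,
            "deepseek2.rope.scaling.yarn_log_multiplier": 0.0707,
        },
    )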

YK-hastur commented 2 months ago

By the way, I have already run the deepseek-v2-chat-Q2_K model successfully.

TiagoTiago commented 2 months ago

For me it's giving this error trying to load any of the GGUFs I tried: llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'deepseek2'

edit: Oh, and I only tried the "Lite" variants. I'm not sure my machine can handle the full size version.

YK-hastur commented 2 months ago

> For me it's giving this error trying to load any of the GGUFs I tried: llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'deepseek2'
>
> edit: Oh, and I only tried the "Lite" variants. I'm not sure my machine can handle the full size version.

Update your llama_cpp_python from one of these wheels:

https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.78+cu121-cp311-cp311-linux_x86_64.whl
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.2.78+cu121-cp311-cp311-linux_x86_64.whl
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.78+cpuavx2-cp311-cp311-linux_x86_64.whl
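
If you want to confirm the webui actually picked up the new build, a quick check (my own suggestion, not from this thread) from inside the webui's Python environment:

    # llama_cpp exposes its package version; the 'deepseek2' architecture
    # needs a recent build, matching the 0.2.78 wheels linked above.
    import llama_cpp
    print(llama_cpp.__version__)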

FartyPants commented 2 months ago

17:34:05-126060 ERROR    Failed to load the model.
Traceback (most recent call last):
  File "N:\text-generation-webui\modules\ui_model_menu.py", line 244, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "N:\text-generation-webui\modules\models.py", line 93, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "N:\text-generation-webui\modules\models.py", line 271, in llamacpp_loader
    model, tokenizer = LlamaCppModel.from_pretrained(model_file)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "N:\text-generation-webui\modules\llamacpp_model.py", line 103, in from_pretrained
    result.model = Llama(**params)
                   ^^^^^^^^^^^^^^^
  File "N:\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama.py", line 338, in __init__
    self._model = _LlamaModel(
                  ^^^^^^^^^^^^
  File "N:\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda\_internals.py", line 57, in __init__
    raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: models\DeepSeek-Coder-V2-Lite-Instruct-Q6_K.gguf

Exception ignored in: <function LlamaCppModel.__del__ at 0x0000023141898EA0>
Traceback (most recent call last):
  File "N:\text-generation-webui\modules\llamacpp_model.py", line 58, in __del__
    del self.model
        ^^^^^^^^^^
AttributeError: 'LlamaCppModel' object has no attribute 'model'

TiagoTiago commented 2 months ago

> For me it's giving this error trying to load any of the GGUFs I tried: llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'deepseek2' edit: Oh, and I only tried the "Lite" variants. I'm not sure my machine can handle the full size version.
>
> Update your llama_cpp_python from one of these wheels:
> https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.78+cu121-cp311-cp311-linux_x86_64.whl
> https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.2.78+cu121-cp311-cp311-linux_x86_64.whl
> https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.78+cpuavx2-cp311-cp311-linux_x86_64.whl

Why isn't that taken care of with the update script?

clover1980 commented 2 months ago

Yeah, noticing that error was quite a surprise after spending half a day just obtaining the ~200 GB of parts and joining them on external drives. I updated the launcher through its internal update mechanism and it still wasn't working. OK, time to reinstall again.

Update: A full reinstall helps (don't forget to back up your chat archive). BTW, DeepSeek2-Base-236B at Q2_0 quality uses ~87 GB of RAM on CPU and hallucinates a lot; for coding, a higher-quality quant (and more RAM) is preferable. Reducing the GPU layers helps a lot with launching (there is a weird bug where, if there isn't enough VRAM on the GPU, it refuses to start even with plenty of free RAM).
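
For anyone hitting that VRAM refusal, a hedged sketch (my suggestion, not from this thread) of what lowering the GPU offload looks like when calling llama-cpp-python directly; n_gpu_layers is an existing Llama() parameter, and the value below is an arbitrary example:

    from llama_cpp import Llama

    # Offload only some layers to the GPU; n_gpu_layers=0 keeps everything on the CPU.
    llm = Llama(
        model_path="models/DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf",
        n_gpu_layers=8,  # example value; lower it if VRAM runs out
        n_ctx=4096,
    )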