kvcache-ai / ktransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Error loading model: token_embd.weight not found in GGUF file #102


antonovkz commented 2 weeks ago

I am trying to run the local_chat.py script using the deepseek-ai/DeepSeek-V2.5 model and GGUF file, but I encounter an error during the model weight loading process.

Command that triggers the error:

python -m ktransformers.local_chat --model_path deepseek-ai/DeepSeek-V2.5 --gguf_path DeepSeek-V2.5/DeepSeek-V2.5.gguf

Error logs:

using custom modeling_xxx.py.
using default_optimize_rule for DeepseekV2ForCausalLM
Injecting model as ktransformers.operators.models.KDeepseekV2Model
Injecting model.embed_tokens as default
Injecting model.layers as default
Injecting model.layers.0 as default
Injecting model.layers.0.self_attn as ktransformers.operators.attention.KDeepseekV2Attention
Injecting model.layers.0.self_attn.q_a_proj as ktransformers.operators.linear.KTransformersLinear

...

Injecting model.layers.59.mlp as ktransformers.operators.experts.KDeepseekV2MoE
Injecting model.layers.59.mlp.experts as ktransformers.operators.experts.KTransformersExperts
Injecting model.layers.59.mlp.gate as default
Injecting model.layers.59.mlp.shared_experts as default
Injecting model.layers.59.mlp.shared_experts.gate_proj as ktransformers.operators.linear.KTransformersLinear
Injecting model.layers.59.mlp.shared_experts.up_proj as ktransformers.operators.linear.KTransformersLinear
Injecting model.layers.59.mlp.shared_experts.down_proj as ktransformers.operators.linear.KTransformersLinear
Injecting model.layers.59.mlp.shared_experts.act_fn as default
Injecting model.layers.59.input_layernorm as default
Injecting model.layers.59.post_attention_layernorm as default
Injecting model.norm as default
Injecting lm_head as default
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "F:\Models\ktransformers\venv\Lib\site-packages\ktransformers\local_chat.py", line 159, in <module>
    fire.Fire(local_chat)
  File "F:\Models\ktransformers\venv\Lib\site-packages\fire\core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\Models\ktransformers\venv\Lib\site-packages\fire\core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "F:\Models\ktransformers\venv\Lib\site-packages\fire\core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "F:\Models\ktransformers\venv\Lib\site-packages\ktransformers\local_chat.py", line 106, in local_chat
    optimize_and_load_gguf(model, optimize_rule_path, gguf_path, config)
  File "F:\Models\ktransformers\venv\Lib\site-packages\ktransformers\optimize\optimize.py", line 129, in optimize_and_load_gguf
    load_weights(module, gguf_loader)
  File "F:\Models\ktransformers\venv\Lib\site-packages\ktransformers\util\utils.py", line 83, in load_weights
    load_weights(child, gguf_loader, prefix+name+".")
  File "F:\Models\ktransformers\venv\Lib\site-packages\ktransformers\util\utils.py", line 85, in load_weights
    module.load()
  File "F:\Models\ktransformers\venv\Lib\site-packages\ktransformers\operators\base_operator.py", line 60, in load
    utils.load_weights(child, self.gguf_loader, self.key+".")
  File "F:\Models\ktransformers\venv\Lib\site-packages\ktransformers\util\utils.py", line 83, in load_weights
    load_weights(child, gguf_loader, prefix+name+".")
  File "F:\Models\ktransformers\venv\Lib\site-packages\ktransformers\util\utils.py", line 81, in load_weights
    load_cur_state_dict(module, gguf_loader, prefix)
  File "F:\Models\ktransformers\venv\Lib\site-packages\ktransformers\util\utils.py", line 76, in load_cur_state_dict
    raise Exception(f"can't find {translated_key} in GGUF file!")
Exception: can't find token_embd.weight in GGUF file!

Steps to reproduce:

1. Installed the ktransformers package.
2. Ran the command above with the deepseek-ai/DeepSeek-V2.5 model and the Q4_K_M GGUF file.
3. Encountered the error shown above.

Expected behavior: The model should load successfully without errors.

Versions:

ktransformers: latest version
Python: 3.11
Operating system: Windows 11

Additional information: The GGUF file was downloaded from https://huggingface.co/bartowski/DeepSeek-V2.5-GGUF. The issue might be caused by missing weights or by an incompatibility between the model and the GGUF file.
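One way to check whether the weights are actually missing is to list the tensor names the GGUF file contains. Below is a minimal sketch using the gguf package published by the llama.cpp project (pip install gguf); the file path is the one from this report, and the exact reader API may differ between package versions:

# Minimal sketch: list tensor names in a GGUF file with the `gguf` package.
from gguf import GGUFReader

reader = GGUFReader("DeepSeek-V2.5/DeepSeek-V2.5.gguf")
names = {t.name for t in reader.tensors}
print(f"{len(names)} tensors found")
print("token_embd.weight present:", "token_embd.weight" in names)

If token_embd.weight is listed, the file itself is fine and the problem is in how the loader is being pointed at it.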

qiyuxinlin commented 2 days ago

See #79.

The --gguf_path argument should point to the folder that contains the .gguf file(s), not to the file itself.
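With the paths from this report, where the file lives at DeepSeek-V2.5/DeepSeek-V2.5.gguf, that means passing the containing folder:

python -m ktransformers.local_chat --model_path deepseek-ai/DeepSeek-V2.5 --gguf_path DeepSeek-V2.5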