kvcache-ai / ktransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Mixtral-8x7B-v0.1 GGUF file error #42

Closed RealLittleXian closed 3 months ago

RealLittleXian commented 3 months ago

Hello there, I ran

$ python -m ktransformers.local_chat --model_path /share/models/Mixtral-8x7B-v0.1 --gguf_path /share/models/Mixtral-8x7b-q4k_m/mixtral-8x7b-q4k-medium.gguf

and got

...
Injecting model.layers.31 as default
Injecting model.layers.31.self_attn as default
Injecting model.layers.31.self_attn.q_proj as ktransformers.operators.linear . KTransformersLinear
Injecting model.layers.31.self_attn.k_proj as ktransformers.operators.linear . KTransformersLinear
Injecting model.layers.31.self_attn.v_proj as ktransformers.operators.linear . KTransformersLinear
Injecting model.layers.31.self_attn.o_proj as ktransformers.operators.linear . KTransformersLinear
Injecting model.layers.31.self_attn.rotary_emb as ktransformers.operators.RoPE . RotaryEmbedding
Injecting model.layers.31.block_sparse_moe as ktransformers.operators.experts . KMisrtalSparseMoEBlock
Injecting model.layers.31.block_sparse_moe.gate as ktransformers.operators.linear . KTransformersLinear
This linear module's in_features or out_features is not divisible by GPTQ_MARLIN_MIN_THREAD_N(64), using KLinearTorch instead.
module info: key:blk.31.ffn_gate_inp orig_module:Linear(in_features=4096, out_features=8, bias=False)
Injecting model.layers.31.block_sparse_moe.experts as ktransformers.operators.experts . KTransformersExperts
Injecting model.layers.31.input_layernorm as default
Injecting model.layers.31.post_attention_layernorm as default
Injecting model.norm as default
Injecting lm_head as default
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.10/site-packages/ktransformers/local_chat.py", line 120, in <module>
    fire.Fire(local_chat)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/ktransformers/local_chat.py", line 92, in local_chat
    optimize_and_load_gguf(model, optimize_rule_path, gguf_path, config)
  File "/opt/conda/lib/python3.10/site-packages/ktransformers/optimize/optimize.py", line 129, in optimize_and_load_gguf
    load_weights(module, gguf_loader)
  File "/opt/conda/lib/python3.10/site-packages/ktransformers/util/utils.py", line 83, in load_weights
    load_weights(child, gguf_loader, prefix+name+".")
  File "/opt/conda/lib/python3.10/site-packages/ktransformers/util/utils.py", line 83, in load_weights
    load_weights(child, gguf_loader, prefix+name+".")
  File "/opt/conda/lib/python3.10/site-packages/ktransformers/util/utils.py", line 81, in load_weights
    load_cur_state_dict(module, gguf_loader, prefix)
  File "/opt/conda/lib/python3.10/site-packages/ktransformers/util/utils.py", line 76, in load_cur_state_dict
    raise Exception(f"can't find {translated_key} in GGUF file!")
Exception: can't find token_embd.weight in GGUF file!

It seems like something is wrong with my GGUF file. May I ask for the link to the Mixtral 8x7B GGUF you use?
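
For reference, here is one way to double-check which tensors the file actually contains (a minimal sketch, assuming the gguf Python package that ships with llama.cpp is installed; the path is just my file from the command above):

from gguf import GGUFReader

# Read only the GGUF metadata/tensor index, not the full weights.
reader = GGUFReader("/share/models/Mixtral-8x7b-q4k_m/mixtral-8x7b-q4k-medium.gguf")
names = [t.name for t in reader.tensors]
print("token_embd.weight present:", "token_embd.weight" in names)
# Print a few tensor names to see what the file holds.
for name in names[:10]:
    print(name)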

Thanks.

BITcyman commented 3 months ago

Hi, it looks like the problem is the gguf_path parameter you passed. If you want to run local_chat, gguf_path must be the directory containing the GGUF files (in your example, gguf_path should be "/share/models/Mixtral-8x7b-q4k_m/", as in the command below). This design is meant to support reading multiple split GGUF files. Change the gguf_path and try again. If it still doesn't work, you can try this gguf, which runs on our machines. Good luck!
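
For example, reusing the paths from your command, the invocation should look something like:

$ python -m ktransformers.local_chat --model_path /share/models/Mixtral-8x7B-v0.1 --gguf_path /share/models/Mixtral-8x7b-q4k_m/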

We will discuss this design later and, if possible, try to make the gguf_path parameter more flexible.
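
As a rough illustration of what "more flexible" could mean (a hypothetical sketch, not the current implementation), the CLI could accept either a single .gguf file or a directory and normalize to the directory internally:

from pathlib import Path

def resolve_gguf_dir(gguf_path: str) -> str:
    # Hypothetical helper: if a single .gguf file is passed, fall back to
    # its parent directory, which is what the current loader expects.
    p = Path(gguf_path)
    if p.is_file() and p.suffix == ".gguf":
        return str(p.parent)
    return str(p)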