kvcache-ai / ktransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Apache License 2.0

How can I run internlm2_5-7b-chat-1m in ktransformers? #74

Closed · Ma1oneZhang closed this issue 2 months ago

Ma1oneZhang commented 2 months ago

How can I run internlm2_5-7b-chat-1m in ktransformers?

I have downloaded the internlm2_5-7b-chat-1m model and converted it to a GGUF file, but ktransformers still reports the following error:

```
    load_weights(module, gguf_loader)
  File "/somewhere/ktransformers/ktransformers/util/utils.py", line 83, in load_weights
    load_weights(child, gguf_loader, prefix+name+".")
  File "/somewhere/ktransformers/ktransformers/util/utils.py", line 83, in load_weights
    load_weights(child, gguf_loader, prefix+name+".")
  File "/somewhere/ktransformers/ktransformers/util/utils.py", line 81, in load_weights
    load_cur_state_dict(module, gguf_loader, prefix)
  File "/somewhere/ktransformers/ktransformers/util/utils.py", line 76, in load_cur_state_dict
    raise Exception(f"can't find {translated_key} in GGUF file!")
Exception: can't find model.tok_embeddings.weight in GGUF file!
```

ovowei commented 2 months ago

You first need to use the tools provided by InternLM to convert the internlm2_5-7b-chat-1m model into the LLaMA format; after that, you can convert it to a GGUF file. If you go this route, note that the model configuration needs to be in the LLaMA format, while the tokenizer should be the InternLM version.
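
For concreteness, here is a minimal sketch of that two-step pipeline. The checkout locations, output paths, and the llama.cpp converter script name (`convert_hf_to_gguf.py` in recent versions) are assumptions; adjust them to your setup.

```python
# Sketch of the InternLM2 -> LLaMA -> GGUF conversion described above.
# Assumptions: local checkouts of https://github.com/InternLM/InternLM
# and https://github.com/ggerganov/llama.cpp; paths and script names
# may differ in your environment.
import subprocess

SRC = "./internlm2_5-7b-chat-1m"               # original InternLM checkpoint
LLAMA_DIR = "./internlm2_5-7b-chat-1m-llama"   # LLaMA-format output
GGUF_OUT = "./internlm2_5-7b-chat-1m.gguf"

# Step 1: rewrite the weights and config into the LLaMA layout using the
# converter shipped in the InternLM repo (tools/convert2llama.py).
subprocess.run(
    ["python", "InternLM/tools/convert2llama.py", SRC, LLAMA_DIR],
    check=True,
)

# Step 2: convert the LLaMA-format checkpoint to GGUF with llama.cpp.
subprocess.run(
    ["python", "llama.cpp/convert_hf_to_gguf.py", LLAMA_DIR,
     "--outfile", GGUF_OUT],
    check=True,
)

# Note: per the advice above, point ktransformers at the LLaMA-format
# config but keep using the original InternLM tokenizer files.
```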

As the tutorial here mentions, we have uploaded the model configuration, GGUF file, and tokenizer to a Hugging Face repository. We recommend downloading the model directly from this link to avoid the cumbersome process of merging the related files.
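
If you take the recommended route, fetching the prepared files is straightforward with `huggingface_hub`; the repo id below is a placeholder for the repository linked in the tutorial.

```python
# Download the prepared config + GGUF + tokenizer bundle from Hugging Face.
# The repo id is a placeholder -- substitute the repository linked above.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="<org>/internlm2_5-7b-chat-1m-gguf",  # placeholder repo id
    local_dir="./internlm2_5-7b-chat-1m-ktransformers",
)
print("Model files downloaded to:", local_dir)
```

You can then point ktransformers at that directory, e.g. `python -m ktransformers.local_chat --model_path <dir> --gguf_path <dir>`, following the invocation style in the ktransformers README.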

TKONIY commented 2 months ago

> You first need to use the tools provided by InternLM to convert the internlm2_5-7b-chat-1m model into the LLaMA format; after that, you can convert it to a GGUF file. If you go this route, note that the model configuration needs to be in the LLaMA format, while the tokenizer should be the InternLM version.
>
> As the tutorial here mentions, we have uploaded the model configuration, GGUF file, and tokenizer to a Hugging Face repository. We recommend downloading the model directly from this link to avoid the cumbersome process of merging the related files.

Do you plan to support loading the files directly from Hugging Face, like AutoConfig in transformers does?
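
For reference, this is the transformers behavior being asked for: the config and tokenizer resolve straight from the Hugging Face Hub by model id (a minimal illustration using the public transformers API; `trust_remote_code=True` is needed because InternLM ships custom model code).

```python
# What "loading like AutoConfig" looks like in transformers: the model id
# is resolved against the Hugging Face Hub, no manual file handling needed.
from transformers import AutoConfig, AutoTokenizer

config = AutoConfig.from_pretrained(
    "internlm/internlm2_5-7b-chat-1m", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "internlm/internlm2_5-7b-chat-1m", trust_remote_code=True
)
print(config.model_type)
```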

ovowei commented 2 months ago

It's likely coming at some point, but not anytime soon.

TKONIY commented 2 months ago

Thank you!