ztxz16 / fastllm

A pure C++ LLM acceleration library for all platforms, callable from Python. ChatGLM-6B-class models can reach 10000+ tokens/s on a single GPU. Supports GLM, LLaMA, and MOSS base models, and runs smoothly on mobile devices.
Apache License 2.0

Error when converting model format (.bin -> .flm) #413

Open ColorfulDick opened 7 months ago

ColorfulDick commented 7 months ago

When converting the SUS-Chat-34B model (which is fully compatible with the LLaMA architecture) to the flm format, I got this error:

root@5ce5bafeea81:/app# python glm_trans_flm.py 
Loading checkpoint shards: 100%|██████████████████████████████████████████████████| 7/7 [01:09<00:00,  9.90s/it]
convert ( 543 / 543 )
Warmup...
FastLLM Error: Reshape error.

terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'
Aborted (core dumped)

The conversion script is below; it follows the official fastllm example:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_path = "./SUS-Chat-34B"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map="auto", torch_dtype=torch.float16, trust_remote_code=True
).eval()

# The two lines below convert the Hugging Face model into a fastllm model.
# Currently from_hf only accepts original (unquantized) models, or ChatGLM int4/int8
# quantized models; other quantized models cannot be converted yet.
from fastllm_pytools import llm
llm.set_device_map(["cuda:0", "cuda:1", "cuda:2", "cuda:3", "cuda:4"])
model = llm.from_hf(model, tokenizer, dtype="float16")  # dtype supports "float16", "int8", "int4"
model.save("./SUS-Chat-34B.flm")

How can this be resolved? The CUDA version is 12.2, and the same code converts chatglm3-6b and baichuan2 without any problems.
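
For reference, a minimal diagnostic sketch for comparing the attention settings of the models involved. The field names are the standard Hugging Face config attributes for LLaMA-style models (num_attention_heads, num_key_value_heads); other architectures may name them differently, so getattr() defaults are used, and the local paths for chatglm3-6b and Baichuan2 are hypothetical placeholders:

# Sketch: print attention-head settings for each model to see what differs
# between the models that convert cleanly and the one that fails.
from transformers import AutoConfig

for path in ["./SUS-Chat-34B", "./chatglm3-6b", "./Baichuan2-13B-Chat"]:  # hypothetical paths
    cfg = AutoConfig.from_pretrained(path, trust_remote_code=True)
    print(path,
          "attention_heads =", getattr(cfg, "num_attention_heads", None),
          "kv_heads =", getattr(cfg, "num_key_value_heads", None))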

TylunasLi commented 7 months ago

I reproduced the problem with SUS-Chat-34B as well. My test environment isn't set up properly yet, so I haven't found a solution so far...

TylunasLi commented 7 months ago

Tested with Yi-6B and found that the cause is that fastllm does not yet support Grouped Query Attention. A fix is in progress.
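
For context, a minimal sketch (not fastllm's actual code) of why a plain multi-head-attention reshape breaks on a GQA model. The shapes assume the published Yi-34B / SUS-Chat-34B config (hidden_size 7168, 56 attention heads, 8 key/value heads):

import torch

hidden_size = 7168
num_attention_heads = 56
num_key_value_heads = 8                          # GQA: fewer KV heads than query heads
head_dim = hidden_size // num_attention_heads    # 128

batch, seq_len = 1, 16
# In a GQA model, k_proj/v_proj output num_key_value_heads * head_dim = 1024 features,
# not hidden_size = 7168.
k = torch.randn(batch, seq_len, num_key_value_heads * head_dim)

# A loader that assumes plain multi-head attention tries something like this
# and fails, because 16 * 1024 elements cannot fill a (1, 16, 56, 128) tensor.
try:
    k.view(batch, seq_len, num_attention_heads, head_dim)
except RuntimeError as e:
    print("MHA-style reshape fails:", e)

# The GQA-aware path: reshape with num_key_value_heads, then repeat each KV head
# num_attention_heads // num_key_value_heads times so attention shapes line up.
k = k.view(batch, seq_len, num_key_value_heads, head_dim)
k = k.repeat_interleave(num_attention_heads // num_key_value_heads, dim=2)
print(k.shape)  # torch.Size([1, 16, 56, 128])

Once the loader reshapes K/V with num_key_value_heads and repeats (or broadcasts) them per query-head group, the per-head attention shapes match again, which is presumably what the fix needs to add.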