ztxz16 / fastllm

A pure C++ LLM acceleration library for all platforms, callable from Python; a ChatGLM-6B-class model can reach 10000+ tokens/s on a single GPU; supports GLM, LLaMA, and MOSS base models; runs smoothly on mobile devices
Apache License 2.0
3.29k stars · 333 forks

After the MOSS model is converted successfully, inference fails. #140

Open liushiton opened 1 year ago

liushiton commented 1 year ago

I have already accelerated both chatglm and chatglm2 successfully, but with the MOSS model (moss-moon-003-sft-plugin-int4) the conversion succeeds while inference fails. I also don't know where `stopping_criteria_list` should be passed. Code below:

from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteriaList
# StopWordsCriteria is not part of transformers; it ships with the MOSS repository (utils.py).
from utils import StopWordsCriteria

tokenizer = AutoTokenizer.from_pretrained("../pre_model/moss-moon-003-sft-plugin-int4", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("../pre_model/moss-moon-003-sft-plugin-int4",
                                             trust_remote_code=True).half().cuda()
stopping_criteria_list = StoppingCriteriaList(
        [StopWordsCriteria(tokenizer.encode("<eoc>", add_special_tokens=False))])

from fastllm_pytools import llm

model = llm.from_hf(model, tokenizer, dtype="float16")  # dtype supports "float16", "int8", "int4"
model.save("fastllm_models/moss_model_2.flm")
answer_time = []
answers = []
response = model.response('你好')

The error output is as follows:

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
convert ( 719 / 719 )
output (719 / 719)
FastLLM Error: Linear's weight's shape's size should be 2.

terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'
Aborted
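A plausible reading of the `Linear's weight's shape's size should be 2` error (an editor's illustration only; the `pack_int4` helper below is hypothetical and not part of fastllm): the converter expects every Linear weight to be a 2-D `[out_features, in_features]` matrix, whereas an int4-quantized checkpoint stores weights as packed 1-D byte buffers, two 4-bit values per byte, so the rank check fails.

```python
def pack_int4(values):
    """Pack pairs of 4-bit integers (0..15) into single bytes,
    roughly how int4-quantized checkpoints store weights."""
    assert len(values) % 2 == 0
    return bytes((values[i] & 0xF) | ((values[i + 1] & 0xF) << 4)
                 for i in range(0, len(values), 2))

# A normal fp16 Linear weight keeps its 2-D shape [out_features, in_features]:
fp16_weight = [[0.0] * 4 for _ in range(2)]  # 2 rows x 4 columns

# An int4 checkpoint flattens and packs it into a 1-D buffer,
# so a converter expecting a rank-2 tensor rejects it:
flat = [3, 7, 1, 15, 0, 2, 9, 4]
packed = pack_int4(flat)
print(len(packed))  # half as many bytes as input values
```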
ztxz16 commented 1 year ago

For now, only the original (unquantized) model is supported for conversion; a pre-quantized int4 model cannot be converted directly.
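Based on that answer, a minimal sketch of the suggested workflow (the `convert_original_model` helper and the example paths are hypothetical): load the original fp16 checkpoint and let fastllm perform the int4 quantization itself via the `dtype` argument, instead of loading the pre-quantized `-int4` checkpoint.

```python
SUPPORTED_DTYPES = {"float16", "int8", "int4"}  # per the from_hf comment above

def convert_original_model(model_path, out_path, dtype="int4"):
    """Convert an *original* (unquantized) HF checkpoint to .flm,
    letting fastllm quantize it during conversion."""
    if dtype not in SUPPORTED_DTYPES:
        raise ValueError("dtype must be one of %s" % sorted(SUPPORTED_DTYPES))
    # Imports are deferred so the dtype check works without these libraries.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from fastllm_pytools import llm
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_path, trust_remote_code=True).half()
    flm_model = llm.from_hf(model, tokenizer, dtype=dtype)  # quantization happens here
    flm_model.save(out_path)
    return flm_model

# e.g. convert_original_model("../pre_model/moss-moon-003-sft-plugin",
#                             "fastllm_models/moss_int4.flm")
```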

liushiton commented 1 year ago

For now, only the original (unquantized) model is supported for conversion; a pre-quantized int4 model cannot be converted directly.

Got it, thanks. I'll try that later.

liushiton commented 1 year ago

I also tried the original MOSS model (moss-moon-003-sft-plugin) deployed across multiple GPUs (it doesn't fit on a single card), then converting it and running inference, and that failed as well. Does that mean converting a model deployed across multiple GPUs is also unsupported?

liushiton commented 1 year ago

[image attachment] Also, after converting the MOSS model, I don't know how to pass the stop-word list (`stopping_criteria_list`) as a parameter during inference.
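Whether fastllm's `response` accepts an HF `StoppingCriteriaList` is not answered in this thread; as a workaround sketch (the `truncate_at_stop_word` helper is hypothetical, not a fastllm API), the generated text can simply be cut at the first stop word after generation:

```python
def truncate_at_stop_word(text, stop_word="<eoc>"):
    """Cut generated text at the first occurrence of the stop word,
    keeping the stop word itself (mirroring MOSS's <eoc> convention)."""
    idx = text.find(stop_word)
    return text if idx == -1 else text[:idx + len(stop_word)]

print(truncate_at_stop_word("search('weather')<eoc> trailing junk"))
```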

ZiboZ commented 1 year ago

For now, only the original (unquantized) model is supported for conversion; a pre-quantized int4 model cannot be converted directly.

The original model doesn't work either: `fastLLM error: unsupported`

2496289471 commented 1 year ago

Has this issue been resolved?

liushiton commented 1 year ago

No, it hasn't been resolved. My understanding is that the -plugin variants of the MOSS model probably haven't been adapted yet, but I specifically need the plugin functionality of the -plugin version, so I haven't tried accelerating the other MOSS variants.