BAICHUAN2 has no MakeInput implementation

ztxz16 / fastllm

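A pure C++, cross-platform LLM acceleration library with Python bindings. ChatGLM-6B-class models can reach 10000+ tokens/s on a single GPU. Supports GLM, LLaMA, and MOSS base models, and runs smoothly on mobile devices.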

Apache License 2.0

3.28k stars, 332 forks

BAICHUAN2 has no MakeInput implementation #396

Closed: yiguanxian closed this issue 6 months ago

yiguanxian commented 8 months ago

The Baichuan model classes do not have this interface: virtual std::string MakeInput(const std::string &history, int round, const std::string &input). Does that mean I need to assemble the prompt myself, following the official format?

TylunasLi commented 8 months ago

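The Baichuan2 model is implemented with the fastllm::LlamaModel class, which implements the MakeInput() method.
Special-token prompts use the "<FLM_FIXTOKEN{id}>" format and are stored in the model file at conversion time.

So you do not need to implement MakeInput() yourself.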
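
For readers who want to see what this kind of prompt assembly looks like, here is a minimal Python sketch. It is not fastllm's actual code. The function names, the `pre_prompt` parameter, and the token ids 195 and 196 are assumptions for illustration (check the model's generation_config for the real ids), and the placeholder spelling is copied from the reply above, so verify the exact form against the fastllm source.

```python
# Illustrative sketch only: mimics a MakeInput-style prompt builder.
# USER_TOKEN_ID / ASSISTANT_TOKEN_ID are ASSUMED values; read the real ones
# from the model's generation_config before relying on them.
USER_TOKEN_ID = 195
ASSISTANT_TOKEN_ID = 196


def fix_token(token_id: int) -> str:
    """Render a special-token placeholder in the format quoted in this thread."""
    return f"<FLM_FIXTOKEN{token_id}>"


def make_input(history: str, round_idx: int, user_input: str, pre_prompt: str = "") -> str:
    """Build the text sent to the model for the current round (hypothetical helper)."""
    # Round 0 starts from the optional system prefix; later rounds start from history.
    prefix = pre_prompt if round_idx == 0 else history
    return prefix + fix_token(USER_TOKEN_ID) + user_input + fix_token(ASSISTANT_TOKEN_ID)


def make_history(history: str, round_idx: int, user_input: str, output: str, pre_prompt: str = "") -> str:
    """Append the model's reply so the next round can reuse the conversation so far."""
    return make_input(history, round_idx, user_input, pre_prompt) + output


first_prompt = make_input("", 0, "hi")
history = make_history("", 0, "hi", "hello")
second_prompt = make_input(history, 1, "bye")
```

The point of the sketch is that the role markers are inserted around the user text by the library, so the caller only passes the raw question and the running history.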

yiguanxian commented 8 months ago

Is there an example for Baichuan2? I feel there is some loss of accuracy, and I'm not sure whether my usage is the problem. This is how I use it:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from transformers.generation.utils import GenerationConfig
    from fastllm_pytools import llm

    modelpath = "baichuan-inc/Baichuan2-13B-Chat"
    tokenizer = AutoTokenizer.from_pretrained(modelpath, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(modelpath, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
    model.generation_config = GenerationConfig.from_pretrained(modelpath)
    model = llm.from_hf(model, tokenizer, dtype = "int8")
    # Prompt means: "How do I create a user?"
    model.stream_response("怎么创建用户")

yiguanxian commented 8 months ago

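@TylunasLi As shown above, I'm not sure whether, for Baichuan2, I need to preprocess the input myself the way the original model does before calling stream_response in fastllm.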

TylunasLi commented 8 months ago

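> @TylunasLi As shown above, I'm not sure whether, for Baichuan2, I need to preprocess the input myself the way the original model does before calling stream_response in fastllm.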

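I looked into it. Currently, the hf_model.create() method and torch2flm.tofile() in fastllm handle Baichuan 2 with different logic, so the "two lines of code" acceleration path does not work for it. I am planning to submit a PR to fix this.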

yiguanxian commented 8 months ago

@TylunasLi

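1. Given your answer above, should I use hf_model.create() or torch2flm.tofile() with the current version?
2. Also, with my code above, how does fastllm determine my model type (for example, whether it is baichuan2 or chatglm)?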

wenzhaojie commented 7 months ago

When I run batch inference on baichuan-13b-chat with pyfastllm, why is the output so poor?

batch response: <Response [200]>
text: "
(1/4) prompt: 你好，请问你是谁？ response:
(2/4) prompt: 今天天气怎么样？ response: ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
(3/4) prompt: How are you？ response: - 2019-0000-000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
(4/4) prompt: こんにちは response: 。私は田中です。これは私の新しいアパートです。私はここです。これは私の部屋です。これは私の机です。これは私の本です。これは私の音楽です。これは私のカフェーです。これは私の町です。これは
"

TylunasLi commented 6 months ago

@wenzhaojie

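I looked at the existing batch_response example. The web_api currently implements the generate method, which does not assemble the prompt, so the model gives very strange answers.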
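
As a rough illustration of the fix this implies, the sketch below wraps each raw prompt in a chat template before it would be handed to a generate-style batch call. Everything here is hypothetical: `wrap_prompts`, `chat_template`, and the `<user>` / `<assistant>` markers are stand-ins, not fastllm or pyfastllm APIs, and the real role markers come from the converted model.

```python
from typing import Callable, List


def chat_template(user_text: str) -> str:
    """Hypothetical template: surround the user text with placeholder role markers."""
    return "<user>" + user_text + "<assistant>"


def wrap_prompts(raw_prompts: List[str], template: Callable[[str], str]) -> List[str]:
    """Apply the chat template to every prompt in a batch, preserving order."""
    return [template(prompt) for prompt in raw_prompts]


raw_batch = ["How are you?", "こんにちは"]
wrapped_batch = wrap_prompts(raw_batch, chat_template)
# wrapped_batch, not raw_batch, is what a raw generate-style call should receive.
```

Without this step a chat-tuned model sees bare text and simply continues it, which matches the repetitive continuations in the batch output quoted earlier.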