ztxz16 / fastllm

A pure C++ LLM acceleration library for all platforms, callable from Python; chatglm-6B-class models can reach 10000+ tokens/s on a single GPU; supports glm, llama, and moss base models; runs smoothly on mobile devices
Apache License 2.0

The flm tokenizer and the original tokenizer produce different tokenization results #397

Open · yiguanxian opened this issue 8 months ago

yiguanxian commented 8 months ago

Both chatglm2 and baichuan2 exhibit this problem.

1. Model conversion

```python
from fastllm_pytools import llm
from transformers import AutoTokenizer, AutoModel

hf_model = "/workspace/chatglm2-6B"

flm_dtype = "int8"
model_name = hf_model.split("/")[-1]
flm_model = f"/workspace/models/{model_name}-fastllm-{flm_dtype}.flm"

# Load the HF model, convert it to fastllm format, and save the .flm file
tokenizer = AutoTokenizer.from_pretrained(hf_model, trust_remote_code=True)
model = AutoModel.from_pretrained(hf_model, trust_remote_code=True).half().cuda()
model = llm.from_hf(model, tokenizer, dtype=flm_dtype)
model.save(flm_model)
```

2. Test code

```python
prompt_input = "[Round 1]"

# Tokenize with the original HF tokenizer
from transformers import AutoTokenizer
model_path = "/workspace/chatglm2-6B"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
print(f"src prompt: {prompt_input}, token id: {tokenizer.encode(prompt_input)}")  # R ound

# Tokenize with the converted fastllm (.flm) model
import fastllm
model_path = "/workspace/models/chatglm2-6B-fastllm-int8.flm"
model = fastllm.create_llm(model_path)
input_ids = model.weight.tokenizer.encode(prompt_input)
input_ids = input_ids.to_list()
input_ids = [int(v) for v in input_ids]
print(f"fastllm prompt: {prompt_input}, token id: {input_ids}")  # Ro und
```

3. Test results

The original tokenizer splits the word "Round" into "R" and "ound", while flm splits it into "Ro" and "und". Similarly, for the baichuan2 input "你是可爱", the original tokenizer produces "你是" and "可爱", while the converted baichuan2 model produces "你", "是可", and "爱" (see the per-token comparison sketch below).
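To make the divergence visible token by token, the ids from both tokenizers can be rendered back to their surface strings. This is a minimal diagnostic sketch, not part of the original report: it assumes the same chatglm2 paths as above, reuses the fastllm calls shown in the test code (`fastllm.create_llm`, `model.weight.tokenizer.encode`), and relies on both id sequences indexing the same SentencePiece vocabulary so the HF tokenizer can decode either one.

```python
# Diagnostic sketch (assumes the paths and fastllm API from the test code above)
from transformers import AutoTokenizer
import fastllm

hf_tok = AutoTokenizer.from_pretrained("/workspace/chatglm2-6B", trust_remote_code=True)
flm_model = fastllm.create_llm("/workspace/models/chatglm2-6B-fastllm-int8.flm")

prompt = "[Round 1]"
hf_ids = hf_tok.encode(prompt)
flm_ids = [int(v) for v in flm_model.weight.tokenizer.encode(prompt).to_list()]

# Both id lists index the same SentencePiece vocabulary, so the HF tokenizer
# can render either sequence as token strings for a side-by-side comparison.
print("hf :", hf_ids, hf_tok.convert_ids_to_tokens(hf_ids))
print("flm:", flm_ids, hf_tok.convert_ids_to_tokens(flm_ids))
```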

TylunasLi commented 8 months ago

The chatglm3 issue was caused by model.save() not saving the SentencePiece token weights; the problem does not occur when using torch2flm.toFile(). A fix has been made.
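For anyone on an older build, converting via torch2flm as suggested above sidesteps the lossy model.save() path. The sketch below is an assumption: it uses `torch2flm.tofile` from `fastllm_pytools` (the comment above writes `toFile()`, so the exact name and signature should be checked against the installed version).

```python
# Workaround sketch: export through torch2flm so the SentencePiece token
# weights are written into the .flm file (assumed API: torch2flm.tofile;
# verify the exact name/signature in your fastllm version).
from fastllm_pytools import torch2flm
from transformers import AutoTokenizer, AutoModel

hf_model = "/workspace/chatglm2-6B"
tokenizer = AutoTokenizer.from_pretrained(hf_model, trust_remote_code=True)
model = AutoModel.from_pretrained(hf_model, trust_remote_code=True).half()

torch2flm.tofile("/workspace/models/chatglm2-6B-fastllm-int8.flm",
                 model, tokenizer, dtype="int8")
```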