Is there any accuracy loss when converting to flm model?

ztxz16 / fastllm

纯c++的全平台llm加速库，支持python调用，chatglm-6B级模型单卡可达10000+token / s，支持glm, llama, moss基座，手机端流畅运行

Apache License 2.0

3.31k stars 339 forks source link

### Instruction: The president of the United States is ### Response: The president of the United States The president of the United States is ### Instruction: The president of the United States is ### Response: The president of the United States is ### Instruction: The president of the United States is ### Response: The president of the United States is ### Instruction: The president of the United States is ### Response: The president of the United States is The president of the United States is ### Instruction: The president of the United States is ### Response: The president of the United States is

you should pay atention to difference of the prompt building process between original python code and fastlllm code. for llama2-7b:

python code is Llama.text_completion(),
in fastllm, there's no directly callable method, but you can wrap a method to call fastllm_lib.launch_response_str_llm_model(). or you can config the following parts in config.json and re-convert the model.
```
"pre_prompt": "",
"user_role": "",
"bot_role": "",
"history_sep":  "",
```

for llama2-7b-chat-hf,

python code is Llama.chat_completion()
in fastllm, you should config the following parts in config.json and re-convert the model.
```
"pre_prompt": "",
"user_role": "[INST] ",
"bot_role": " [/INST]",
"history_sep":  " ",
```

ztxz16 / fastllm

Is there any accuracy loss when converting to flm model? #305