An OpenAI-style API for open large language models: use open LLMs just like ChatGPT! Supports LLaMA, LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, Xverse, SqlCoder, CodeLLaMA, ChatGLM, ChatGLM2, ChatGLM3, etc. (A unified backend interface for open-source large models.)
The following items must be checked before submission
[X] Make sure you are using the latest code from the repository (git pull); some issues have already been addressed and fixed.
[X] I have read the project documentation and the FAQ section, and searched the existing issues/discussions; no similar problem or solution was found.
Type of problem
Effectiveness issues
Operating system
Linux
Detailed description of the problem
Internlm2-chat-7b fails to generate correct replies (using the 2024.02.04 release).
Config 1
PORT=8945
# model related
MODEL_NAME=internlm2
MODEL_PATH=/mnt/LLM/public/LLM/internlm2-chat-7b
EMBEDDING_NAME=
ADAPTER_MODEL_PATH=
QUANTIZE=16
CONTEXT_LEN=
LOAD_IN_8BIT=false
LOAD_IN_4BIT=false
USING_PTUNING_V2=false
STREAM_INTERVERL=2
PROMPT_NAME=
# device related
DEVICE=
# "auto", "cuda:0", "cuda:1", ...
DEVICE_MAP=auto
GPUS=
NUM_GPUs=1
DTYPE=half
# api related
API_PREFIX=/v1
USE_STREAMER_V2=false
ENGINE=default
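To reproduce against this config, a minimal client call to the OpenAI-compatible endpoint (port 8945, prefix /v1, as set above) might look like the sketch below. The request shape follows the standard OpenAI chat-completions schema; the `build_chat_request` helper is illustrative, not part of this project.

```python
import json
import urllib.request

# PORT=8945 and API_PREFIX=/v1 from the config above
API_BASE = "http://localhost:8945/v1"

def build_chat_request(prompt: str, model: str = "internlm2") -> dict:
    """Build a standard OpenAI-style chat-completions payload."""
    return {
        "model": model,  # matches MODEL_NAME in the config
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(prompt: str) -> str:
    """POST the payload and return the assistant's reply text."""
    payload = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With either config, `chat("你好")` should return a coherent greeting; under Config 1 it instead produces the broken output described above.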
Config 2: adds the PROMPT_NAME field
PORT=8945
# model related
MODEL_NAME=internlm2
MODEL_PATH=/mnt/LLM/public/LLM/internlm2-chat-7b
EMBEDDING_NAME=
ADAPTER_MODEL_PATH=
QUANTIZE=16
CONTEXT_LEN=
LOAD_IN_8BIT=false
LOAD_IN_4BIT=false
USING_PTUNING_V2=false
STREAM_INTERVERL=2
PROMPT_NAME=internlm2
# device related
DEVICE=
# "auto", "cuda:0", "cuda:1", ...
DEVICE_MAP=auto
GPUS=
NUM_GPUs=1
DTYPE=half
# api related
API_PREFIX=/v1
USE_STREAMER_V2=false
ENGINE=default
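The only difference between the two configs is that Config 2 sets PROMPT_NAME=internlm2, which explicitly selects the chat template. If the template is not applied (or is inferred incorrectly from the model path), the model receives raw text instead of its expected conversation format, which typically yields garbled or incorrect replies. As a hedged illustration, InternLM2 chat models use a ChatML-like format along these lines; the builder function below is hypothetical and the project's own template may differ in details such as system-prompt handling.

```python
def build_internlm2_prompt(messages):
    """Render messages in the ChatML-style format used by InternLM2 chat
    models (illustrative sketch, not this project's exact template)."""
    parts = []
    for msg in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|> markers.
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # Leave an open assistant turn; generation continues from here.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```

Comparing the server's rendered prompt under both configs (e.g. via debug logs) against this format should show whether Config 1 is sending the model an unformatted prompt.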
Dependencies
Runtime logs or screenshots
Config 1
Config 2