RuntimeError: CUDA error: device-side assert triggered

The following items must be checked before submission

[X] Make sure you are using the latest code from the repository (git pull); some issues have already been addressed and fixed.
[X] I have read the project documentation and FAQ section, and I have searched the existing issues without finding a similar problem or solution.

Type of problem

Model inference and deployment

Operating system

Windows

Detailed description of the problem

The .env file is below

PORT=8000

# model related
MODEL_NAME=chatglm3
MODEL_PATH=D:\Projects\LLMModels\chatglm3-6b
CONTEXT_LEN=
LOAD_IN_8BIT=false
LOAD_IN_4BIT=false
PROMPT_NAME=chatglm3

# rag related
EMBEDDING_NAME=
RERANK_NAME=

# device related
# "auto", "cuda:0", "cuda:1", ...
DEVICE_MAP=cuda:0
GPUS=
NUM_GPUs=1
DTYPE=half

# api related
API_PREFIX=/v1

USE_STREAMER_V2=false
ENGINE=default

TASKS=llm
# TASKS=llm,rag
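
To double-check which of these values are actually set and which are left empty, here is a minimal sketch of a stand-in parser. It is not the project's own loader (that is an assumption on my part; the server may read the file differently), and `read_env_file` is a hypothetical helper name. The sample text is a shortened copy of the file above.

```python
def read_env_file(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values such as "cuda:0" stay intact.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# Shortened copy of the .env above (backslashes escaped for a Python string).
sample = """
MODEL_NAME=chatglm3
MODEL_PATH=D:\\Projects\\LLMModels\\chatglm3-6b
CONTEXT_LEN=
DEVICE_MAP=cuda:0
DTYPE=half
TASKS=llm
# TASKS=llm,rag
"""

settings = read_env_file(sample)
# Keys that are present but have no value.
empty = [key for key, value in settings.items() if value == ""]
print(settings["MODEL_PATH"])
print(empty)
```

Run against the full file, this would also list `EMBEDDING_NAME`, `RERANK_NAME`, and `GPUS` as empty.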

When I tried to chat with chatglm3, I got "RuntimeError: CUDA error: device-side assert triggered".
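
A device-side assert is reported asynchronously, so the Python stack trace often points at the wrong line. Below is a minimal sketch of two checks that may help narrow it down. Setting `CUDA_LAUNCH_BLOCKING=1` before CUDA is initialized makes kernel launches synchronous, so the trace lands on the failing call. One common cause of this assert (not confirmed for this issue) is a token id outside the embedding table, which the second part checks for. `find_out_of_range_ids` is a hypothetical helper, and the vocabulary size and token ids are made-up illustration values, not chatglm3's.

```python
import os

# Must be set before CUDA is initialized, i.e. before importing torch
# or starting the server process.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def find_out_of_range_ids(token_ids, vocab_size):
    """Return (position, token_id) pairs that fall outside [0, vocab_size)."""
    return [
        (pos, tok)
        for pos, tok in enumerate(token_ids)
        if not 0 <= tok < vocab_size
    ]


# Hypothetical values for illustration only.
vocab_size = 10
token_ids = [1, 5, 12, 3, -1]

bad = find_out_of_range_ids(token_ids, vocab_size)
print(bad)
```

In a real run, `token_ids` would come from the tokenizer output and `vocab_size` from the loaded model's embedding size. Running the same request on CPU is another way to get a readable index error instead of the CUDA assert.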

Dependencies

No response

Runtime logs or screenshots

No response

xusenlinzy / api-for-open-llm

RuntimeError: CUDA error: device-side assert triggered #302
