Open lxb0425 opened 2 months ago
Does it keep increasing until OOM if you leave the server idle?
闲置状态比较好 只有一直调用后 才会这样 又增加了
@youkaichao @robertgshaw2-neuralmagic any idea about this?
我也遇到了这个问题 最后怎么解决的啊 大哥 python -m vllm.entrypoints.openai.api_server --model /home/fitech/qianwen2.5/qianwen2.5-14b-int4/qianwen2.5-14b-int4 --trust-remote-code --served-model-name Qwen2.5-14B-Instruct-GPTQ-INT4 --gpu-memory-utilization 0.6 --max-model-len=2048 --port 8788
Your current environment
2*A100 配置 启动项 python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 7864 --max-model-len 8000 --served-model-name chat-v2.0 --model /workspace/sdata/checkpoint-140-merged --enforce-eager --tensor-parallel-size 2 --gpu-memory-utilization 0.95
Model Input Dumps
🐛 Describe the bug
启动后 使用一段时间 显存越占越大 最后会崩掉
Before submitting a new issue...