vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: GPU memory usage keeps growing over time after startup #8413

Open lxb0425 opened 2 months ago

lxb0425 commented 2 months ago

Your current environment

Setup: 2×A100. Launch command: python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 7864 --max-model-len 8000 --served-model-name chat-v2.0 --model /workspace/sdata/checkpoint-140-merged --enforce-eager --tensor-parallel-size 2 --gpu-memory-utilization 0.95

Model Input Dumps


🐛 Describe the bug

After startup, GPU memory usage grows larger and larger as the server is used, and it eventually crashes.


DarkLight1337 commented 2 months ago

Does it keep increasing until OOM if you leave the server idle?

lxb0425 commented 2 months ago

It stays fine while idle; the memory only keeps growing under continuous requests. It has just increased again (screenshot).
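To confirm this pattern (flat while idle, growing under load), one way is to poll `nvidia-smi` while sending traffic and check whether used memory trends monotonically upward. A minimal sketch — the polling interval, the GPU index, and the `min_growth_mib` threshold are all assumptions, not anything from this thread:

```python
import subprocess
import time


def sample_gpu_memory_mib(gpu_index=0):
    """Return the currently used memory (MiB) for one GPU via nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", f"--id={gpu_index}",
         "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        text=True,
    )
    return int(out.strip())


def looks_like_leak(samples, min_growth_mib=100):
    """Heuristic: usage is trending up if no sample ever drops below the
    previous one and the total growth exceeds min_growth_mib (assumed
    threshold). A dip between samples suggests memory is being freed."""
    monotonic = all(b >= a for a, b in zip(samples, samples[1:]))
    return monotonic and (samples[-1] - samples[0]) >= min_growth_mib


if __name__ == "__main__":
    # Poll every 60 s while the server handles requests (interval assumed).
    samples = []
    for _ in range(10):
        samples.append(sample_gpu_memory_mib())
        time.sleep(60)
    print("possible leak:", looks_like_leak(samples))
```

Running this once while the server sits idle and once under sustained load would make the comparison above concrete.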

DarkLight1337 commented 2 months ago

@youkaichao @robertgshaw2-neuralmagic any idea about this?

gongjl123 commented 3 hours ago

I ran into the same problem. How did you solve it in the end? My launch command: python -m vllm.entrypoints.openai.api_server --model /home/fitech/qianwen2.5/qianwen2.5-14b-int4/qianwen2.5-14b-int4 --trust-remote-code --served-model-name Qwen2.5-14B-Instruct-GPTQ-INT4 --gpu-memory-utilization 0.6 --max-model-len=2048 --port 8788