Open · RylanSchaeffer opened this issue 3 months ago
+1, n=512 hangs while n=256 is fine. Increasing swap_space to 16 GB did not help.
I later managed to get past it by further increasing swap_space to 32.
llm = LLM(model=model_path, swap_space=32)
I guess insufficient swap_space may be the cause.
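Putting it together, a minimal sketch of the workaround (model path, prompt, and sampling settings are placeholders, not from the original report):

```python
from vllm import LLM, SamplingParams

# swap_space is the CPU swap size in GiB; 16 was not enough here, 32 worked
llm = LLM(model="/path/to/model", swap_space=32)

params = SamplingParams(n=512, temperature=1.0, max_tokens=256)
outputs = llm.generate(["An example prompt"], params)
print(len(outputs[0].outputs))  # expect 512 completions
```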
@RylanSchaeffer
My solution was to loop with a smaller n. It works well enough for my purposes, but I wish a warning would be thrown. Hanging doesn't seem like an acceptable outcome to me.
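In case it helps anyone else, a rough sketch of that loop (the helper name, chunk size, model path, and sampling settings are mine, not part of vLLM):

```python
from vllm import LLM, SamplingParams

def generate_many(llm, prompt, total_n, chunk_n=256, **sampling_kwargs):
    """Collect total_n completions for one prompt by looping with a smaller n."""
    texts = []
    remaining = total_n
    while remaining > 0:
        n = min(chunk_n, remaining)
        params = SamplingParams(n=n, **sampling_kwargs)
        outputs = llm.generate([prompt], params)
        texts.extend(o.text for o in outputs[0].outputs)
        remaining -= n
    return texts

llm = LLM(model="/path/to/model")  # placeholder model path
samples = generate_many(llm, "An example prompt", total_n=512,
                        temperature=1.0, max_tokens=256)
print(len(samples))  # 512
```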
vllm is designed for fast inference and serving; it is not designed for scaling test-time compute :) I think you need an outer loop calling vllm, and scale the test-time compute by scaling that outer loop.
@youkaichao
I don't know why your comment is relevant. I was trying to increase n for best-of-n for serving.
Even if your point is relevant, the process shouldn't hang indefinitely. A warning or error should be thrown.
Your current environment
🐛 Describe the bug
For a research project, I need to generate a large number of outputs per prompt. If I set `n` in `SamplingParams()` to be higher than 256, the process hangs indefinitely.

Code to reproduce:
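Roughly, the script looks like this (not the original script; model path, prompt, and sampling settings are placeholders):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="/path/to/model")  # placeholder model path

# n <= 256 completes normally; n > 256 (e.g. 512) hangs with GPU utilization at 0
params = SamplingParams(n=512, temperature=1.0, max_tokens=256)
outputs = llm.generate(["An example prompt"], params)

for completion in outputs[0].outputs:
    print(completion.text)
```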
The above script will successfully generate output for up to `n=256`, but beyond that nothing happens. VRAM remains occupied, but GPU utilization sits at 0. The process has been in this state for 2 days, and no error message has been output.