@youkaichao FYI, I think shm_broadcast is a bit tricky when it comes to multi-modal inputs...
@faileon I talked to Kaichao offline and in fact this won't be an issue if you build the docker image from the main branch and serve the model from that one. This is because of the recently merged https://github.com/vllm-project/vllm/pull/6183.
Feel free to raise another issue if you still see any other error from the latest main branch!
Closing this one as it's fixed in #6183
Your current environment
I have the following Docker Compose service running vLLM with llava-hf/llava-v1.6-mistral-7b-hf:
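The compose file itself didn't survive the capture; a minimal sketch of such a service is below. The image tag, port mapping, and GPU settings are assumptions, not the reporter's actual config.

```yaml
# Hypothetical reconstruction; the original compose file was not included above.
services:
  vllm:
    image: vllm/vllm-openai:latest  # assumed image tag
    command: --model llava-hf/llava-v1.6-mistral-7b-hf
    ports:
      - "8000:8000"  # assumed host port
    ipc: host  # vLLM docs recommend host IPC for shared-memory communication
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```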
I have a service sending 5 parallel requests to the exposed /v1/chat/completions endpoint, which causes the server to seize up with the following error. Afterwards, the container is stuck with the 5 in-flight requests and doesn't accept any new ones:
I must completely tear down the container and start it again to get it unstuck. If I adjust my service to be gentler, sending just one request at a time, it seems to hold steady.
This is an example request that I am sending:
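The request body itself was truncated in the capture; a sketch of the kind of multi-modal chat-completion request described above, fired 5 times in parallel to reproduce the hang, might look like the following. The endpoint URL, image URL, prompt text, and max_tokens are assumptions; the model name comes from the report.

```python
# Hypothetical reproduction sketch; the original request was not included above.
import concurrent.futures
import requests

URL = "http://localhost:8000/v1/chat/completions"  # assumed host/port

payload = {
    "model": "llava-hf/llava-v1.6-mistral-7b-hf",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/some-image.jpg"},
                },
            ],
        }
    ],
    "max_tokens": 128,
}

def send_request(_):
    # Each worker posts the same multi-modal request.
    return requests.post(URL, json=payload, timeout=120).status_code

# Fire 5 requests concurrently, mirroring the failure scenario described above.
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as pool:
    print(list(pool.map(send_request, range(5))))
```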
A bit more from the stack trace: