Can you provide a link to the image that is causing the problem?
That's just the official image that the project is pushing to Docker Hub.
I'm referring to the image that you input to the model.
Oh sorry, the word "image" is totally rewired in my brain now ;) I was using the following image from your own tests in https://github.com/vllm-project/vllm/blob/main/tests/entrypoints/openai/test_vision.py:
curl 'https://ai1.dev.init/multimodal-llava/v1/chat/completions' -k -H 'Content-Type: application/json' -d @- <<EOF
{
"model": "llava-hf/llava-v1.6-mistral-7b-hf",
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "data:image/jpeg;base64,$(curl -s https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg | base64 -w 0)"
}
},
{
"type": "text",
"text": "Was ist in dem Bild?"
}
]
}
],
"temperature": 0.2,
"top_p": 0.1,
"top_k": 20,
"frequency_penalty": 0.2
}
EOF
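For reference, the same request expressed with the OpenAI Python client; a rough sketch where the endpoint, model, and sampling parameters are copied from the curl call above, the api_key is a dummy placeholder (vLLM only checks it when started with --api-key), and verify=False mirrors curl's -k for the self-signed certificate:

import base64

import httpx
import requests
from openai import OpenAI

client = OpenAI(
    base_url="https://ai1.dev.init/multimodal-llava/v1",
    api_key="EMPTY",                         # dummy value, see note above
    http_client=httpx.Client(verify=False),  # equivalent of curl's -k
)

image_bytes = requests.get(
    "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/"
    "Gfp-wisconsin-madison-the-nature-boardwalk.jpg/"
    "2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
).content
data_url = "data:image/jpeg;base64," + base64.b64encode(image_bytes).decode()

response = client.chat.completions.create(
    model="llava-hf/llava-v1.6-mistral-7b-hf",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": data_url}},
                {"type": "text", "text": "Was ist in dem Bild?"},
            ],
        }
    ],
    temperature=0.2,
    top_p=0.1,
    frequency_penalty=0.2,
    extra_body={"top_k": 20},  # top_k is a vLLM extension, not a standard OpenAI field
)
print(response.choices[0].message.content)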
For that particular image, does the same problem happen if you use offline inference? We should have fixed that already...
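For reference, "offline inference" here means calling vLLM's Python API directly instead of going through the API server. A minimal sketch, assuming a vLLM version with the multi-modal dict-prompt API and the test image saved locally as boardwalk.jpg; the [INST] prompt template is what the Mistral-based LLaVA checkpoint expects:

from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(model="llava-hf/llava-v1.6-mistral-7b-hf")

image = Image.open("boardwalk.jpg")  # the Wikimedia image from the curl call above

outputs = llm.generate(
    {
        "prompt": "[INST] <image>\nWas ist in dem Bild? [/INST]",
        "multi_modal_data": {"image": image},
    },
    SamplingParams(temperature=0.2, top_p=0.1, top_k=20, max_tokens=256),
)
print(outputs[0].outputs[0].text)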
Sorry, I cannot invest more than that. If the next version already fixes it, I'm fine with it; I just wanted to forward the observed issue. I only use the vLLM Docker setup on a remote GPU server, and it would take me too much time now to create a different setup with a local GPU or build some debugging Docker image (I cannot install/run directly there).
I tried the same command using a local server:
python -m vllm.entrypoints.openai.api_server --port 8001 --model llava-hf/llava-v1.6-mistral-7b-hf --enforce-eager
Request:
curl 'http://localhost:8001/v1/chat/completions' -k -H 'Content-Type: application/json' -d @- <<EOF
{
"model": "llava-hf/llava-v1.6-mistral-7b-hf",
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "data:image/jpeg;base64,$(curl -s https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg | base64 -w 0)"
}
},
{
"type": "text",
"text": "Was ist in dem Bild?"
}
]
}
],
"temperature": 0.2,
"top_p": 0.1,
"top_k": 20,
"frequency_penalty": 0.2
}
EOF
Output:
{"id":"cmpl-05652f0b1fda4b1ba895beb78d6f412d","object":"chat.completion","created":1720336691,"model":"llava-hf/llava-v1.6-mistral-7b-hf","choices":[{"index":0,"message":{"role":"assistant","content":" Das Bild zeigt eine Landschaft mit einem Weg durch ein Grasfeld. Der Weg ist gepflastert und fĂŒhrt durch eine offene, grasige FlĂ€che, die von einigen BĂ€umen und StrĂ€uchern begrenzt wird. Im Hintergrund sind weitere GrasflĂ€chen und BĂ€ume zu sehen, die unter einem weiten Himmel stehen. Es ist ein schöner, natĂŒrlicher Landschaftsbild, das eine ruhige und ungestörte Umgebung darstellt. ","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":2161,"total_tokens":2283,"completion_tokens":122}}
Can you try disabling --enable-chunked-prefill and/or running the model on a single GPU and see if that fixes the problem?
Edit: Actually, --max-num-batched-tokens=2048 might be causing the problem, since the image takes up 2144 tokens and the error message shows that only 2043 (= 2048 - 5, where there are 5 text tokens) image tokens are available.
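The arithmetic behind that diagnosis, spelled out as a quick check (numbers taken from this thread; the per-image token count is specific to this model and image size):

# Scheduler token budget vs. what this request needs.
MAX_NUM_BATCHED_TOKENS = 2048  # the reporter's server setting
TEXT_TOKENS = 5                # text portion of the prompt
IMAGE_TOKENS = 2144            # image feature tokens for this input

available_for_image = MAX_NUM_BATCHED_TOKENS - TEXT_TOKENS
print(available_for_image)                 # 2043
print(available_for_image < IMAGE_TOKENS)  # True: the request can never be scheduled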
I have deactivated chunked prefill / max-num-batched-tokens and it works with that, thx. The config was a copy from some other, bigger LLM.
Glad to help!
Yes, thx. Maybe it deserves some warning/error at startup if such a config has to be ruled out. It's also strange that it gets stuck in this faulty mode even for other, smaller images that worked before. But I'm happy at least. :)
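A startup check of the kind suggested here could look roughly like the sketch below. This is hypothetical, not vLLM's actual code; the function name and the max_image_tokens parameter are illustrative:

def validate_multimodal_token_budget(max_num_batched_tokens: int,
                                     max_image_tokens: int) -> None:
    # Fail fast if no image-bearing request can ever fit into one batch.
    # Hypothetical helper; vLLM's real config validation lives elsewhere.
    if max_num_batched_tokens < max_image_tokens:
        raise ValueError(
            f"--max-num-batched-tokens={max_num_batched_tokens} is smaller than "
            f"the largest possible image feature size ({max_image_tokens} tokens); "
            f"requests with such images can never be scheduled."
        )

# For the numbers in this thread, this would raise at startup:
# validate_multimodal_token_budget(2048, 2144)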
Your current environment
🐛 Describe the bug
Config:
It works for me via OpenAI Vision-compatible API calls, e.g.:
But after some bigger image, I get the following exception, and after that I have to restart vLLM; it doesn't work anymore, even for smaller images.