Closed: BabyChouSr closed this issue 4 months ago.
Thanks for reporting this! Can you check whether #6430 fixes this issue?
Not related to this PR in particular, but since you're serving this from the OpenAI API server, I don't think PaliGemma is supposed to work out-of-box with it because it was never instruction fine-tuned.
In the PaliGemma paper, it says
Gemma [79] is a family of auto-regressive decoder-only open large language models built
from the same research and technology used to create the Gemini [7] models. The models come
in different sizes (2B, 7B), both pretrained and instruction fine-tuned. PaliGemma uses the 2B
pretrained version.
@DarkLight1337 Thank you for taking on this issue! Sorry, but this still doesn't work for me. I pulled your branch using git fetch origin pull/6430/head, but I still run into the same error with the same input.
@ywang96 You bring up a good point! I'll have to familiarize myself with the paper, thanks for sharing.
Oops, I forgot to update the async version of fetch_image. Can you try again?
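For reference, here is a rough sketch of the kind of change involved: converting the downloaded bytes to RGB in the async path as well, mirroring the synchronous fetch_image. The aiohttp usage and function name below are illustrative assumptions, not the actual vLLM code.

# Illustrative sketch only -- not the actual vLLM change. The point is that
# the async image fetch also has to normalize the image to 3-channel RGB.
from io import BytesIO

import aiohttp
from PIL import Image


async def fetch_image_async(url: str) -> Image.Image:
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            resp.raise_for_status()
            data = await resp.read()
    # Convert RGBA PNGs (and any other mode) to RGB so SigLIP's
    # 3-channel preprocessing doesn't break.
    return Image.open(BytesIO(data)).convert("RGB")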
thank you! works now :)
Hi @BabyChouSr, I tried the curl command below on the PaliGemma model that we have hosted:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/paligemma-3b-mix-448",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What’s in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://placehold.co/600x400/jpg"
            }
          }
        ]
      }
    ],
    "max_tokens": 300
  }'
I am getting the following output, not the one you mentioned:
<|im_start>assistant\n<|im_start>assistant\n<|im_start>assistant\n<|im_start>assistant\n<|im_start>assistant\n<|im_start>assistant\n<|im_start>assistant\n<|im_start>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_participation>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_particip>assistant\n<|im_participation>assistant"
What might be the issue here? Can you please help me?
You should use a custom chat template so that the input has the same format as the one shown on HuggingFace.
@DarkLight1337 I assumed the request body for the PaliGemma API would be the same for everyone when hosted through vLLM. Why should we use a custom chat template? Can you please elaborate on this?
From my understanding, PaliGemma isn't designed as a chat model, so it doesn't have a built-in chat template. In this case you are required to define your own template, since there isn't a default chat template that works for all models.
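To illustrate, here is a minimal sketch of defining your own template and pointing the server at it. The template body is only a placeholder that concatenates the message text; it is not the official PaliGemma prompt format, so check the prompt layout shown on the model's Hugging Face page before using it.

# Placeholder chat template for illustration only -- replace the template
# body with the prompt format documented on the model card.
from pathlib import Path

TEMPLATE = (
    "{% for message in messages %}"
    "{{ message['content'] }}"
    "{% endfor %}\n"
)
Path("paligemma_template.jinja").write_text(TEMPLATE)

# Then serve the model with this template, e.g.:
#   python -m vllm.entrypoints.openai.api_server \
#       --model google/paligemma-3b-mix-448 \
#       --chat-template paligemma_template.jinja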
@DarkLight1337 To give more context, I tried the above curl command on the PaliGemma model that we have hosted through the vLLM framework, the same way @BabyChouSr did for his query. But our output was completely different from what he reported, so I asked for help with that.
How are you hosting the model? Please show the command that you used.
I don't think the temperature is set to 0 by default (i.e., we're not sampling greedily), and that's probably why you're seeing the difference.
I would also encourage you to take a look at our example script, examples/offline_inference_vision_language.py.
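For instance, here is a rough sketch of greedy (temperature 0) generation through the offline Python API. The "caption en" prompt and the multi_modal_data layout are assumptions on my part; compare against examples/offline_inference_vision_language.py for the maintained version.

# Sketch of deterministic (temperature=0) offline inference with an image.
from io import BytesIO

import requests
from PIL import Image

from vllm import LLM, SamplingParams

# Download the placeholder image from the report and force it to RGB.
image = Image.open(
    BytesIO(requests.get("https://placehold.co/600x400/jpg").content)
).convert("RGB")

llm = LLM(model="google/paligemma-3b-mix-448")
outputs = llm.generate(
    {
        "prompt": "caption en",  # PaliGemma-style task prompt (assumption)
        "multi_modal_data": {"image": image},
    },
    sampling_params=SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)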
@DarkLight1337 It is through a cloud platform called Jarvislabs.ai; they have a vLLM option for hosting open-source models from Hugging Face. When I tried it with PaliGemma, it gave us two APIs, /v1/chat/completions and /v1/completions. I thought /v1/chat/completions would work for us and tried it, but I didn't get a proper response. The simple goal here is: given an image and a prompt, it should be able to give the output.
Do you have the ability to pass through command-line arguments? As mentioned above:
From my understanding, PaliGemma isn't designed as a chat model, so it doesn't have a built-in chat template. In this case you are required to define your own template, since there isn't a default chat template that works for all models.
No, I only have control over the request body for the API call.
How about selecting the HuggingFace model to use? Maybe you can fork the model repo and add the chat template to it.
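For example, something along these lines could work; the output path is hypothetical and the template string is the same placeholder as above, so adapt both to your setup.

# Sketch of adding a chat template to a forked copy of the model repo.
from transformers import AutoTokenizer

# Loading the tokenizer requires accepting the model license on Hugging Face.
tokenizer = AutoTokenizer.from_pretrained("google/paligemma-3b-mix-448")
tokenizer.chat_template = (
    "{% for message in messages %}{{ message['content'] }}{% endfor %}\n"
)
# save_pretrained persists the template into tokenizer_config.json; push the
# result to your fork so the hosting platform picks it up.
tokenizer.save_pretrained("./paligemma-3b-mix-448-with-template")
# tokenizer.push_to_hub("<your-username>/paligemma-3b-mix-448-with-template")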
Not sure. But my question is: why am I not able to get a proper output, like @BabyChouSr did for his JPG image query, using the /v1/chat/completions API call with the PaliGemma model?
@JanuRam I don't think the model should be used for chat responses; you will not receive content that is very meaningful. You can try using the LLaVA template as shown below. However, I would say that chat is probably not the use case you would want this model for. If you are looking for chat, you should try https://huggingface.co/openbmb/MiniCPM-V-2_6
python -m vllm.entrypoints.openai.api_server \
--model google/paligemma-3b-mix-224 \
--chat-template template_llava.jinja
It is not for chat (conversational purposes); to be precise, it is mainly for visual question answering.
Your current environment
🐛 Describe the bug
PNG files don't seem to work for paligemma-3b-mix-448. To test, try the following command:
python -m vllm.entrypoints.openai.api_server --model google/paligemma-3b-mix-448 --chat-template examples/template_llava.jinja
on the server. Then, test this command using:
Error Traceback Output:
However, if we test using a jpg image:
Output:
I believe the reason is that SigLIP has a default num_channels parameter that is set to 3. PNG images can have 4 channels (RGBA), which leads to this mismatch. I discovered it when I was trying to load images using Image.open(image_url).convert('RGBA') and then realized that passing these images into vLLM would not work due to the above error.
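As a client-side workaround until a server-side fix lands, here is a small sketch that flattens a PNG to 3-channel RGB and re-encodes it before sending. The PNG placeholder URL and the use of a base64 data URL in the image_url field are my assumptions, not something confirmed above.

# Workaround sketch: force a 4-channel RGBA PNG down to 3-channel RGB before
# handing it to the model, matching SigLIP's default num_channels=3.
import base64
from io import BytesIO

import requests
from PIL import Image

png = Image.open(BytesIO(requests.get("https://placehold.co/600x400/png").content))
rgb = png.convert("RGB")  # drops the alpha channel

# Re-encode and embed as a data URL for the /v1/chat/completions request,
# assuming the server accepts base64 data URLs as the OpenAI format allows.
buf = BytesIO()
rgb.save(buf, format="JPEG")
data_url = "data:image/jpeg;base64," + base64.b64encode(buf.getvalue()).decode()
print(data_url[:80] + "...")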