Closed: pseudotensor closed this issue 1 month ago
Currently, a request containing more than one image_url part fails with:
openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': "Multiple 'image_url' input is currently not supported.", 'type': 'BadRequestError', 'param': None, 'code': 400}
This is on our roadmap in #4194. We will work on that after supporting dynamic image size and streamlining the configuration arguments.
🚀 The feature, motivation and pitch
I.e., instead of rejecting additional images here: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_chat.py#L138-L140
allow multiple images per request.
The idea is that many models trained on a single image actually work well with multiple images, and blocking that usage inhibits exploration of what these models are capable of.
E.g., this would be useful for microsoft/Phi-3-vision-128k-instruct.
In HF transformers, Phi-3 handles multiple images without issue, and I've used it that way myself.
It's also an officially supported task in Microsoft's Phi-3 cookbook:
https://github.com/microsoft/Phi-3CookBook/blob/main/md/03.Inference/Vision_Inference.md#3-comparison-of-multiple-images
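For reference, a minimal sketch of the request shape this feature would accept, following the OpenAI vision chat format. The helper function, model name, and image URLs below are illustrative placeholders, not tested values:

```python
# Sketch: an OpenAI-style chat payload carrying several image_url parts.
# The helper is hypothetical; the model name and URLs are placeholders.
def build_multi_image_request(prompt: str, image_urls: list[str]) -> dict:
    """Build a chat-completions payload with one text part and N image parts."""
    content = [{"type": "text", "text": prompt}]
    content += [{"type": "image_url", "image_url": {"url": u}} for u in image_urls]
    return {
        "model": "microsoft/Phi-3-vision-128k-instruct",
        "messages": [{"role": "user", "content": content}],
    }

payload = build_multi_image_request(
    "Compare these two images.",
    ["https://example.com/a.png", "https://example.com/b.png"],
)
# This payload is valid per the OpenAI vision API, but vLLM currently
# rejects any request with more than one image_url part with a 400 error.
```

Passing this payload to `client.chat.completions.create(**payload)` against a vLLM server is what currently triggers the 400 error shown above.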
Alternatives
None
Additional context