SinanAkkoyun opened this issue 3 weeks ago
This other issue is basically asking for the same thing: https://github.com/vllm-project/vllm/issues/9169
Allowing per-request `mm_processor_kwargs` when running the server could also help rectify this specific issue, with the caveat that you could OOM the server if you're not careful with your settings. I was planning to open a PR to potentially allow that, but got sidetracked with other things; I will look into it again when I have time (likely in a few weeks).
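To illustrate the idea (purely a sketch; per-request `mm_processor_kwargs` is not supported yet, so the `mm_processor_kwargs` field in `extra_body` below is an assumption, not an existing option):

```python
# Hypothetical client-side sketch: if the server accepted a per-request
# mm_processor_kwargs override, a caller could cap image resolution for a
# single request via the OpenAI-compatible API's extra_body.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen2-VL-7B-Instruct",  # example model
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/small.jpg"}},
            {"type": "text", "text": "Describe this image."},
        ],
    }],
    extra_body={
        # Assumed per-request override of the multimodal processor kwargs.
        "mm_processor_kwargs": {"max_pixels": 256 * 28 * 28},
    },
)
print(response.choices[0].message.content)
```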
> Allowing per-request `mm_processor_kwargs` when running the server could also help rectify this specific issue
That would be even better, thanks! Would it be a lot of work to add a failsafe that returns a 400 when the prompt exceeds the max model length, before model decoding starts?
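Something like this is what I mean, as a rough sketch rather than vLLM's actual code path (it assumes the FastAPI-based OpenAI server and that the processed prompt token ids, with image tokens already expanded, are available before scheduling):

```python
# Rough sketch of the requested failsafe (not vLLM's actual implementation):
# reject a request with HTTP 400 before any decoding work is scheduled.
from fastapi import HTTPException


def ensure_prompt_fits(prompt_token_ids: list[int],
                       max_model_len: int,
                       max_tokens: int = 0) -> None:
    """Raise a 400 if the processed prompt (text plus expanded image tokens)
    plus the requested completion budget cannot fit in the model context."""
    needed = len(prompt_token_ids) + max_tokens
    if needed > max_model_len:
        raise HTTPException(
            status_code=400,
            detail=(f"Request needs {needed} tokens but max_model_len is "
                    f"{max_model_len}; reduce the number or resolution of "
                    f"images, or lower max_tokens."),
        )
```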
🚀 The feature, motivation and pitch
When starting a VLM with `--limit-mm-per-prompt` and `max_pixels` set, vLLM won't start if the mm limit * `max_pixels` token usage exceeds the model context length. However, this check is too cautious: it prevents me from sending, say, 10 small images that easily fit into the context length while also keeping support for bigger images.
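For reference, this is roughly the configuration that triggers the refusal, shown as a minimal sketch with the offline API (the server flags `--limit-mm-per-prompt` and `--mm-processor-kwargs` map to the same engine arguments; the model name and pixel budget are just examples):

```python
# Minimal sketch of the configuration in question (example model and values).
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2-VL-7B-Instruct",
    max_model_len=8192,
    # Allow up to 10 images per prompt...
    limit_mm_per_prompt={"image": 10},
    # ...while max_pixels fixes the worst-case token cost per image. Startup
    # fails when 10 * (tokens at max_pixels) exceeds max_model_len, even if
    # typical requests only contain much smaller images.
    mm_processor_kwargs={"max_pixels": 1280 * 28 * 28},
)
```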
Alternatives
Remove the default `limit-mm-per-prompt` of 1 and only apply the existing check when `limit-mm-per-prompt` is explicitly set.