Describe the bug
Discussion here: https://github.com/vllm-project/vllm/pull/7916. Running the vLLM server with Phi-3.5-vision fails with: Attempted to assign 457 = 457 multimodal tokens to 757 placeholders
The failing request includes this image: "url": "https://mir-s3-cdn-cf.behance.net/project_modules/max_1200/73fbe271026179.5bb6e7af358b6.jpg"
[DEBUG][pid:1530] Command to be run: ['/local_disk0/.ephemeral_nfs/envs/pythonEnv-f5e9abdb-8a34-4c63-ad6d-5227ccc7b38f/bin/python', '-m', 'vllm.entrypoints.openai.api_server', '--host', '0.0.0.0', '--port', '9989', '--model', 'microsoft/Phi-3.5-vision-instruct', '--served-model-name', 'microsoft/Phi-3.5-vision-instruct', '--trust-remote-code', '--max-model-len', '12000', '--guided-decoding-backend', 'outlines'
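For reference, the request that hits the error was roughly of this shape. This is a minimal sketch against the OpenAI-compatible endpoint exposed by the command above; the prompt text and api_key value are placeholders, not taken from the original report:

```python
# Sketch: reproduce the multimodal-placeholder error against the server
# started above (assumed reachable on port 9989). Prompt text is a placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:9989/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="microsoft/Phi-3.5-vision-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://mir-s3-cdn-cf.behance.net/project_modules/max_1200/73fbe271026179.5bb6e7af358b6.jpg"
                    },
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```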
vLLM is pinned to version 0.5.5, serving microsoft/Phi-3.5-vision-instruct.
The server restarts fine but then hangs.
Resolved in vLLM 0.7.0.