
[Bug]: Error when running multimodal large models with --enable-prefix-caching #8296

Open · jiyanxin opened 3 weeks ago

jiyanxin commented 3 weeks ago


🐛 Describe the bug

When I launched the Qwen2-VL model server with vLLM, I set the `--enable-prefix-caching` flag. The server raised an error as soon as it received the second image request. It appears that this flag is currently incompatible with multimodal models. Are there plans to fix this incompatibility?

![screenshot](https://github.com/user-attachments/assets/0b61ae2d-5e32-4895-be4b-a84749d94646)
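For context, a minimal offline reproduction sketch of this setup might look like the following. The model name, prompt template, and image paths are illustrative assumptions and are not taken from the report; they only mirror the reported scenario of enabling prefix caching and sending two image requests in a row.

```python
# Minimal reproduction sketch (assumptions: model name, prompt template,
# and image paths are placeholders; adjust to your actual setup).
from PIL import Image
from vllm import LLM, SamplingParams

# Enable prefix caching in the engine, matching the server flag
# --enable-prefix-caching from the report.
llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct", enable_prefix_caching=True)

sampling_params = SamplingParams(max_tokens=64)

# Qwen2-VL expects image placeholder tokens in the prompt; this template
# is an assumption, not taken from the issue.
prompt = (
    "<|im_start|>user\n"
    "<|vision_start|><|image_pad|><|vision_end|>Describe this image.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# The reported error appears on the second image request, so send two.
for path in ["image1.jpg", "image2.jpg"]:  # hypothetical files
    image = Image.open(path)
    outputs = llm.generate(
        {"prompt": prompt, "multi_modal_data": {"image": image}},
        sampling_params,
    )
    print(outputs[0].outputs[0].text)
```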


DarkLight1337 commented 3 weeks ago

Are you using the latest version of the PR branch for Qwen2-VL? This should have been fixed by the recent PR #8028.

Edit: Could you show how you're inputting the images?
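For readers following along, a common way to send images to vLLM's OpenAI-compatible server is via `image_url` content parts in a chat completion request. This is only a sketch of one possible input path, not necessarily how the reporter is sending images; the base URL, model name, and image URL below are placeholders.

```python
# Hedged example: one typical way to pass an image to the OpenAI-compatible
# server. The server URL, model name, and image URL are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen2-VL-7B-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/demo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```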