Closed. SPZtaymed closed this issue 2 months ago.
Please update your vLLM version; the latest release fixes a number of bugs related to this.
I have experienced this before. IIRC, for me, this occurred only when I used ENABLE_PREFIX_CACHING=true with Phi-3-vision. Try setting this flag to false.
Thank you for your replies; updating vLLM and disabling ENABLE_PREFIX_CACHING worked for me.
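For future readers, here is a minimal sketch of launching the engine with prefix caching explicitly disabled via the Python API. The model id and other arguments are placeholders rather than the exact values from the original deployment, and the ENABLE_PREFIX_CACHING environment variable mentioned above presumably maps to this engine argument (or to the `--enable-prefix-caching` CLI flag of the OpenAI-compatible server):

```python
# Sketch: construct the vLLM engine with prefix caching disabled.
# Model id and limits are assumptions; adjust to your own deployment.
from vllm import LLM

llm = LLM(
    model="microsoft/Phi-3-vision-128k-instruct",  # assumed model id
    trust_remote_code=True,                        # Phi-3-vision ships custom model code
    enable_prefix_caching=False,                   # the workaround discussed above
    max_model_len=4096,                            # illustrative limit
)

# When serving via the OpenAI-compatible server instead, the equivalent is
# simply to omit --enable-prefix-caching from the launch command (assumption).
```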
🐛 Describe the bug
I deployed a basic vLLM multi-modal server with Phi-3-vision. It seems to run well at first, i.e. my first request is executed correctly, but the moment I send the same request a second time I get an internal server error 500 with this output:

ERROR:__main__:Error during generation: Attempted to assign 1 x 2509 = 2509 image tokens to 0 placeholders
Your current environment
PyTorch version: 2.3.1+cu121
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Versions of relevant libraries:
[pip3] numpy==1.24.4
[pip3] nvidia-cublas-cu12==12.1.3.1
[pip3] nvidia-cuda-cupti-cu12==12.1.105
[pip3] nvidia-cuda-nvrtc-cu12==12.1.105
[pip3] nvidia-cuda-runtime-cu12==12.1.105
[pip3] nvidia-cudnn-cu12==8.9.2.26
[pip3] nvidia-cufft-cu12==11.0.2.54
[pip3] nvidia-curand-cu12==10.3.2.106
[pip3] nvidia-cusolver-cu12==11.4.5.107
[pip3] nvidia-cusparse-cu12==12.1.0.106
[pip3] nvidia-ml-py==12.555.43
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] nvidia-nvjitlink-cu12==12.5.82
[pip3] nvidia-nvtx-cu12==12.1.105
[pip3] pyzmq==26.0.3
[pip3] torch==2.3.1
[pip3] torchvision==0.18.1
[pip3] transformers==4.43.2
[pip3] triton==2.3.1
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.5.3.post1@38c4b7e863570a045308af814c72f4504297222e
vLLM Build Flags: CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
        GPU0    GPU1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0    X       PHB     0-47            0               N/A
GPU1    PHB     X       0-47            0               N/A