Closed. SPZtaymed closed this issue 2 months ago.
Please update your vLLM version; the latest release fixes a number of bugs related to this.
I have experienced this before. IIRC, for me, this occurred only when I used ENABLE_PREFIX_CACHING=true with Phi-3-vision. Try setting this flag to false.
Thank you for your replies; updating vLLM and disabling ENABLE_PREFIX_CACHING worked for me.
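For future readers, here is a minimal sketch of launching the engine with prefix caching explicitly disabled via the Python API. The model id and other arguments are placeholders rather than the exact values from the original deployment, and the ENABLE_PREFIX_CACHING environment variable mentioned above presumably maps to this engine argument (or to the `--enable-prefix-caching` CLI flag of the OpenAI-compatible server):

```python
# Sketch: construct the vLLM engine with prefix caching disabled.
# Model id and limits are assumptions; adjust to your own deployment.
from vllm import LLM

llm = LLM(
    model="microsoft/Phi-3-vision-128k-instruct",  # assumed model id
    trust_remote_code=True,                        # Phi-3-vision ships custom model code
    enable_prefix_caching=False,                   # the workaround discussed above
    max_model_len=4096,                            # illustrative limit
)

# When serving via the OpenAI-compatible server instead, the equivalent is
# simply to omit --enable-prefix-caching from the launch command (assumption).
```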
🐛 Describe the bug
I deployed a basic vLLM multi-modal server with Phi-3-vision. It seems to run well at first, i.e. my first request is executed correctly, but the moment I send the same request a second time I get an internal server error 500 with this output:

ERROR:__main__:Error during generation: Attempted to assign 1 x 2509 = 2509 image tokens to 0 placeholders
Your current environment
PyTorch version: 2.3.1+cu121
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Versions of relevant libraries:
[pip3] numpy==1.24.4
[pip3] nvidia-cublas-cu12==12.1.3.1
[pip3] nvidia-cuda-cupti-cu12==12.1.105
[pip3] nvidia-cuda-nvrtc-cu12==12.1.105
[pip3] nvidia-cuda-runtime-cu12==12.1.105
[pip3] nvidia-cudnn-cu12==8.9.2.26
[pip3] nvidia-cufft-cu12==11.0.2.54
[pip3] nvidia-curand-cu12==10.3.2.106
[pip3] nvidia-cusolver-cu12==11.4.5.107
[pip3] nvidia-cusparse-cu12==12.1.0.106
[pip3] nvidia-ml-py==12.555.43
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] nvidia-nvjitlink-cu12==12.5.82
[pip3] nvidia-nvtx-cu12==12.1.105
[pip3] pyzmq==26.0.3
[pip3] torch==2.3.1
[pip3] torchvision==0.18.1
[pip3] transformers==4.43.2
[pip3] triton==2.3.1
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.5.3.post1@38c4b7e863570a045308af814c72f4504297222e
vLLM Build Flags: CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
        GPU0    GPU1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0    X       PHB     0-47            0               N/A
GPU1    PHB     X       0-47            0               N/A