Please set `--max_model_len` in the CLI to a larger value such as 4096, otherwise the image embeddings cannot fit in the input to the language model.
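For the offline API, the equivalent knob is `max_model_len` on the `LLM` constructor. A minimal sketch, assuming vLLM's Python API (the value 4096 is just the floor suggested above):

```python
from vllm import LLM

# max_model_len must be large enough to hold the image embeddings
# plus the text prompt; 4096 is a reasonable floor for Phi-3-vision.
llm = LLM(
    model="microsoft/Phi-3-vision-128k-instruct",
    trust_remote_code=True,
    max_model_len=4096,
)
```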
It's already done.
vllm serve microsoft/Phi-3-vision-128k-instruct \
    --dtype bfloat16 \
    --gpu-memory-utilization 0.9 \
    --max-model-len 8000 \
    --api-key token-caption1 \
    --tensor-parallel-size 1 \
    --enable-prefix-caching \
    --use-v2-block-manager \
    --trust-remote-code \
    --disable-sliding-window
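Once the server is up, a quick way to confirm it is reachable is to list the served models. A sketch assuming the default port 8000, the OpenAI Python client, and the `--api-key` from the command above:

```python
from openai import OpenAI

# Point the client at the local vLLM server; the key must match --api-key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="token-caption1")
print([m.id for m in client.models.list().data])
```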
I think it's not picking up the `<|image_1|>` placeholder when the image features are inserted into the token sequence.
@Isotr0py, can you take a look at this?
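For reference, Phi-3-vision expects the placeholder to appear verbatim in the prompt so the image features can be spliced in at that position. A minimal offline sketch, assuming vLLM's multimodal `generate` API (the image path is hypothetical):

```python
from vllm import LLM, SamplingParams
from PIL import Image

llm = LLM(
    model="microsoft/Phi-3-vision-128k-instruct",
    trust_remote_code=True,
    max_model_len=8000,
)

# The <|image_1|> placeholder marks where the image features go.
prompt = "<|user|>\n<|image_1|>\nDescribe this image.<|end|>\n<|assistant|>\n"
image = Image.open("example.jpg")  # hypothetical local file

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```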
It's working now; removing `--enable-prefix-caching` fixed it.
I'm calling it with the following function:
I'm getting the same error for both prompts.
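For anyone comparing notes, here is a minimal client-side sketch against the server above (the image file name is hypothetical; the model name, port, and API key come from the serve command):

```python
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="token-caption1")

# Encode a local image as a data URL so it can be sent inline.
with open("example.jpg", "rb") as f:
    data_url = "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="microsoft/Phi-3-vision-128k-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

Note that the chat endpoint applies the model's chat template, so the `<|image_1|>` placeholder is inserted automatically; you only need to write it yourself when building raw prompts.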