vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Usage]: OpenAI API for Phi-3-vision-128k-instruct #7068

Closed: sunil448832 closed this issue 3 months ago

sunil448832 commented 3 months ago
BadRequestError: Error code: 400 - {'object': 'error', 'message': 'Attempted to assign 1 x 2509 = 2509 image tokens to 0 placeholders', 'type': 'BadRequestError', 'param': None, 'code': 400}

I'm calling it using the following function:

import base64
import io

def prepare_prompts(self, prompts, images):
    """Build an OpenAI-style message list, alternating user and assistant turns."""
    messages = []
    for i in range(len(prompts)):
        if i % 2 == 0:
            # User turn: text content plus an optional base64-encoded image.
            content = [
                {
                    "type": "text",
                    "text": prompts[i]
                }
            ]
            if images[i]:
                # Encode the PIL image as PNG and embed it as a data URI.
                img_byte_arr = io.BytesIO()
                images[i].save(img_byte_arr, format='PNG')
                image_base64 = base64.b64encode(img_byte_arr.getvalue()).decode('utf-8')
                content.append(
                    {
                        "type": "image_url",
                        "image_url": {
                            # image/png to match the PNG encoding above
                            # (this originally said image/jpeg).
                            "url": f"data:image/png;base64,{image_base64}"
                        }
                    }
                )
            messages.append({"role": "user", "content": content})
        else:
            # Assistant turn: plain text content.
            messages.append({"role": "assistant", "content": prompts[i]})
    return messages
I tried two formats for prompts[i]:
1. "Describe this image"
2. "<|image_1|>\n Describe this image"

I get the same error with both prompts.
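
For context, here is roughly how the messages are sent (the base URL and API key are placeholders for a local vLLM OpenAI-compatible server):

from openai import OpenAI

# Placeholder client setup pointing at the local vLLM server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="token-caption1")

messages = prepare_prompts(prompts, images)  # message list built by the function above
response = client.chat.completions.create(
    model="microsoft/Phi-3-vision-128k-instruct",
    messages=messages,
)
print(response.choices[0].message.content)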

DarkLight1337 commented 3 months ago

Please set --max-model-len in the CLI to a larger value, such as 4096; otherwise the image embeddings cannot fit in the input to the language model.
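
For example (a minimal invocation; add other flags as needed):

vllm serve microsoft/Phi-3-vision-128k-instruct --trust-remote-code --max-model-len 4096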

sunil448832 commented 3 months ago

It's already done.

vllm serve microsoft/Phi-3-vision-128k-instruct \
    --dtype bfloat16 \
    --gpu-memory-utilization 0.9 \
    --max-model-len 8000 \
    --api-key token-caption1 \
    --tensor_parallel_size 1 \
    --enable_prefix_caching \
    --use-v2-block-manager \
    --trust-remote-code \
    --disable-sliding-window

I think it's not picking up the "<|image_1|>" placeholder when inserting the image features into the token sequence.
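
For reference, after the chat template is applied, the prompt should contain the image placeholder, roughly like this (per the Phi-3-vision model card; exact formatting may differ):

<|user|>
<|image_1|>
Describe this image<|end|>
<|assistant|>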

DarkLight1337 commented 3 months ago

@Isotr0py can you take a look into this?

sunil448832 commented 3 months ago

It's working now. I removed --enable_prefix_caching.
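
For reference, the working command is the same as before, minus that flag:

vllm serve microsoft/Phi-3-vision-128k-instruct \
    --dtype bfloat16 \
    --gpu-memory-utilization 0.9 \
    --max-model-len 8000 \
    --api-key token-caption1 \
    --tensor_parallel_size 1 \
    --use-v2-block-manager \
    --trust-remote-code \
    --disable-sliding-window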