vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Usage]: how to test the time of response about minicpm-v-2.6 served by VLLM #7891

Closed Mysnake closed 1 week ago

Mysnake commented 2 weeks ago

Your current environment

I deployed MiniCPM-V-2.6 with vLLM, and I want to measure the response time. First, I used `OpenAI.chat.completions.create()` to access the server, and it returned the result successfully. The code is shown below: [screenshot of client code]
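A minimal sketch of timing a request like the one above. The helper names (`build_chat_payload`, `timed_request`) and the model id are assumptions for illustration, not from the original post; adapt them to your deployment.

```python
import time

def build_chat_payload(model, prompt, image_url=None):
    """Build an OpenAI-style Chat Completions request body.

    The multimodal content layout (text plus image_url parts) follows the
    OpenAI chat format that vLLM's server accepts.
    """
    content = [{"type": "text", "text": prompt}]
    if image_url:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": model, "messages": [{"role": "user", "content": content}]}

def timed_request(send_fn, payload):
    """Return (response, elapsed_seconds) for one request.

    send_fn is whatever actually issues the call, e.g.
    lambda p: client.chat.completions.create(**p) with an OpenAI client.
    """
    start = time.perf_counter()
    resp = send_fn(payload)
    return resp, time.perf_counter() - start
```

With an `openai.OpenAI(base_url="http://localhost:8000/v1", ...)` client, `timed_request(lambda p: client.chat.completions.create(**p), payload)` gives the end-to-end latency of one call.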

Then I used a POST request to access the server. The code is shown below: [screenshot of POST request code]

[screenshot of error response]

The server log is: [screenshot of server log]

How can I send a JSON request to the server?

🐛 Describe the bug

How can I send a JSON request to the server?


DarkLight1337 commented 2 weeks ago

You should be using the Chat Completions API (`/v1/chat/completions`), not the Completions API (`/v1/completions`), in your POST request.
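A hedged sketch of what the corrected POST request might look like, using only the Python standard library. The host, port, and model name are assumptions based on a typical `vllm serve` setup; only the `/v1/chat/completions` route is the point being made.

```python
import json
import urllib.request

def chat_completions_url(base="http://localhost:8000"):
    # The Chat Completions route -- not /v1/completions.
    return base.rstrip("/") + "/v1/chat/completions"

def post_chat(prompt, model="openbmb/MiniCPM-V-2_6", base="http://localhost:8000"):
    """Send one chat request as JSON and return the decoded response body."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    req = urllib.request.Request(
        chat_completions_url(base),
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())
```

The same body works with `requests.post(...)` or `curl`; what matters is targeting the chat endpoint and sending an OpenAI-style `messages` array rather than a bare `prompt`.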

ywang96 commented 1 week ago

Closing this since @DarkLight1337 provided a good answer