vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Usage]: how to test the time of response about minicpm-v-2.6 served by VLLM #7891

Closed Mysnake closed 1 week ago

Mysnake commented 2 weeks ago

Your current environment

I deployed MiniCPM-V-2.6 with vLLM, and I want to measure the response time. First, I used `OpenAI.chat.completions.create()` to access the server, and it returned the result successfully. The code is shown below: [screenshot of client code]
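A minimal sketch of timing a request like the one above. The helper names (`build_chat_payload`, `timed_request`) and the model id are assumptions for illustration, not from the original post; adapt them to your deployment.

```python
import time

def build_chat_payload(model, prompt, image_url=None):
    """Build an OpenAI-style Chat Completions request body.

    The multimodal content layout (text plus image_url parts) follows the
    OpenAI chat format that vLLM's server accepts.
    """
    content = [{"type": "text", "text": prompt}]
    if image_url:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": model, "messages": [{"role": "user", "content": content}]}

def timed_request(send_fn, payload):
    """Return (response, elapsed_seconds) for one request.

    send_fn is whatever actually issues the call, e.g.
    lambda p: client.chat.completions.create(**p) with an OpenAI client.
    """
    start = time.perf_counter()
    resp = send_fn(payload)
    return resp, time.perf_counter() - start
```

With an `openai.OpenAI(base_url="http://localhost:8000/v1", ...)` client, `timed_request(lambda p: client.chat.completions.create(**p), payload)` gives the end-to-end latency of one call.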

Then I used a POST request to access the server. The code is shown below: [screenshot of POST request code]

[screenshot of error response]

The server log is: [screenshot of server log]

How can I send a JSON request to the server?

🐛 Describe the bug

How can I send a JSON request to the server?


DarkLight1337 commented 2 weeks ago

You should be using the Chat Completions API (`/v1/chat/completions`), not the Completions API (`/v1/completions`), in your POST request.
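A hedged sketch of what the corrected POST request might look like, using only the Python standard library. The host, port, and model name are assumptions based on a typical `vllm serve` setup; only the `/v1/chat/completions` route is the point being made.

```python
import json
import urllib.request

def chat_completions_url(base="http://localhost:8000"):
    # The Chat Completions route -- not /v1/completions.
    return base.rstrip("/") + "/v1/chat/completions"

def post_chat(prompt, model="openbmb/MiniCPM-V-2_6", base="http://localhost:8000"):
    """Send one chat request as JSON and return the decoded response body."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    req = urllib.request.Request(
        chat_completions_url(base),
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())
```

The same body works with `requests.post(...)` or `curl`; what matters is targeting the chat endpoint and sending an OpenAI-style `messages` array rather than a bare `prompt`.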

ywang96 commented 1 week ago

Closing this since @DarkLight1337 provided a good answer