question about vllm - GitHub Issues

zzllkk2003 commented 1 week ago

Does Spring AI support vLLM + Qwen? Which starter can I use, and how do I connect Spring AI to vLLM? Thanks.

codespearhead commented 6 days ago

vLLM currently claims to have an OpenAI-compatible server [1], so you just have to use spring-ai-openai-spring-boot-starter and set the spring.ai.openai.base-url property [2] accordingly.

[1] https://docs.vllm.ai/en/stable/getting_started/quickstart.html#openai-compatible-server
[2] https://docs.spring.io/spring-ai/reference/api/chat/openai-chat.html#_connection_properties
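
For illustration, a minimal application.properties sketch of that setup. The host and port are assumptions (a vLLM server on localhost:8000), the key value is a placeholder, and the model name is the one that appears later in this thread. As far as I know, Spring AI appends the /v1/chat/completions path itself, so the base URL should not include /v1; please check this against the version you use.

```properties
# Assumed address of the vLLM OpenAI-compatible server (no trailing /v1)
spring.ai.openai.base-url=http://localhost:8000
# Placeholder value; vLLM only checks the key if it was started with --api-key
spring.ai.openai.api-key=EMPTY
# Model name as served by vLLM (here: the path used elsewhere in this thread)
spring.ai.openai.chat.options.model=/data1/pretrained_models/Qwen/Qwen1.5-32B-Chat
spring.ai.openai.chat.options.temperature=0.7
```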

zzllkk2003 commented 2 days ago

It seems that a change [1] has altered the structure of the request body (see MediaContent in that commit). With spring-ai-openai-spring-boot-starter 1.0.0.M1, I cannot get requests to the vLLM API to work.

[1] https://github.com/spring-projects/spring-ai/commit/834d2d04879c080da208bdde7cda5aea7c48f585

Request sent by spring-ai-openai-spring-boot-starter 1.0.0.M1:

{
    "messages": [
        {
            "content": [
                {
                    "text": "Tell me a joke",
                    "type": "text"
                }
            ],
            "role": "user"
        }
    ],
    "model": "/data1/pretrained_models/Qwen/Qwen1.5-32B-Chat",
    "stream": false,
    "temperature": 0.7
}

Request format that works against the vLLM API:

{
    "max_tokens": 4000,
    "messages": [
        {
            "content": "Tell me a joke",
            "role": "user"
        }
    ],
    "model": "/data1/pretrained_models/Qwen/Qwen1.5-32B-Chat",
    "stream": false,
    "temperature": 0.7
}
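
The two bodies differ mainly in the shape of "content": an array of typed parts in the first, a plain string in the second (the second also sets max_tokens). To make that concrete, here is a minimal plain-Java sketch, not Spring AI code, that collapses an array of text-only parts into a single string. The names ContentFlattener and flattenContent are hypothetical, and the sketch assumes the body has already been parsed into List/Map objects by some JSON library.

```java
import java.util.List;
import java.util.Map;

public class ContentFlattener {

    // Hypothetical helper: turn array-form "content" into a plain string.
    // Anything that is not a list of text parts is returned unchanged.
    static Object flattenContent(Object content) {
        if (!(content instanceof List)) {
            return content; // already a plain string, nothing to do
        }
        StringBuilder text = new StringBuilder();
        for (Object part : (List<?>) content) {
            if (!(part instanceof Map)) {
                return content; // unexpected shape: leave untouched
            }
            Map<?, ?> map = (Map<?, ?>) part;
            if (!"text".equals(map.get("type"))) {
                return content; // non-text part (e.g. an image): leave untouched
            }
            text.append(map.get("text")); // multiple text parts are concatenated
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Array form, as sent by the 1.0.0.M1 starter in the request above
        Object arrayForm = List.of(Map.of("type", "text", "text", "Tell me a joke"));
        System.out.println(flattenContent(arrayForm)); // prints: Tell me a joke
    }
}
```

Where such a conversion would run (for example in a small proxy in front of vLLM) is left open here; it only shows the mapping between the two shapes.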

So how can I use spring-ai-openai-spring-boot-starter with vLLM?

spring-projects / spring-ai

question about vllm #998