vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

how to use vicuna-v1.3 model in vllm #376

Closed. lijianxing123 closed this issue 1 year ago

lijianxing123 commented 1 year ago

Hi, I would like to ask how to set the parameters for the vicuna-1.3 model, or could you give me a running example? My model's text output is wrong. Thanks.

lijianxing123 commented 1 year ago

Here is my code:

from vllm import LLM, SamplingParams
# Sample prompts.
prompts = [
    #"Hello, my name is",
    #"The president of the United States is",
    #"The capital of France is",
    "The future of AI is",
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=16)

# Create an LLM.
llm = LLM(model="lmsys/vicuna-7b-v1.3")
# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

lijianxing123 commented 1 year ago

The result is: Prompt: 'The future of AI is', Generated text: "bright, but it'<s> Here's What You Need to Know About Comey's Surprise Announcement\nFormer FBI Director James Comey is breaking his silence. After months of silence, Comey has agreed to testify before the Senate Intelligence Committee next week. Comey was fired by President Trump in May, and his departure has been a source of controversy ever since.</s>"

gesanqiu commented 1 year ago

The prompt for Vicuna should look like this:

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {Your prompt here} ASSISTANT:

lijianxing123 commented 1 year ago

Can you give me some example code? Thanks.

lijianxing123 commented 1 year ago

@gesanqiu

gesanqiu commented 1 year ago

@lijianxing123 You can try Vicuna-7B-v1.3 through the /v1/chat/completions endpoint in vllm/vllm/entrypoints/openai/api_server.py. It uses the conversation template from FastChat, which handles the prompt template for you. The request looks like the following:

curl --location 'http://127.0.0.1:8012/v1/chat/completions' \
--header 'accept: application/json' \
--header 'Content-Type: application/json' \
--data '{
    "model": "vicuna",
    "stream": true,
    "messages":[
        {
            "role": "user",
            "content": "The future of AI is"
        }
    ],
    "max_tokens": 512,
    "n": 2,
    "use_beam_search": true,
    "temperature": 0
}
'

The response is:

{
    "id": "cmpl-9371b650e43242b4b9557af6feeaef11",
    "object": "chat.completion",
    "created": 1688695383,
    "model": "vicuna_v1.1",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The future of AI is likely to involve continued advancements in the development and deployment of artificial intelligence technologies across a wide range of industries and applications. Some potential areas of focus for future AI research and development include:\n\n1. Improved natural language processing and understanding, which could enable more advanced chatbots, virtual assistants, and language translation tools.\n2. Increased automation and efficiency in industries such as manufacturing, logistics, and healthcare, through the use of robotics and autonomous systems.\n3. Continued advancements in computer vision and image recognition, which could enable more sophisticated security systems, self-driving cars, and personalized medicine.\n4. The development of more advanced machine learning algorithms and models, which could enable more accurate predictions and decision-making in a variety of fields.\n5. The continued integration of AI technologies into everyday life, through the use of smart home devices, wearable technology, and other connected devices.\n\nOverall, the future of AI is likely to be characterized by ongoing innovation and the development of new applications and technologies that can help to improve our lives and solve some of the world's most pressing problems.</s>"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 45,
        "total_tokens": 311,
        "completion_tokens": 266
    }
}
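For anyone who prefers Python over curl, the same request can be sketched with only the standard library. This is a minimal sketch, not the project's client: the helper names `build_chat_payload` and `chat` are made up for illustration, the host and port are copied from the curl example, and `"stream"` is set to false here (an assumption, to keep the response a single JSON object like the one shown above).

```python
import json
import urllib.request
from typing import Any, Dict


def build_chat_payload(content: str, model: str = "vicuna") -> Dict[str, Any]:
    """Build the JSON body from the curl example (hypothetical helper)."""
    return {
        "model": model,
        "stream": False,  # assumption: non-streaming keeps the sketch simple
        "messages": [{"role": "user", "content": content}],
        "max_tokens": 512,
        "n": 2,
        "use_beam_search": True,
        "temperature": 0,
    }


def chat(content: str, base_url: str = "http://127.0.0.1:8012") -> Dict[str, Any]:
    """POST the payload to /v1/chat/completions; needs a running server."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_payload(content)).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, `chat("The future of AI is")["choices"][0]["message"]["content"]` should return the assistant text.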

BTW, for your case, your prompt should be:

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: The future of AI is ASSISTANT:
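Applied to the offline script from the question, that means wrapping each raw prompt in the template before calling `llm.generate`. A minimal sketch, assuming the template quoted in this thread is the right one for v1.3: the helper names `build_vicuna_prompt` and `generate_with_vllm` are made up for illustration, and the `max_tokens` default of 512 is borrowed from the curl request rather than from the original script.

```python
from typing import List

# System prompt copied from the template quoted in this thread.
VICUNA_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_vicuna_prompt(user_message: str) -> str:
    """Wrap a raw user message in the Vicuna template (hypothetical helper)."""
    return f"{VICUNA_SYSTEM} USER: {user_message} ASSISTANT:"


def generate_with_vllm(user_messages: List[str], max_tokens: int = 512) -> List[str]:
    """Offline generation; needs vllm installed and a GPU. Not run on import."""
    from vllm import LLM, SamplingParams  # imported lazily on purpose

    prompts = [build_vicuna_prompt(message) for message in user_messages]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=max_tokens)
    llm = LLM(model="lmsys/vicuna-7b-v1.3")
    outputs = llm.generate(prompts, sampling_params)
    return [output.outputs[0].text for output in outputs]
```

Calling `generate_with_vllm(["The future of AI is"])` then sends the full templated prompt instead of the bare sentence.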

zhuohan123 commented 1 year ago

Huge thanks to @gesanqiu for addressing this issue! Feel free to reopen the issue if you have more questions.

schnurro-bits commented 1 year ago

curl --location 'http://127.0.0.1:8012/v1/chat/completions' \
--header 'accept: application/json' \
--header 'Content-Type: application/json' \
--data '{
    "model": "vicuna",
    "stream": true,
    "messages":[
        {
            "role": "user",
            "content": "The future of AI is"
        }
    ],
    "max_tokens": 512,
    "n": 2,
    "use_beam_search": true,
    "temperature": 0
}
'

@gesanqiu this gives me: "{"object":"error","message":"The model vicuna does not exist.","type":"invalid_request_error","param":null,"code":null}"

The docstring of the /v1/models endpoint reads: "Show available models. Right now we only have one model."
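One way to narrow this down is to ask the server which model name it is serving and put that exact string in the "model" field, since the error may simply be a name mismatch. A minimal sketch, assuming /v1/models returns an OpenAI-style list with an "id" per entry (the response shape is an assumption, the thread only quotes the docstring); `extract_model_ids` and `list_served_models` are made-up helper names, and the host and port are copied from the curl command.

```python
import json
import urllib.request
from typing import List


def extract_model_ids(models_response: dict) -> List[str]:
    """Pull model ids out of an OpenAI-style model list (assumed shape)."""
    return [entry["id"] for entry in models_response.get("data", [])]


def list_served_models(base_url: str = "http://127.0.0.1:8012") -> List[str]:
    """GET /v1/models from a running server and return the served model ids."""
    with urllib.request.urlopen(f"{base_url}/v1/models") as response:
        return extract_model_ids(json.load(response))
```

If `list_served_models()` returns a single name, using that name in place of "vicuna" in the request is worth trying.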