Closed lijianxing123 closed 1 year ago
Here is my code:
from vllm import LLM, SamplingParams
# Sample prompts.
prompts = [
    # "Hello, my name is",
    # "The president of the United States is",
    # "The capital of France is",
    "The future of AI is",
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=16)
# Create an LLM.
llm = LLM(model="lmsys/vicuna-7b-v1.3")
# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
The result is:
Prompt: 'The future of AI is', Generated text: "bright, but it'<s> Here's What You Need to Know About Comey's Surprise Announcement\nFormer FBI Director James Comey is breaking his silence. After months of silence, Comey has agreed to testify before the Senate Intelligence Committee next week. Comey was fired by President Trump in May, and his departure has been a source of controversy ever since.</s>"
The prompt of Vicuna should be like this:
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {Your prompt here} ASSISTANT:
Can you give me an example? Thanks.
@gesanqiu
@lijianxing123 You can try Vicuna-7B-v1.3 via the /v1/chat/completions endpoint in vllm/vllm/entrypoints/openai/api_server.py; it uses the conversation template from FastChat, which handles the prompt formatting for you. The request looks like the following:
curl --location 'http://127.0.0.1:8012/v1/chat/completions' \
  --header 'accept: application/json' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "vicuna",
    "stream": true,
    "messages": [
        {
            "role": "user",
            "content": "The future of AI is"
        }
    ],
    "max_tokens": 512,
    "n": 2,
    "use_beam_search": true,
    "temperature": 0
}'
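For reference, the same request can also be issued from Python. This is a minimal sketch using only the standard library; the URL, port, and payload are copied from the curl command above, and the `chat` helper is just an illustrative name, not part of vLLM:

```python
import json
import urllib.request

# Same payload as the curl example above (non-streaming is simpler to
# parse in a plain script, so "stream" is set to False here).
payload = {
    "model": "vicuna",
    "stream": False,
    "messages": [{"role": "user", "content": "The future of AI is"}],
    "max_tokens": 512,
    "temperature": 0,
}

def chat(url: str = "http://127.0.0.1:8012/v1/chat/completions") -> dict:
    # POST the JSON payload and decode the JSON response.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Calling `chat()` while the server is running returns the same structure as the response shown below.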
The response is:
{
    "id": "cmpl-9371b650e43242b4b9557af6feeaef11",
    "object": "chat.completion",
    "created": 1688695383,
    "model": "vicuna_v1.1",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The future of AI is likely to involve continued advancements in the development and deployment of artificial intelligence technologies across a wide range of industries and applications. Some potential areas of focus for future AI research and development include:\n\n1. Improved natural language processing and understanding, which could enable more advanced chatbots, virtual assistants, and language translation tools.\n2. Increased automation and efficiency in industries such as manufacturing, logistics, and healthcare, through the use of robotics and autonomous systems.\n3. Continued advancements in computer vision and image recognition, which could enable more sophisticated security systems, self-driving cars, and personalized medicine.\n4. The development of more advanced machine learning algorithms and models, which could enable more accurate predictions and decision-making in a variety of fields.\n5. The continued integration of AI technologies into everyday life, through the use of smart home devices, wearable technology, and other connected devices.\n\nOverall, the future of AI is likely to be characterized by ongoing innovation and the development of new applications and technologies that can help to improve our lives and solve some of the world's most pressing problems.</s>"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 45,
        "total_tokens": 311,
        "completion_tokens": 266
    }
}
BTW, for your case, your prompt should be:
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: The future of AI is ASSISTANT:
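If you want to stay with the offline `LLM` API from the first snippet, you can apply the template yourself before calling `generate`. A minimal sketch (the `build_vicuna_prompt` helper is just an illustrative name; the template text is copied from above):

```python
# Vicuna v1.3 conversation template, as quoted above (from FastChat).
VICUNA_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the "
    "user's questions."
)

def build_vicuna_prompt(user_message: str) -> str:
    # Wrap the raw user message in the Vicuna template.
    return f"{VICUNA_SYSTEM} USER: {user_message} ASSISTANT:"

# These templated prompts can then be passed to
# llm.generate(prompts, sampling_params) exactly as in the snippet above.
prompts = [build_vicuna_prompt("The future of AI is")]
print(prompts[0])
```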
Huge thanks to @gesanqiu for addressing this issue! Feel free to reopen the issue if you have more questions.
@gesanqiu running that curl command gives me: {"object":"error","message":"The model vicuna does not exist.","type":"invalid_request_error","param":null,"code":null}
Docstring of /v1/models endpoint reads "Show available models. Right now we only have one model."
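One likely cause of the error above: the "model" field in the request has to match the model name the server was launched with. A sketch, assuming the server is started as follows (model id and port are examples matching this thread):

```shell
# Start the OpenAI-compatible server with the HF model id; use that same
# id in the "model" field of your requests.
python -m vllm.entrypoints.openai.api_server \
    --model lmsys/vicuna-7b-v1.3 --port 8012

# Check what name the server actually serves:
curl http://127.0.0.1:8012/v1/models
```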
Hi, I would like to ask how to set the parameters of the vicuna-1.3 model, or could you give me a running example? My model's text output is wrong. Thanks.