vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Usage]: vllm does not return content for vicuna #3765

Open · yananchen1989 opened this issue 7 months ago

yananchen1989 commented 7 months ago

Your current environment

Hello, I followed your official documentation to use vLLM. The first step is to start the server:

CUDA_VISIBLE_DEVICES=5 python -m vllm.entrypoints.openai.api_server \
    --model "lmsys/vicuna-7b-v1.5-16k" --host '0.0.0.0' --port 4789 --dtype float16 --api-key token-abc123
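(A quick way to confirm the server is reachable, and to see the exact model id it serves, is to list the models endpoint; the snippet below is only a sketch that assumes the same host, port, and API key as the command above.)

# Optional sanity check: list the models the server exposes
# (assumes the server started above is reachable at this host/port with the same API key).
from openai import OpenAI

check_client = OpenAI(
    base_url="http://36.111.143.5:4789/v1",
    api_key="token-abc123",
)

# The OpenAI-compatible server exposes GET /v1/models; the returned id is the
# exact model name to pass to chat.completions.create.
for m in check_client.models.list().data:
    print(m.id)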

Then call it from the OpenAI client:

import openai, joblib
print(openai.__version__)  # 1.14.2

from openai import OpenAI

client = OpenAI(
    base_url="http://36.111.143.5:4789/v1",
    api_key="token-abc123",
)

prompt = joblib.load('./invoke_prompt.pkl')  # loaded but not used in the call below
# print(prompt)

res = client.chat.completions.create(
    model="lmsys/vicuna-7b-v1.5-16k",
    temperature=0,
    max_tokens=1024,
    messages=[
        {"role": "system", "content": "You are a question answering assistant."},
        {"role": "user", "content": "What is the capital city of British Columbia, Canada"},
    ],
)

print(res)

However, I always get empty content in the response: ChatCompletionMessage(content='', role='assistant', function_call=None, tool_calls=None).
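One way to narrow this down would be to call the plain completions endpoint with an explicitly formatted prompt, which bypasses whatever chat template the server applies. The snippet below is a sketch only, reusing the client above; the "USER: ... ASSISTANT:" framing follows the usual Vicuna v1.5 prompt convention:

# Sketch: query /v1/completions directly with a hand-formatted Vicuna-style prompt,
# so no chat template is involved. If this returns text while the chat endpoint
# does not, the chat template is the likely culprit.
res = client.completions.create(
    model="lmsys/vicuna-7b-v1.5-16k",
    prompt=(
        "A chat between a curious user and an artificial intelligence assistant. "
        "USER: What is the capital city of British Columbia, Canada? ASSISTANT:"
    ),
    temperature=0,
    max_tokens=64,
)
print(res.choices[0].text)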

May I know if I am missing something important?

Thanks.

How would you like to use vllm

I want to run inference of a [specific model](put link here). I don't know how to integrate it with vllm.

github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!