vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Usage]: vllm does not return content for vicuna #3765

Open · yananchen1989 opened this issue 7 months ago

yananchen1989 commented 7 months ago

Your current environment

Hello, I followed your official documentation to use vLLM. The first step is to start the server:

CUDA_VISIBLE_DEVICES=5 python -m vllm.entrypoints.openai.api_server \
    --model "lmsys/vicuna-7b-v1.5-16k" --host '0.0.0.0' --port 4789 --dtype float16 --api-key token-abc123
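(A quick way to confirm the server is reachable, and to see the exact model id it serves, is to list the models endpoint; the snippet below is only a sketch that assumes the same host, port, and API key as the command above.)

# Optional sanity check: list the models the server exposes
# (assumes the server started above is reachable at this host/port with the same API key).
from openai import OpenAI

check_client = OpenAI(
    base_url="http://36.111.143.5:4789/v1",
    api_key="token-abc123",
)

# The OpenAI-compatible server exposes GET /v1/models; the returned id is the
# exact model name to pass to chat.completions.create.
for m in check_client.models.list().data:
    print(m.id)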

Then call it from the OpenAI client:

import openai, joblib
print(openai.__version__)  # 1.14.2

from openai import OpenAI

client = OpenAI(
    base_url="http://36.111.143.5:4789/v1",
    api_key="token-abc123",
)

prompt = joblib.load('./invoke_prompt.pkl')  # loaded but not used in the call below
# print(prompt)

res = client.chat.completions.create(
    model="lmsys/vicuna-7b-v1.5-16k",
    temperature=0,
    max_tokens=1024,
    messages=[
        {"role": "system", "content": "You are a question answering assistant."},
        {"role": "user", "content": "What is the capital city of British Columbia, Canada"},
    ],
)

print(res)

However, I always get empty content in the response: ChatCompletionMessage(content='', role='assistant', function_call=None, tool_calls=None).
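One way to narrow this down would be to call the plain completions endpoint with an explicitly formatted prompt, which bypasses whatever chat template the server applies. The snippet below is a sketch only, reusing the client above; the "USER: ... ASSISTANT:" framing follows the usual Vicuna v1.5 prompt convention:

# Sketch: query /v1/completions directly with a hand-formatted Vicuna-style prompt,
# so no chat template is involved. If this returns text while the chat endpoint
# does not, the chat template is the likely culprit.
res = client.completions.create(
    model="lmsys/vicuna-7b-v1.5-16k",
    prompt=(
        "A chat between a curious user and an artificial intelligence assistant. "
        "USER: What is the capital city of British Columbia, Canada? ASSISTANT:"
    ),
    temperature=0,
    max_tokens=64,
)
print(res.choices[0].text)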

May I know if I am missing something important?

Thanks.

How would you like to use vllm

I want to run inference of a [specific model](put link here). I don't know how to integrate it with vllm.

github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!