Closed · nguyenhoanganh2002 closed this issue 5 days ago
I deployed the LLM with docker-compose, passing vLLM these arguments:
--served-model-name ${LLM_MODEL_NAME} --model /root/.cache/huggingface/hub/qwen2-vien-ed --dtype bfloat16 --host 0.0.0.0 --port ${LLM_PORT} --api-key ${LLM_API_KEY} --max-model-len 4096 --gpu-memory-utilization 0.8
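For context, a deployment like the one above can be sketched as a docker-compose service. This is an assumption about the setup, not the actual file from the issue: the image name, volume path, and environment variables (LLM_MODEL_NAME, LLM_PORT, LLM_API_KEY) mirror the flags shown, and your compose file may differ.

```yaml
# Hypothetical docker-compose.yml sketch matching the flags above.
services:
  vllm:
    image: vllm/vllm-openai:latest
    environment:
      - LLM_MODEL_NAME
      - LLM_PORT
      - LLM_API_KEY
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
    ports:
      - "${LLM_PORT}:${LLM_PORT}"
    command: >
      --served-model-name ${LLM_MODEL_NAME}
      --model /root/.cache/huggingface/hub/qwen2-vien-ed
      --dtype bfloat16 --host 0.0.0.0 --port ${LLM_PORT}
      --api-key ${LLM_API_KEY} --max-model-len 4096
      --gpu-memory-utilization 0.8
```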
You need to use extra_body when specifying extra parameters that vLLM adds on top of the standard OpenAI API.
from openai import OpenAI

# Point the client at the vLLM server's OpenAI-compatible endpoint;
# adjust base_url and api_key to match your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="Qwen2-7B-Instruct",
    messages=messages,
    max_tokens=256,
    top_p=0.8,
    # vLLM-specific parameters must go through extra_body.
    extra_body={'use_beam_search': True},
)
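To see why extra_body is needed: the openai Python client rejects unknown top-level keyword arguments, but merges anything placed in extra_body into the JSON body it sends, so vLLM-only fields such as use_beam_search reach the server. The sketch below mimics that merge; build_payload is a hypothetical helper, not the library's actual code, and best_of as a beam width is an assumption about older vLLM versions.

```python
# Sketch of how extra_body fields end up in the request payload
# (an assumption about the client's behavior, for illustration only).
def build_payload(model, messages, max_tokens, top_p, extra_body=None):
    payload = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "top_p": top_p,
    }
    # vLLM-specific options ride along at the top level of the body.
    payload.update(extra_body or {})
    return payload

payload = build_payload(
    model="Qwen2-7B-Instruct",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=256,
    top_p=0.8,
    # best_of > 1 is commonly paired with use_beam_search (assumption).
    extra_body={"use_beam_search": True, "best_of": 4},
)
print(payload["use_beam_search"])  # → True
```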
Thanks a lot.
Your current environment
How would you like to use vllm
How can I use beam search when requesting the OpenAI Completions API?
I tried:
Got error: