[Open] yitianlian opened this issue 3 months ago
Which version of vLLM were you running this with?
Can you share a sample output if you add max_tokens in here?
```python
response = client.chat.completions.create(
    model=model_name,
    messages=input_text,
    temperature=0,
    top_p=1,
)
```
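For reference, a minimal sketch of that suggestion, assuming a vLLM OpenAI-compatible server on the default local port; the base URL, model name, prompt, and `max_tokens=512` are placeholders, not the author's actual settings:

```python
from openai import OpenAI

# Assumed local vLLM OpenAI-compatible endpoint; adjust to the real deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

messages = [{"role": "user", "content": "Hello, who are you?"}]

response = client.chat.completions.create(
    model="your-model-name",   # placeholder model id
    messages=messages,
    temperature=0,
    top_p=1,
    max_tokens=512,            # hard cap so generation cannot run forever
)
print(response.choices[0].message.content)
print(response.choices[0].finish_reason)
```

With a cap in place, `finish_reason` shows whether the request hit the limit ("length") or an actual stop token ("stop"), which helps separate a missing-stop-token problem from everything else.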
I wonder if it has something to do with the stop tokens.
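One way to test the stop-token theory is to pass stop conditions explicitly in the request. A hedged sketch, reusing the client from the example above; the `stop` strings and `stop_token_ids` values below are placeholders that would have to match the model's actual chat template (vLLM accepts `stop_token_ids` as a non-OpenAI extension via `extra_body`):

```python
response = client.chat.completions.create(
    model="your-model-name",   # placeholder model id
    messages=[{"role": "user", "content": "Hello, who are you?"}],
    temperature=0,
    top_p=1,
    max_tokens=512,
    # Explicit stop strings; replace with whatever the model's template emits.
    stop=["</s>", "<|im_end|>"],
    # vLLM-specific extension: stop on raw token ids (placeholder id shown).
    extra_body={"stop_token_ids": [2]},
)
print(response.choices[0].finish_reason)  # "stop" if a stop condition matched
```

If generation stops cleanly with explicit stop tokens but not without them, the checkpoint's chat template or generation config is probably not declaring the EOS token that vLLM expects.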
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
Your current environment
🐛 Describe the bug
I found that launching the model with vLLM leads to output that never stops (unlimited output), but when I launch the same checkpoint with LMDeploy, the output is normal. I don't know why, and I want to use vLLM for inference. My launch script:
My Python code:
The script for launching with LMDeploy:
The instruct JSON file: