PanJason opened this issue 2 weeks ago
Did you follow the chat template of Yi-1.5-6B-Chat? I think it uses a different one from Llama's. From its `tokenizer_config.json`:
"bos_token": "<|startoftext|>",
"eos_token": "<|im_end|>"
"chat_template": "
{% if messages[0]['role'] == 'system' %}
{% set system_message = messages[0]['content'] %}
{% endif %}
{% if system_message is defined %}
{{ system_message }}
{% endif %}
{% for message in messages %}
{% set content = message['content'] %}
{% if message['role'] == 'user' %}
{{ '<|im_start|>user\\n' + content + '<|im_end|>\\n<|im_start|>assistant\\n' }}
{% elif message['role'] == 'assistant' %}
{{ content + '<|im_end|>' + '\\n' }}
{% endif %}
{% endfor %}"
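The rendering can be checked directly with `transformers` (a minimal sketch; the example message is made up):

```python
from transformers import AutoTokenizer

# The tokenizer for Yi-1.5-6B-Chat carries the chat template shown above
# in its tokenizer_config.json.
tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-1.5-6B-Chat")

messages = [{"role": "user", "content": "What is the capital of France?"}]

# Render without tokenizing to inspect the exact prompt string. Note that
# this template already appends '<|im_start|>assistant\n' after each user
# turn, so the generation prefix is baked in.
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
print(repr(prompt))
```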
The description of the bug:
I am using AWS P3 instances with 4 V100 GPUs; the system configuration is in the section below. I ran the example from the README. In one tmux window, I execute:
In another tmux window, I execute:
with the port changed to the correct one. However, I got the following output:
I tried the same Llama model with vLLM and it gave me reasonable answers.
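(For reference, this vLLM comparison can be reproduced offline with its Python API; this is a sketch, and the checkpoint name is an assumption since the exact Llama model path isn't shown above.)

```python
from vllm import LLM, SamplingParams

# Hypothetical checkpoint; the issue does not name the exact Llama model used.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf", dtype="float16")

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["What is the capital of France?"], params)
print(outputs[0].outputs[0].text)
```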
I also tried a different model, 01-ai/Yi-1.5-6B-Chat, from Hugging Face, but I got random results as well. I am uncertain what is going wrong. Currently I am trying to change the tokenizer and also to use an A100 to see whether the problem persists. Any suggestions on what could cause the problem are very welcome. Thanks!
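As a sanity check outside the serving stack, a minimal `transformers`-only generation run can show whether the model itself decodes sensibly. This is a sketch, and the dtype choice reflects an assumption not confirmed in this issue: V100 has no native bfloat16, so a bf16 default is one plausible cause of garbage output, and forcing float16 rules it out.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "01-ai/Yi-1.5-6B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Force float16: V100 (compute capability 7.0) lacks native bfloat16 support.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "What is the capital of France?"}]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=64)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```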
System configuration
I collected this using the `collect_env.py` script from vLLM: