Closed by grencez 8 months ago
In theory, this should be a good compromise:
USER: Hello! ASSISTANT: How can I help you?</s>
USER: blah blah blah ASSISTANT: blah blah</s>
USER: ...
But it was kind of finicky the last time I tried. I'll test again though. If the Capybara models have problems, I'll continue tweaking the format until something works. I've had decent luck with `Q: ... A: ...\n`, so that's an option.
I ended up using more EOS tokens and no newlines. I put a space before `ASSISTANT` for consistent tokenization, and to hint that it's a reply.
USER: Hello!</s> ASSISTANT: How can I help you?</s>USER: blah blah blah</s> ASSISTANT: blah blah</s>USER: ...
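
A minimal Python sketch of how that final layout could be assembled, for illustration only. The `build_prompt` helper and the literal `"</s>"` string are assumptions made here; they are not part of rendezllama, and a real setup would emit the actual EOS token rather than its text. Ending an open turn with ` ASSISTANT:` is also an assumption.

```python
EOS = "</s>"  # stand-in text for the EOS token, as in the example above


def build_prompt(turns, eos=EOS):
    """Render (user, assistant) pairs with EOS after every message and no newlines.

    Pass None as the assistant message to leave the last turn open.
    """
    parts = []
    for user_msg, assistant_msg in turns:
        parts.append(f"USER: {user_msg}{eos}")
        if assistant_msg is None:
            # Assumption: an open turn ends right after " ASSISTANT:".
            parts.append(" ASSISTANT:")
        else:
            # The leading space keeps " ASSISTANT" in the same context every time.
            parts.append(f" ASSISTANT: {assistant_msg}{eos}")
    return "".join(parts)


if __name__ == "__main__":
    print(build_prompt([("Hello!", "How can I help you?"), ("blah blah blah", None)]))
```

Because every message ends in EOS and `ASSISTANT` is always preceded by the same EOS-plus-space sequence, the text around the role name does not vary from turn to turn.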
Background
The assistant_vicuna example currently shows a format used by Vicuna 1.0 that FastChat now calls "xgen". Rendezllama chat does not allow multiline output here, but I suspect that Vicuna and xgen do. The format looks like:
The Vicuna 1.1 format is often cited by models, but the real Vicuna 1.1 format is a bit goofy. Omitting the system prompt, and using `</s>` to signify the EOS token, it looks like:

The Manticore format is a bit more intuitive because it separates USER and ASSISTANT with a newline instead of a space.
Problem
Tokenization. The Vicuna 1.1 and Manticore formats cause `ASSISTANT:` to be tokenized in different ways due to the difference in leading space vs leading newline. With the space, you can get `A`, `SS`, `IST`, `ANT`, `:` (LLaMA) or `ASS`, `IST`, `ANT`, `:` (Mistral). With a newline, both tokenizers give `\n`, `ASS`, `IST`, `ANT`, `:`.

Maybe Vicuna is fine with `</s>\n` at the end of the assistant's message.
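
One way to check these splits for a given model is a small harness like the Python sketch below. It is illustrative only: `compare_splits` and `SEPARATORS` are hypothetical names, and the function accepts any callable that maps a string to a list of token strings, so it makes no claim about how a particular tokenizer behaves. The demo uses a character-level stand-in, not a real tokenizer.

```python
from typing import Callable, Dict, List

# Separators between the user's message and "ASSISTANT:" in the two formats discussed.
SEPARATORS = {"space (Vicuna 1.1)": " ", "newline (Manticore)": "\n"}


def compare_splits(
    tokenize: Callable[[str], List[str]],
    user_text: str = "USER: Hello!",
) -> Dict[str, List[str]]:
    """Tokenize the same turn with each separator so the pieces can be compared by eye."""
    return {
        label: tokenize(user_text + sep + "ASSISTANT:")
        for label, sep in SEPARATORS.items()
    }


if __name__ == "__main__":
    # Character-level stand-in; swap in a real tokenizer's string-to-pieces function.
    for label, pieces in compare_splits(list).items():
        print(label, pieces)
```

With the Hugging Face `transformers` library, for example, a loaded tokenizer's `tokenize` method could be passed as the callable. That pairing is an assumption of this sketch, since rendezllama itself does not go through that library.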