rendezqueue / rendezllama

CLI for llama.cpp with various commands to guide, edit, and regenerate tokens on the fly.
ISC License

doc(example/prompt/assistant_vicuna): Use Vicuna 1.1 format (USER/ASSISTANT) due to popularity #35

Closed: grencez closed this issue 8 months ago

grencez commented 11 months ago

Background

The assistant_vicuna example currently shows the format used by Vicuna 1.0, which FastChat now calls "xgen". Rendezllama chat does not allow multiline messages with this format, though I suspect that Vicuna and xgen do. The format looks like:

### Human: Hello!
### Assistant: How can I help you?
### Human: blah blah blah
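For comparison with the other formats in this thread, here is a minimal Python sketch that renders a conversation in this "### Human:" / "### Assistant:" layout. The function name `format_xgen` and the (role, text) input are hypothetical, chosen only for illustration; this is not code from rendezllama or FastChat.

```python
def format_xgen(turns):
    """Render (role, text) turns one per line as '### Human:' / '### Assistant:' (hypothetical helper)."""
    labels = {"user": "Human", "assistant": "Assistant"}
    return "\n".join(f"### {labels[role]}: {text}" for role, text in turns)


print(format_xgen([
    ("user", "Hello!"),
    ("assistant", "How can I help you?"),
    ("user", "blah blah blah"),
]))
```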

Many model cards cite the Vicuna 1.1 format, but the real Vicuna 1.1 format is a bit goofy. Omitting the system prompt, and using </s> to denote the EOS token, it looks like:

USER: Hello! ASSISTANT: How can I help you?</s>USER: blah blah blah
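A minimal Python sketch of how that string is assembled, assuming a list of (user message, assistant reply) pairs where the last reply may be missing. `format_vicuna_1_1` is a hypothetical name, and `</s>` is written as literal text only to match the notation above; a real prompt would typically insert the EOS token as a special token rather than as this text.

```python
EOS = "</s>"  # literal stand-in for the EOS token, matching the notation above


def format_vicuna_1_1(exchanges):
    """Render (user_msg, assistant_msg) pairs in the Vicuna 1.1 layout, without a system prompt.

    assistant_msg may be None for a final user turn that has no reply yet.
    (Hypothetical helper for illustration.)
    """
    parts = []
    for user_msg, assistant_msg in exchanges:
        parts.append(f"USER: {user_msg}")
        if assistant_msg is not None:
            # A space (not a newline) separates the user message from ASSISTANT:,
            # and the reply ends with EOS directly followed by the next USER:.
            parts.append(f" ASSISTANT: {assistant_msg}{EOS}")
    return "".join(parts)


print(format_vicuna_1_1([("Hello!", "How can I help you?"), ("blah blah blah", None)]))
```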

The Manticore format is a bit more intuitive because it separates USER and ASSISTANT with a newline instead of a space.

USER: Hello!
ASSISTANT: How can I help you?</s>USER: blah blah blah
ASSISTANT: blah blah blah</s>USER: blah blah blah
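Since the Vicuna 1.1 and Manticore layouts differ only in the character before ASSISTANT:, a single sketch with a separator parameter can produce both. The helper name `format_user_assistant` and its `sep` parameter are hypothetical, and `</s>` again stands in for the EOS token as literal text.

```python
EOS = "</s>"  # literal stand-in for the EOS token


def format_user_assistant(exchanges, sep):
    """Render (user_msg, assistant_msg) pairs; sep goes between a user message and ASSISTANT:.

    sep=" " gives the Vicuna 1.1 layout, sep="\n" gives the Manticore layout.
    assistant_msg may be None for a final user turn. (Hypothetical helper.)
    """
    parts = []
    for user_msg, assistant_msg in exchanges:
        parts.append(f"USER: {user_msg}")
        if assistant_msg is not None:
            parts.append(f"{sep}ASSISTANT: {assistant_msg}{EOS}")
    return "".join(parts)


chat = [
    ("Hello!", "How can I help you?"),
    ("blah blah blah", "blah blah blah"),
    ("blah blah blah", None),
]
manticore = format_user_assistant(chat, sep="\n")
vicuna_1_1 = format_user_assistant(chat, sep=" ")
print(manticore)
```

The only difference between the two outputs is the space or newline in front of ASSISTANT:, which is the character that changes how ASSISTANT: gets tokenized.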

Problem

Tokenization. The Vicuna 1.1 and Manticore formats tokenize ASSISTANT: differently because it is preceded by a space in one and by a newline in the other. After a space, you can get A, SS, IST, ANT, : (LLaMA) or ASS, IST, ANT, : (Mistral). After a newline, both tokenizers give \n, ASS, IST, ANT, :.

Maybe Vicuna is fine with </s>\n at the end of the assistant's message.

grencez commented 8 months ago

In theory, this should be a good compromise:

USER: Hello! ASSISTANT: How can I help you?</s>
USER: blah blah blah ASSISTANT: blah blah</s>
USER: ...
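A minimal sketch of this compromise layout, assuming complete (user message, assistant reply) pairs: each exchange sits on its own line, ASSISTANT: follows a space as in Vicuna 1.1, and the newline only appears after the EOS token. The helper name `format_compromise` is hypothetical, and `</s>` is literal text standing in for the EOS token.

```python
EOS = "</s>"  # literal stand-in for the EOS token


def format_compromise(exchanges):
    """One exchange per line: 'USER: ... ASSISTANT: ...</s>' then a newline (hypothetical helper)."""
    lines = []
    for user_msg, assistant_msg in exchanges:
        # Space before ASSISTANT: as in Vicuna 1.1; the newline comes only after EOS.
        lines.append(f"USER: {user_msg} ASSISTANT: {assistant_msg}{EOS}\n")
    return "".join(lines)


print(format_compromise([("Hello!", "How can I help you?"), ("blah blah blah", "blah blah")]), end="")
```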

But it was kind of finicky the last time I tried. I'll test it again, though. If the Capybara models have problems, I'll keep tweaking the format until something works. I've had decent luck with Q: ... A: ...\n, so that's an option.

grencez commented 8 months ago

I ended up using more EOS tokens and no newlines. I put a space before ASSISTANT for consistent tokenization, and to kind of hint at the fact that it's a reply.

USER: Hello!</s> ASSISTANT: How can I help you?</s>USER: blah blah blah</s> ASSISTANT: blah blah</s>USER: ...
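A minimal Python sketch of the final layout: every message ends with EOS, there are no newlines, and ASSISTANT: is always preceded by a space. The helper name is hypothetical. Ending a pending turn with an open " ASSISTANT:" for the model to complete is an assumption about how one would prompt for a reply, not something stated in this thread, and `</s>` is literal text standing in for the EOS token.

```python
EOS = "</s>"  # literal stand-in for the EOS token


def format_eos_everywhere(exchanges):
    """Render (user_msg, assistant_msg) pairs with EOS after every message and no newlines.

    If assistant_msg is None, end with an open ' ASSISTANT:' (assumed generation prompt).
    (Hypothetical helper for illustration.)
    """
    parts = []
    for user_msg, assistant_msg in exchanges:
        parts.append(f"USER: {user_msg}{EOS}")
        if assistant_msg is None:
            # Assumption: leave the reply open so the model writes it.
            parts.append(" ASSISTANT:")
        else:
            # Leading space keeps ASSISTANT: tokenized the same way every time.
            parts.append(f" ASSISTANT: {assistant_msg}{EOS}")
    return "".join(parts)


print(format_eos_everywhere([("Hello!", "How can I help you?"), ("blah blah blah", "blah blah")]))
```

Because ASSISTANT: always follows EOS plus a single space, it should see the same leading context, and therefore the same tokenization, in every turn.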