lmstudio-ai / .github


Suggested improvement: Llama 3 template (and other chat templates) differs slightly due to restricted options #43

Closed gitmylo closed 1 month ago

gitmylo commented 2 months ago

LM Studio's Llama 3 template:

<|start_header_id|>system<|end_header_id|>

{System}<|eot_id|>
<|start_header_id|>user<|end_header_id|>

{User}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{Assistant}

The official Llama 3 template:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Most notable differences:

  1. There is a newline after the system prompt's <|eot_id|> token which cannot be removed; this changes the structure of the template slightly.
  2. Since every message's role is indicated by its own header, it's not a traditional prompt template, but instead a per-role chat template.

In reality, the template should be applied per message, instead of to the whole chat. A single message would look like this:

<|start_header_id|>{{ role }}<|end_header_id|>

{{ content }}<|eot_id|>

Where {{ role }} is the message's role (system, user, or assistant) and {{ content }} is the message's content. The API already uses these roles, since it mimics OpenAI's.

Once all the messages are formatted with this template, add one more:

<|start_header_id|>assistant<|end_header_id|>

{{ assistant }}

Where {{ assistant }} is usually empty, as the LLM is meant to generate this part.
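
To illustrate, here is a minimal sketch of the per-message formatting described above (in Python; the names format_message and format_chat are hypothetical, not anything LM Studio actually exposes):

def format_message(role: str, content: str) -> str:
    # One Llama 3 message: role header, blank line, content, end-of-turn token.
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"

def format_chat(messages: list[dict[str, str]]) -> str:
    # Apply the per-message template to every message, then append an empty
    # assistant header so the model generates the next reply.
    prompt = "".join(format_message(m["role"], m["content"]) for m in messages)
    return prompt + "<|start_header_id|>assistant<|end_header_id|>\n\n"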

How can this be implemented?

I believe this could be implemented with a toggle within templates, which switches between an instruct style (Alpaca, CodeLlama Instruct/Llama 2 Chat, Phi 2, etc.) and a chat style (ChatML, Llama 3 chat, Google Gemma Instruct).

When using the instruct style, it can remain unchanged. When using the chat style, the prompt template could, for example, contain settings like:

Then, with these settings in place, assume this short chat (in OpenAI API format):

[
  {"role": "system", "content": "You are a helpful AI assistant."},
  {"role": "user", "content": "Hi!"},
  {"role": "assistant", "content": "Hello there! It's great to meet you!"},
  {"role": "user", "content": "What are some recipes for making chocolate chip cookies?"}
]

should become (assuming <|begin_of_text|> is included, as it is for all system prompts already):

<|start_header_id|>system<|end_header_id|>

You are a helpful AI assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

Hi!<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Hello there! It's great to meet you!<|eot_id|><|start_header_id|>user<|end_header_id|>

What are some recipes for making chocolate chip cookies?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

This creates a correctly formatted prompt for Llama 3, with multiple messages, each with its own role.
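
As a usage example (again hypothetical, reusing the format_chat sketch from earlier in this comment), feeding the chat into that function should reproduce the prompt shown above:

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "Hello there! It's great to meet you!"},
    {"role": "user", "content": "What are some recipes for making chocolate chip cookies?"},
]

# format_chat is the sketch from earlier in this comment; <|begin_of_text|> is
# assumed to be prepended separately, as noted above.
print(format_chat(messages))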

Feel free to ask questions if anything isn't clear enough.

fouadmok commented 2 months ago

hi

joetann commented 2 months ago

Hi @gitmylo, thanks so much for writing this. It's driving me a little bit mad!

I appreciate this may be a silly question but I'm a tad confused...

When sending chat completion requests, do I need to pass these special tokens into the API when I send the request, or do I need to just use the configs in LM Studio?

I.e. option 1:

    ...
      "messages": [ 
      { "role": "system", "content": "Always answer in rhymes." },
      { "role": "user", "content": "Introduce yourself." }
    ], 
    ...

Or option 2:

    ...
      "messages": [ 
      { "role": "system", "content": "<|start_header_id|>system<|end_header_id|>\n\nAlways answer in rhymes. <|eot_id|>" },
      { "role": "user", "content": "<|start_header_id|>user<|end_header_id|>\n\nIntroduce yourself. <|eot_id|><|start_header_id|>assistant<|end_header_id|>" }
    ], 
    ...

Or something else?!

If option 2, is it correct to place <|start_header_id|>assistant<|end_header_id|> at the end of the user message?

I'm not getting the sensible output I'd expect (and hear others get), so I'm definitely doing something wrong somewhere.

Thanks in advance.

hahmad2008 commented 1 month ago

@gitmylo What is the official system message for the Llama 3 model (NousResearch/Meta-Llama-3-8B-Instruct)?

yagil commented 1 month ago

@gitmylo Sorry for missing this issue, and thanks for sharing the details! We'll take a look and circle back.

ryan-the-crayon commented 1 month ago

@gitmylo Thanks for the report. I really appreciate the effort you put into writing up all the details. I believe the issues will be fixed in this PR:

https://github.com/lmstudio-ai/configs/pull/47/files

Formatting after the fix:

[Screenshot of the corrected Llama 3 prompt formatting]

I have compared this with the documentation you linked very carefully. They should be exactly the same now, including all the newlines.

(The prompt formatting text box in LM Studio is a bit misleading due to auto-wrapping. To get the exact output, use lms log stream.)

yagil commented 1 month ago

The fix will be included in LM Studio 0.2.24.