gitmylo commented 2 months ago

<|start_header_id|>system<|end_header_id|>

{System}<|eot_id|>
<|start_header_id|>user<|end_header_id|>

{User}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{Assistant}

The official Llama 3 template:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Most notable differences:

There is a newline after the system prompt's <|eot_id|> token which cannot be removed. This changes the structure of the template slightly.
Since every message has its role indicated by its own header, it's not a traditional prompt template, but instead a per-role chat template.

In reality, the template should be applied per message instead of to the whole chat. A single message would look like this:

<|start_header_id|>{{ role }}<|end_header_id|>

{{ content }}<|eot_id|>

Where {{ role }} is the message's role (system, user, or assistant; the API already uses those roles since it mimics OpenAI) and {{ content }} is the message's text.

Once all the messages are formatted in this template, add one more:

<|start_header_id|>assistant<|end_header_id|>

{{ assistant }}

Where {{ assistant }} is usually empty, as the LLM is meant to generate this part.
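
As a rough sketch of the per-message approach in Python (the function names format_message and format_chat are placeholders I made up for illustration, not anything from LM Studio):

```python
def format_message(role: str, content: str) -> str:
    """Wrap one message in its Llama 3 role header and end-of-turn token."""
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"


def format_chat(messages: list[dict[str, str]]) -> str:
    """Format every message, then open an empty assistant turn for generation."""
    body = "".join(format_message(m["role"], m["content"]) for m in messages)
    # The trailing header has no content and no <|eot_id|>: the model fills it in.
    return body + "<|start_header_id|>assistant<|end_header_id|>\n\n"


prompt = format_chat([
    {"role": "system", "content": "Always answer in rhymes."},
    {"role": "user", "content": "Introduce yourself."},
])
print(prompt)
```

This leaves out <|begin_of_text|>, on the assumption that it is added separately.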

How can this be implemented?

I believe this could be implemented with a toggle within templates that switches between instruct style (Alpaca, CodeLlama Instruct/Llama 2 Chat, Phi 2, etc.) and chat style (ChatML, Llama 3 chat, Google Gemma Instruct).

When using the instruct style, it can remain unchanged. When using the chat style, the prompt template could, for example, contain settings like:

Prefix - The prefix for the template, in case a model requires this.
- For Llama 3, this would be empty
Message pre role - The part before the message's role's name.
- For Llama 3, this would be <|start_header_id|>
Role name map - If a model doesn't use the default system, user, assistant, the appropriate alternatives can optionally be provided here
- For Llama 3, this would be empty, as it already uses the roles system, user, assistant
Message post role - The part after the message's role's name until the message's content.
- For Llama 3, this would be <|end_header_id|>\n\n
Message suffix - The part after the message's content. Added after each message, generated or predefined.
- For Llama 3, this would be <|eot_id|> (Note: no newlines)
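
To make the settings above more concrete, here is a minimal sketch of how they might be combined. The ChatTemplate class, its field names, and the render method are hypothetical names for illustration only, not an existing LM Studio API:

```python
from dataclasses import dataclass, field


@dataclass
class ChatTemplate:
    """Hypothetical container for the chat-style settings listed above."""
    prefix: str = ""                                         # Prefix
    pre_role: str = ""                                       # Message pre role
    role_map: dict[str, str] = field(default_factory=dict)   # Role name map
    post_role: str = ""                                      # Message post role
    suffix: str = ""                                         # Message suffix

    def _header(self, role: str) -> str:
        # Roles missing from the map keep their default name.
        return self.pre_role + self.role_map.get(role, role) + self.post_role

    def render(self, messages: list[dict[str, str]]) -> str:
        parts = [self.prefix]
        for m in messages:
            parts.append(self._header(m["role"]) + m["content"] + self.suffix)
        # Open an assistant turn with no content and no suffix for the model to complete.
        parts.append(self._header("assistant"))
        return "".join(parts)


# Llama 3 values as given in the list above.
LLAMA3 = ChatTemplate(
    pre_role="<|start_header_id|>",
    post_role="<|end_header_id|>\n\n",
    suffix="<|eot_id|>",
)
```

Calling LLAMA3.render(messages) on a list of OpenAI-style message dicts returns the prompt as a single string. A model with different role names would only need a non-empty role_map.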

With these settings, this short chat (in OpenAI API format):

[
  {"role": "system", "content": "You are a helpful AI assistant."},
  {"role": "user", "content": "Hi!"},
  {"role": "assistant", "content": "Hello there! It's great to meet you!"},
  {"role": "user", "content": "What are some recipes for making chocolate chip cookies?"}
]

should become (assuming <|begin_of_text|> is included, as it is for all system prompts already):

<|start_header_id|>system<|end_header_id|>

You are a helpful AI assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

Hi!<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Hello there! It's great to meet you!<|eot_id|><|start_header_id|>user<|end_header_id|>

What are some recipes for making chocolate chip cookies?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

This creates a correctly formatted prompt for Llama 3, with multiple messages and roles per message.

Feel free to ask questions if anything isn't clear enough.

fouadmok commented 2 months ago

hi

joetann commented 2 months ago

Hi @gitmylo, thanks so much for writing this. It's driving me a little bit mad!

I appreciate this may be a silly question but I'm a tad confused...

When sending chat completion requests, do I need to pass these special tokens into the API when I send the request, or do I need to just use the configs in LM Studio?

I.e. option 1:

    ...
      "messages": [ 
      { "role": "system", "content": "Always answer in rhymes." },
      { "role": "user", "content": "Introduce yourself." }
    ], 
    ...

Or option 2:

    ...
      "messages": [ 
      { "role": "system", "content": "<|start_header_id|>system<|end_header_id|>\n\nAlways answer in rhymes. <|eot_id|>" },
      { "role": "user", "content": "<|start_header_id|>user<|end_header_id|>\n\nIntroduce yourself. <|eot_id|><|start_header_id|>assistant<|end_header_id|>" }
    ], 
    ...

Or something else?!

If option 2, is it correct to place <|start_header_id|>assistant<|end_header_id|> at the end of the user message?

I'm not getting the sensible output I'd expect (and hear about), so I'm definitely doing something wrong somewhere.

Thanks in advance.

hahmad2008 commented 1 month ago

@gitmylo What is the official system message for the Llama 3 model (NousResearch/Meta-Llama-3-8B-Instruct)?

yagil commented 1 month ago

@gitmylo sorry for missing this issue, thanks for sharing the details! We'll look and circle back

ryan-the-crayon commented 1 month ago

@gitmylo Thanks for the report. Really appreciate the effort you put into writing up all the details. I believe the issues will be fixed in this PR:

https://github.com/lmstudio-ai/configs/pull/47/files

Formatting after the fix:

I have compared this with the documentation you linked very carefully. They should be exactly the same now, including all the newlines.

(The prompt formatting text box in LM Studio is a bit misleading due to auto-wrapping. To get the exact output, use lms log stream.)

yagil commented 1 month ago

The fix will be included in LM Studio 0.2.24

lmstudio-ai / .github

Suggested improvement: Llama 3 template (and other chat templates) differs slightly due to restricted options #43
