Closed: gitmylo closed this issue 1 month ago.
hi
Hi @gitmylo, thanks so much for writing this. It's driving me a little bit mad!
I appreciate this may be a silly question, but I'm a tad confused...
When sending chat completion requests, do I need to include these special tokens in the messages I pass to the API, or do I just need to use the configs in LM Studio?
I.e. option 1:
...
"messages": [
{ "role": "system", "content": "Always answer in rhymes." },
{ "role": "user", "content": "Introduce yourself." }
],
...
Or option 2:
...
"messages": [
{ "role": "system", "content": "<|start_header_id|>system<|end_header_id|>\n\nAlways answer in rhymes. <|eot_id|>" },
{ "role": "user", "content": "<|start_header_id|>user<|end_header_id|>\n\nIntroduce yourself. <|eot_id|><|start_header_id|>assistant<|end_header_id|>" }
],
...
Or something else?!
If option 2, is it correct to place <|start_header_id|>assistant<|end_header_id|> at the end of the user message?
I'm not getting the sensible output I'd expect (and hear about), so I'm definitely doing something wrong somewhere.
Thanks in advance.
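For reference, a minimal sketch of option 1 as a raw request, using only the Python standard library. It assumes LM Studio's local server is running on its default address (http://localhost:1234) and that the server applies the model's prompt template itself, so the messages carry only plain roles and text. The model name is a placeholder, and the request is built but not sent.

```python
import json
import urllib.request

# Assumption: LM Studio's OpenAI-compatible server on its default port.
URL = "http://localhost:1234/v1/chat/completions"

payload = {
    "model": "local-model",  # placeholder: use the identifier of the loaded model
    "messages": [
        # Plain roles and text only: no <|start_header_id|> or <|eot_id|> tokens.
        {"role": "system", "content": "Always answer in rhymes."},
        {"role": "user", "content": "Introduce yourself."},
    ],
}

request = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# To actually send it (requires the server to be running):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```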
@gitmylo What is the official system message for the Llama 3 model (NousResearch/Meta-Llama-3-8B-Instruct)?
@gitmylo Sorry for missing this issue, and thanks for sharing the details! We'll take a look and circle back.
@gitmylo Thanks for the report. I really appreciate the effort you put into writing up all the details. I believe the issues will be fixed in this PR:
https://github.com/lmstudio-ai/configs/pull/47/files
Formatting after the fix:
I have compared this very carefully with the documentation you linked. They should now be exactly the same, including all the newlines.
(The prompt formatting text box in LM Studio is a bit misleading due to auto-wrapping. To get the exact output, use lms log stream.)
The fix will be included in LM Studio 0.2.24.
LM Studio's Llama 3 template:
The official Llama 3 template:
Most notable differences:
<|eot_id|> token which cannot be removed; this changes the structure of the template slightly.
In reality, the template should be applied per message, instead of to the whole chat. A single message looks like this:
Where {{ role }} is the message's role (system, user, or assistant; the API already uses those roles since it mimics OpenAI).
Once all the messages are formatted with this template, add one more:
Where {{ assistant }} is usually empty, as the LLM is meant to generate this part.
How can this be implemented?
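As a concrete starting point, the per-message format described above can be sketched in Python with the Llama 3 special tokens hard-coded. The function names are hypothetical; the token layout follows Meta's published Llama 3 prompt format, with <|begin_of_text|> emitted once at the start and an empty assistant turn at the end.

```python
# Llama 3 special tokens, as used in the official prompt format.
BEGIN = "<|begin_of_text|>"
HEADER_START = "<|start_header_id|>"
HEADER_END = "<|end_header_id|>\n\n"
EOT = "<|eot_id|>"


def format_message(role: str, content: str) -> str:
    """Wrap one message: role header, blank line, content, end-of-turn token."""
    return f"{HEADER_START}{role}{HEADER_END}{content}{EOT}"


def format_chat(messages: list[dict[str, str]]) -> str:
    """Apply the template per message, then open an empty assistant turn."""
    prompt = BEGIN + "".join(format_message(m["role"], m["content"]) for m in messages)
    # Trailing assistant header with no content and no <|eot_id|>:
    # the model is meant to generate this part.
    return prompt + f"{HEADER_START}assistant{HEADER_END}"


chat = [
    {"role": "system", "content": "Always answer in rhymes."},
    {"role": "user", "content": "Introduce yourself."},
]
print(format_chat(chat))
```

Passing the option 2 messages from the question above through format_chat would nest a second set of headers inside each message, which may explain the unexpected output described there.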
I believe this could be implemented with a toggle within templates that switches between instruct style (alpaca, codellama instruct/Llama 2 chat, Phi 2, etc.) and chat style (ChatML, Llama 3 chat, Google Gemma Instruct).
When using the instruct style, the template can remain unchanged. When using the chat style, the prompt template could, for example, contain settings like:
<|start_header_id|>
system, user, assistant: the appropriate alternatives can optionally be provided here (system, user, assistant)
<|end_header_id|>\n\n
<|eot_id|> (note: no newlines)
Then, with these settings filled in, assume this short chat (in OpenAI API format):
It should become the following (assuming <|begin_of_text|> is included, as it already is for all system prompts):
This creates a correctly formatted prompt for Llama 3, with multiple messages, each carrying its own role.
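A minimal Python sketch of the proposed chat-style settings, applied per message. The class and field names (ChatStyleTemplate, role_prefix, role_suffix, message_suffix, begin, role_names) are hypothetical and not part of LM Studio; the instruct-style path is left out since it would stay unchanged. The ChatML values are included only to show that the same fields can describe another chat style from the list above, and the chat is a made-up example.

```python
from dataclasses import dataclass, field


@dataclass
class ChatStyleTemplate:
    """Hypothetical settings for a "chat style" template, applied per message."""
    role_prefix: str     # inserted before the role name
    role_suffix: str     # inserted between the role name and the content
    message_suffix: str  # inserted after each message (no implicit newline)
    begin: str = ""      # emitted once at the very start of the prompt
    # Optional alternative names for the OpenAI-style roles.
    role_names: dict[str, str] = field(
        default_factory=lambda: {"system": "system", "user": "user", "assistant": "assistant"}
    )

    def header(self, role: str) -> str:
        return f"{self.role_prefix}{self.role_names[role]}{self.role_suffix}"

    def render(self, messages: list[dict[str, str]]) -> str:
        parts = [self.begin]
        for m in messages:
            parts.append(self.header(m["role"]) + m["content"] + self.message_suffix)
        # One more, empty assistant turn for the model to complete.
        parts.append(self.header("assistant"))
        return "".join(parts)


LLAMA_3 = ChatStyleTemplate(
    role_prefix="<|start_header_id|>",
    role_suffix="<|end_header_id|>\n\n",
    message_suffix="<|eot_id|>",
    begin="<|begin_of_text|>",
)
# ChatML fits the same fields.
CHATML = ChatStyleTemplate(
    role_prefix="<|im_start|>",
    role_suffix="\n",
    message_suffix="<|im_end|>\n",
)

chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi! How can I help?"},
    {"role": "user", "content": "Tell me a joke."},
]
print(LLAMA_3.render(chat))
print(CHATML.render(chat))
```

The role_names mapping stands in for the optional "alternatives" setting, so a model that expects different role names only needs a different mapping rather than a new template.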
Feel free to ask questions if anything isn't clear enough.