compilebunny opened this issue 1 week ago
To specify the chat template when using LlamaCPP within llama-index, you can use the messages_to_prompt parameter. This parameter accepts a callable that converts a sequence of ChatMessage objects into a string prompt.
Here is an example of how you can specify the chat template:
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.core.llms import ChatMessage
from typing import Sequence

def custom_messages_to_prompt(messages: Sequence[ChatMessage]) -> str:
    prompt = ""
    for message in messages:
        if message.role == "system":
            prompt += f"<|system|>\n{message.content}</s>\n"
        elif message.role == "user":
            prompt += f"<|user|>\n{message.content}</s>\n"
        elif message.role == "assistant":
            prompt += f"<|assistant|>\n{message.content}</s>\n"
    # ensure we start with a system prompt, insert blank if needed
    if not prompt.startswith("<|system|>\n"):
        prompt = "<|system|>\n</s>\n" + prompt
    # add final assistant prompt
    prompt = prompt + "<|assistant|>\n"
    return prompt
llama3_llm = LlamaCPP(
    model_path="/model/path.gguf",
    temperature=0.1,
    max_new_tokens=2560,
    context_window=4096,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": 100},
    messages_to_prompt=custom_messages_to_prompt,
    verbose=True,
)
In this example, custom_messages_to_prompt is a function that takes a sequence of ChatMessage objects and converts them into a formatted string. This function is then passed to the LlamaCPP instance via the messages_to_prompt parameter [1].
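If you want to sanity-check the formatter before wiring it into a pipeline, you can call it directly on a couple of hand-built messages; the message contents below are purely illustrative:

from llama_index.core.llms import ChatMessage, MessageRole

msgs = [
    ChatMessage(role=MessageRole.SYSTEM, content="You are a helpful assistant."),
    ChatMessage(role=MessageRole.USER, content="Hello!"),
]
# Prints the assembled prompt string, e.g. "<|system|>...\n<|assistant|>\n"
print(custom_messages_to_prompt(msgs))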
Additionally, you can use the utility functions provided in llama_index.llms.llama_cpp.llama_utils for default prompt formatting, especially if you are using the llama-2-chat-13b-ggml model. Here is an example of how to use these utility functions:
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.llms.llama_cpp.llama_utils import messages_to_prompt, completion_to_prompt

llama3_llm = LlamaCPP(
    model_path="/model/path.gguf",
    temperature=0.1,
    max_new_tokens=2560,
    context_window=4096,
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    model_kwargs={"n_gpu_layers": 100},
    generate_kwargs={},
    verbose=True,
)
In this example, messages_to_prompt and completion_to_prompt are utility functions that format chat messages and plain completion prompts, respectively, according to the Llama 2 chat model's expected format [1][2].
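To see what these helpers emit, you can call them directly; something along these lines (the messages are illustrative) should print the [INST]/<<SYS>>-style Llama 2 prompt:

from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.llms.llama_cpp.llama_utils import messages_to_prompt, completion_to_prompt

msgs = [
    ChatMessage(role=MessageRole.SYSTEM, content="You are concise."),
    ChatMessage(role=MessageRole.USER, content="Hello!"),
]
print(messages_to_prompt(msgs))        # Llama 2 chat-style prompt
print(completion_to_prompt("Hello!"))  # same format, built from a single completion string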
To continue talking to Dosu, mention @dosu.
@dosu
Is there no way to use existing JINJA chat format info in order to avoid defining this manually?
Yes, you can reuse an existing chat-format definition with LlamaCPP instead of writing it out by hand each time. The ChatPromptTemplate class in llama-index lets you define chat templates with {placeholder}-style variables (similar in spirit to a Jinja template) and fill them in dynamically.
Here's an example demonstrating how to use ChatPromptTemplate:
from llama_index.core.prompts import ChatPromptTemplate
from llama_index.core.prompts.prompt_type import PromptType
from llama_index.core.llms import ChatMessage, MessageRole
# Define the chat template with JINJA-like placeholders
chat_template = ChatPromptTemplate(
    message_templates=[
        ChatMessage(
            content="This is a system message with a {sys_param}",
            role=MessageRole.SYSTEM,
        ),
        ChatMessage(content="hello {text} {foo}", role=MessageRole.USER),
    ],
    prompt_type=PromptType.CONVERSATION,
)

# Partially format the template with some parameters
partial_template = chat_template.partial_format(sys_param="sys_arg")

# Format the messages with the remaining parameters
messages = partial_template.format_messages(text="world", foo="bar")

# Convert the formatted messages to a prompt string
def custom_messages_to_prompt(messages):
    return "\n".join([f"{msg.role}: {msg.content}" for msg in messages])

# Use the custom function in LlamaCPP
llama3_llm = LlamaCPP(
    model_path="/model/path.gguf",
    temperature=0.1,
    max_new_tokens=2560,
    context_window=4096,
    messages_to_prompt=custom_messages_to_prompt,
    model_kwargs={"n_gpu_layers": 100},
    verbose=True,
)
In this example, the ChatPromptTemplate is used to define a chat format with placeholders like {sys_param}, {text}, and {foo}. You can then use the partial_format and format_messages methods to dynamically fill in these placeholders without manually defining the chat template each time [1][2].
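To round this off, the formatted messages can be sent through the same LlamaCPP instance; the messages_to_prompt callable you passed in is applied internally when you call chat. A short sketch reusing the names defined above:

# custom_messages_to_prompt turns these messages into a single prompt string internally
response = llama3_llm.chat(messages)
print(response.message.content)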
To continue talking to Dosu, mention @dosu.
Question
Normally, if one were starting a llama.cpp server, one would specify the chat template as follows:
llama-server --model /models/abc.gguf --chat-template llama3
How does one do this using LlamaCPP within llama-index?
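For reference, llama-server's --chat-template llama3 selects the Llama 3 header-token format; following the messages_to_prompt approach described above, it could be reproduced with a callable along these lines. This is a minimal sketch (the function name is hypothetical, and it assumes the standard Llama 3 special tokens):

from typing import Sequence
from llama_index.core.llms import ChatMessage

def llama3_template_messages_to_prompt(messages: Sequence[ChatMessage]) -> str:
    # Each turn is wrapped in <|start_header_id|>role<|end_header_id|> ... <|eot_id|>
    # Note: llama.cpp may prepend the BOS token itself, in which case
    # "<|begin_of_text|>" can be omitted here.
    prompt = "<|begin_of_text|>"
    for message in messages:
        prompt += (
            f"<|start_header_id|>{message.role.value}<|end_header_id|>\n\n"
            f"{message.content}<|eot_id|>"
        )
    # Leave the assistant header open so the model generates the reply
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

# Pass it to LlamaCPP the same way as above:
# llama3_llm = LlamaCPP(..., messages_to_prompt=llama3_template_messages_to_prompt)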