
[Question]: How to specify chat template when using LlamaCPP #15862

Open compilebunny opened 1 week ago

compilebunny commented 1 week ago


Question

Normally, if one were starting a llama.cpp server, one would specify the chat template as follows:

llama-server --model /models/abc.gguf --chat-template llama3

How does one do this using LlamaCPP within llama-index?

from llama_index.llms.llama_cpp import LlamaCPP

llama3_llm = LlamaCPP(
    # You can pass in the URL to a GGUF model to download it automatically
    # optionally, you can set the path to a pre-downloaded model instead of model_url
    model_path="/model/path.gguf",
    temperature=0.1,
    max_new_tokens=2560,
    context_window=4096,
    # kwargs to pass to __call__()
    generate_kwargs={},
    # kwargs to pass to __init__()
    # set to at least 1 to use GPU
    model_kwargs={"n_gpu_layers": 100},
    messages_to_prompt=???,
    completion_to_prompt=???,
    verbose=True,
)
dosubot[bot] commented 1 week ago

To specify the chat template when using LlamaCPP within llama-index, use the messages_to_prompt parameter (and, for plain completion calls, completion_to_prompt). The messages_to_prompt parameter accepts a callable that converts a sequence of ChatMessage objects into a single prompt string.

Here is an example of how you can specify the chat template:

from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.core.llms import ChatMessage
from typing import Sequence

def custom_messages_to_prompt(messages: Sequence[ChatMessage]) -> str:
    prompt = ""
    for message in messages:
        if message.role == 'system':
            prompt += f"<|system|>\n{message.content}</s>\n"
        elif message.role == 'user':
            prompt += f"<|user|>\n{message.content}</s>\n"
        elif message.role == 'assistant':
            prompt += f"<|assistant|>\n{message.content}</s>\n"

    # ensure we start with a system prompt, insert blank if needed
    if not prompt.startswith("<|system|>\n"):
        prompt = "<|system|>\n</s>\n" + prompt

    # add final assistant prompt
    prompt = prompt + "<|assistant|>\n"

    return prompt

llama3_llm = LlamaCPP(
    model_path="/model/path.gguf",
    temperature=0.1,
    max_new_tokens=2560,
    context_window=4096,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": 100},
    messages_to_prompt=custom_messages_to_prompt,
    verbose=True,
)

In this example, custom_messages_to_prompt is a function that takes a sequence of ChatMessage objects and converts them into a formatted string. This function is then passed to the LlamaCPP instance via the messages_to_prompt parameter [1].
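
For context, here is a minimal usage sketch showing how the configured LLM would then be invoked; it assumes model_path points at a real GGUF file, and the message contents are placeholders:

from llama_index.core.llms import ChatMessage

# Assumes llama3_llm was constructed as above with custom_messages_to_prompt.
# chat() converts the messages via messages_to_prompt before generation.
response = llama3_llm.chat(
    [
        ChatMessage(role="system", content="You are a concise assistant."),
        ChatMessage(role="user", content="What does a chat template do?"),
    ]
)
print(response.message.content)

# complete() takes a plain string and runs it through completion_to_prompt instead.
print(llama3_llm.complete("Explain chat templates in one sentence.").text)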

Additionally, you can use the utility functions provided in llama_index.llms.llama_cpp.llama_utils for default prompt formatting, especially if you are using the llama-2-chat-13b-ggml model. Here is an example of how to use these utility functions:

from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.llms.llama_cpp.llama_utils import messages_to_prompt, completion_to_prompt

llama3_llm = LlamaCPP(
    model_path="/model/path.gguf",
    temperature=0.1,
    max_new_tokens=2560,
    context_window=4096,
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    model_kwargs={"n_gpu_layers": 100},
    generate_kwargs={},
    verbose=True,
)

In this example, messages_to_prompt and completion_to_prompt are utility functions that format model inputs and outputs according to the Llama 2 chat format [1][2]. If your model expects a different chat format (such as Llama 3), you would still supply a custom messages_to_prompt like the one above.
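
If you want to confirm what those helpers actually produce before wiring them in, you can call them directly; a quick sketch (the sample messages here are placeholders):

from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.llms.llama_cpp.llama_utils import messages_to_prompt

# Inspect the prompt string the default helper builds from a few sample messages.
sample_messages = [
    ChatMessage(role=MessageRole.SYSTEM, content="You are a helpful assistant."),
    ChatMessage(role=MessageRole.USER, content="Hello!"),
]
print(messages_to_prompt(sample_messages))

Printing the result makes it easy to check whether the formatting matches what your particular GGUF model expects.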

To continue talking to Dosu, mention @dosu.

compilebunny commented 1 week ago

@dosu

Is there no way to use existing JINJA chat format info in order to avoid defining this manually?

dosubot[bot] commented 1 week ago

Yes, you can reuse existing chat-format information with LlamaCPP to avoid hand-writing the template each time. The ChatPromptTemplate class in llama-index lets you define chat templates with named placeholders (Python format-string style, similar in spirit to Jinja variables) and fill them in dynamically.

Here's an example demonstrating how to use ChatPromptTemplate:

from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.prompts import ChatPromptTemplate
from llama_index.core.prompts.prompt_type import PromptType
from llama_index.llms.llama_cpp import LlamaCPP

# Define the chat template with JINJA-like placeholders
chat_template = ChatPromptTemplate(
    message_templates=[
        ChatMessage(
            content="This is a system message with a {sys_param}",
            role=MessageRole.SYSTEM,
        ),
        ChatMessage(content="hello {text} {foo}", role=MessageRole.USER),
    ],
    prompt_type=PromptType.CONVERSATION,
)

# Partially format the template with some parameters
partial_template = chat_template.partial_format(sys_param="sys_arg")

# Format the messages with the remaining parameters
messages = partial_template.format_messages(text="world", foo="bar")

# Convert the formatted messages to a prompt string
def custom_messages_to_prompt(messages):
    return "\n".join([f"{msg.role}: {msg.content}" for msg in messages])

# Use the custom function in LlamaCPP
llama3_llm = LlamaCPP(
    model_path="/model/path.gguf",
    temperature=0.1,
    max_new_tokens=2560,
    context_window=4096,
    messages_to_prompt=custom_messages_to_prompt,
    model_kwargs={"n_gpu_layers": 100},
    verbose=True,
)

In this example, ChatPromptTemplate defines a chat format with placeholders like {sys_param}, {text}, and {foo}. The partial_format and format_messages methods fill these in dynamically, and the resulting messages can then be sent with llama3_llm.chat(messages), which routes them through custom_messages_to_prompt, so you avoid hand-writing the final prompt string each time [1][2].
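
Note that ChatPromptTemplate placeholders are Python format strings rather than true Jinja. If the goal is to reuse the actual Jinja chat template that ships with the model (for example, the chat_template field in a Hugging Face tokenizer_config.json), one option is to render it with the jinja2 package inside a messages_to_prompt callable. A hedged sketch follows; the short template string, the Llama-3-style tokens inside it, and the jinja_messages_to_prompt helper name are placeholders and assumptions, not part of the llama-index API:

from jinja2 import Template
from llama_index.core.llms import ChatMessage
from llama_index.llms.llama_cpp import LlamaCPP

# Placeholder: paste your model's own Jinja chat template here, e.g. the
# "chat_template" value from its tokenizer_config.json. The tokens below
# are for illustration only.
CHAT_TEMPLATE_STR = (
    "{% for message in messages %}"
    "<|start_header_id|>{{ message.role }}<|end_header_id|>\n\n"
    "{{ message.content }}<|eot_id|>"
    "{% endfor %}"
    "{% if add_generation_prompt %}"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    "{% endif %}"
)
jinja_template = Template(CHAT_TEMPLATE_STR)

def jinja_messages_to_prompt(messages) -> str:
    # Render the template against role/content dicts, mirroring how
    # Hugging Face's apply_chat_template consumes the same template.
    return jinja_template.render(
        messages=[{"role": m.role.value, "content": m.content} for m in messages],
        add_generation_prompt=True,
    )

llama3_llm = LlamaCPP(
    model_path="/model/path.gguf",
    temperature=0.1,
    max_new_tokens=2560,
    context_window=4096,
    messages_to_prompt=jinja_messages_to_prompt,
    model_kwargs={"n_gpu_layers": 100},
    verbose=True,
)

With this in place, llama3_llm.chat([...]) renders the prompt through the model's own template instead of a hand-written formatter.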

To continue talking to Dosu, mention @dosu.