run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai

[Question]: LlamaIndex Multi-Document Agents #13728

Open dragoa opened 1 month ago

dragoa commented 1 month ago


Question

Hi, I'm following the Multi-Document Agents documentation, but instead of using OpenAI I'm using Mistral 7B via HuggingFaceLLM:

import torch
from llama_index.core import Settings
from llama_index.llms.huggingface import HuggingFaceLLM

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.0, "do_sample": False},
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="mistralai/Mistral-7B-Instruct-v0.3",
    model_name="mistralai/Mistral-7B-Instruct-v0.3",
    device_map="auto",
    stopping_ids=[50278, 50279, 50277, 1, 0],
    tokenizer_kwargs={"max_length": 4096},
    # load in float16 on CUDA to reduce memory usage
    model_kwargs={"torch_dtype": torch.float16}
)

Settings.llm = llm
Settings.chunk_size = 1024

Now, at the end, when I call response = top_agent.chat("Tell me about the status of Confezioni Simmy 2014 company"), I receive the following error: TemplateError: Conversation roles must alternate user/assistant/user/assistant/... The prompt I used is:

from llama_index.core import PromptTemplate

SYSTEM_PROMPT = """You are an AI assistant that answers questions in a friendly manner, based on the given source documents. Here are some rules you always follow:
- Generate human readable output, avoid creating output with gibberish text.
- Generate only the requested output, don't include any other language before or after the requested output.
- Never say thank you, that you are happy to help, that you are an AI agent, etc. Just answer directly.
- Generate professional language typically used in business documents in North America.
- Never generate offensive or foul language.
"""

query_wrapper_prompt = PromptTemplate(
    "[INST]<<SYS>>\n" + SYSTEM_PROMPT + "<</SYS>>\n\n{query_str}[/INST] "
)

so it is the user that correctly starts the conversation.

Do you know what I could fix to make it work?

Thank you!

dosubot[bot] commented 1 month ago

Hi there, @dragoa! I'm a bot here to help you with bugs, questions, and more while you wait for a human maintainer. Let's tackle this together! 🤖

To fix the TemplateError: Conversation roles must alternate user/assistant/user/assistant/... error, you need to ensure that the sequence of ChatMessage objects alternates correctly between user and assistant roles. Here is a code snippet that demonstrates how to properly structure the messages to avoid this error:

from llama_index.core.base.llms.types import ChatMessage, MessageRole
from llama_index.llms.huggingface import HuggingFaceLLM
from transformers import BitsAndBytesConfig
import torch

# Define the function to convert messages to prompt
def messages_to_prompt(messages):
    prompt = ""
    for message in messages:
        if message.role == 'system':
            prompt += f"<|system|>\n{message.content}</s>\n"
        elif message.role == 'user':
            prompt += f"<|user|>\n{message.content}</s>\n"
        elif message.role == 'assistant':
            prompt += f"<|assistant|>\n{message.content}</s>\n"

    if not prompt.startswith("<|system|>\n"):
        prompt = "<|system|>\n</s>\n" + prompt

    prompt = prompt + "<|assistant|>\n"
    return prompt

# Define the function to convert completion to prompt
def completion_to_prompt(completion):
    return f"<|system|>\n</s>\n<|user|>\n{completion}</s>\n<|assistant|>\n"

# Quantize to save memory
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# Initialize the HuggingFaceLLM instance with the custom prompt functions
llm = HuggingFaceLLM(
    model_name="mistralai/Mistral-7B-Instruct-v0.3",
    tokenizer_name="mistralai/Mistral-7B-Instruct-v0.3",
    context_window=4096,
    max_new_tokens=256,
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    model_kwargs={"quantization_config": quantization_config},
    device_map="auto",
)

# Example of correctly alternating messages
messages = [
    ChatMessage(role=MessageRole.USER, content="Hello, how are you?"),
    ChatMessage(role=MessageRole.ASSISTANT, content="I'm good, thank you! How can I assist you today?"),
    ChatMessage(role=MessageRole.USER, content="Can you tell me about the weather?"),
    ChatMessage(role=MessageRole.ASSISTANT, content="Sure, the weather today is sunny with a high of 75°F."),
]

# Ensure the messages are passed in the correct order to the chat method
response = llm.chat(messages=messages)
print(response.message.content)

Make sure that every ChatMessage with the role MessageRole.USER is followed by a ChatMessage with the role MessageRole.ASSISTANT, and vice versa. This will ensure that the conversation roles alternate correctly and prevent the TemplateError [1][2][3][4].
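
For illustration, here is a small sketch (not a llama_index API, just a helper you could define yourself) that checks the role sequence up front and fails with a clearer message than the TemplateError:

from llama_index.core.base.llms.types import ChatMessage, MessageRole

def assert_alternating_roles(messages):
    """Raise if user/assistant turns do not alternate (a leading system message is allowed)."""
    turns = messages[1:] if messages and messages[0].role == MessageRole.SYSTEM else messages
    for index, message in enumerate(turns):
        expected = MessageRole.USER if index % 2 == 0 else MessageRole.ASSISTANT
        if message.role != expected:
            raise ValueError(
                f"message {index} has role {message.role}, expected {expected}; "
                "conversation roles must alternate user/assistant/user/assistant/..."
            )

assert_alternating_roles(messages)  # passes for the example messages above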


JensPMB commented 1 month ago

Hi, when you call the chat method, it applies the chat template (in the tokenizer). For some reason the Hugging Face tokenizer has an old template for Mistral-7B-Instruct-v0.2 / Mistral-7B-Instruct-v0.3 (and I believe Mistral-7B-Instruct-v0.1 genuinely didn't support a system prompt). To get around the problem you can (a) edit the tokenizer_config.json, (b) write your own messages_to_prompt in llama_index and pass it in to the llm constructor, or (c) load a tokenizer that works, e.g. llama-2 chat. 🤷

I did (b), as I'm no good at Jinja.

custom messages_to_prompt

from llama_index.core.base.llms.types import MessageRole

def mistralv2_messages_to_prompt(messages):
    conversation = ""
    bos_token = "<s>"
    eos_token = "</s>"
    if messages[0].role == MessageRole.SYSTEM:
        loop_messages = messages[1:]
        system_message = messages[0].content
    else:
        loop_messages = messages
        system_message = False

    for index, message in enumerate(loop_messages):
        if (message.role == MessageRole.USER) != (index % 2 == 0):
            raise Exception('HFI Conversation roles must alternate user/assistant/user/assistant/...')
        if index == 0 and system_message != False:
            # fold the system prompt into the first user turn, llama-2 style
            content = '<<SYS>>\n' + system_message + '\n<</SYS>>\n\n' + message.content
        else:
            content = message.content
        if message.role == MessageRole.USER:
            conversation += bos_token + '[INST] ' + content.strip() + ' [/INST]'
        elif message.role == MessageRole.ASSISTANT:
            conversation += ' ' + content.strip() + ' ' + eos_token

    return conversation
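
For completeness, a minimal sketch of wiring option (b) into the setup from the question (model names and Settings usage are taken from the original post, not from this comment); passing messages_to_prompt to the HuggingFaceLLM constructor should make it format chat messages with this function instead of the tokenizer's chat template:

import torch
from llama_index.core import Settings
from llama_index.llms.huggingface import HuggingFaceLLM

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    model_name="mistralai/Mistral-7B-Instruct-v0.3",
    tokenizer_name="mistralai/Mistral-7B-Instruct-v0.3",
    device_map="auto",
    model_kwargs={"torch_dtype": torch.float16},
    # custom formatter defined above, used in place of the tokenizer's chat template
    messages_to_prompt=mistralv2_messages_to_prompt,
)
Settings.llm = llm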