run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai

Clarification on `is_chat_model` Parameter in `LLMMetadata` #16301

Open RakeshReddyKondeti opened 2 months ago

RakeshReddyKondeti commented 2 months ago

Question

Hi,

I need clarification on the `is_chat_model` parameter in `LLMMetadata`. Specifically, I would like to understand whether this parameter depends on the LLM model itself or on the framework used to load the model.

For instance, I am using the Llama 3.1 GGUF model. How can I determine whether the `is_chat_model` parameter should be set to `True` or `False`? I have observed that when using the llama-cpp-python framework, `is_chat_model` is `False` (the default value), but when using the Ollama framework, it is `True`.

Could you please provide guidance on how to accurately determine the value of `is_chat_model` for different models and frameworks?

Thank you!

dosubot[bot] commented 2 months ago

To determine the value of the `is_chat_model` parameter in `LLMMetadata` for different models and frameworks, check whether the model exposes a chat interface. Specifically, `is_chat_model` should be set to `True` if the model can be passed a sequence of chat messages rather than a single text prompt.

For the Llama 3.1 GGUF model:

  1. Using llama-cpp-python: This framework supports a chat interface for the Llama 3.1 GGUF model. The documentation and examples explicitly show how to set up and use the chat interface, confirming that `is_chat_model` should be set to `True` [1][2].

  2. Using the Ollama framework: Ollama also supports a chat interface. The `metadata` property in the `OllamaMultiModal` class includes the attribute `is_chat_model=True`, and there are methods like `chat`, `stream_chat`, and `astream_chat` that handle chat interactions. Therefore, `is_chat_model` should be set to `True` [3][4][5][6].

In summary, for both llama-cpp-python and the Ollama framework, the `is_chat_model` parameter should be set to `True` for the Llama 3.1 GGUF model.
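
As a quick check of what each integration actually reports, you can inspect the `metadata` property of an instantiated LLM. This is a minimal sketch, not a definitive recipe: the GGUF path and Ollama model tag below are placeholders, and the printed values reflect the defaults observed in this thread.

```python
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.llms.ollama import Ollama

# Placeholder model path and tag; substitute your own local GGUF file / Ollama model.
llm_cpp = LlamaCPP(model_path="./llama-3.1-8b-instruct.Q4_K_M.gguf")
llm_ollama = Ollama(model="llama3.1")

# Each integration reports its own default in LLMMetadata.
print(llm_cpp.metadata.is_chat_model)     # False by default for the LlamaCPP integration
print(llm_ollama.metadata.is_chat_model)  # True: Ollama exposes a native chat endpoint
```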

To continue talking to Dosu, mention @dosu.

logan-markewich commented 2 months ago

`is_chat_model` won't do anything for llama.cpp

LlamaCPP (at least the way it's implemented in llama-index) will apply the provided templating function to translate chat messages into a model-specific format

Ollama provides a chat endpoint that works directly with chat messages, so there is no need to provide templating functions, and hence `is_chat_model=True`
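
For illustration, a minimal sketch of the Ollama path (assuming a local Ollama server with the llama3.1 model already pulled), where the chat messages are sent to the chat endpoint as-is:

```python
from llama_index.core.llms import ChatMessage
from llama_index.llms.ollama import Ollama

# Assumes an Ollama server running locally with the llama3.1 model pulled.
llm = Ollama(model="llama3.1")

# Messages go straight to Ollama's chat endpoint; no messages_to_prompt
# templating is needed, hence is_chat_model=True.
response = llm.chat(
    [
        ChatMessage(role="system", content="You are a helpful assistant."),
        ChatMessage(role="user", content="What does is_chat_model control?"),
    ]
)
print(response.message.content)
```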

RakeshReddyKondeti commented 2 months ago

Thanks @logan-markewich,

I am working with LlamaCPP. When I set `is_chat_model = True`, your recent merge #16017 does not work; however, when I set `is_chat_model = False`, it works as expected. I am unsure what exactly changes in this context. Could you provide more details on why this behavior occurs and how `is_chat_model` influences the behavior of LlamaCPP?

Thank you!

logan-markewich commented 2 months ago

If `is_chat_model` is set for llama-cpp, it applies the provided `messages_to_prompt` function

Otherwise, it assumes the inputs are already formatted, and does not touch them

If you set `is_chat_model=True`, provide a `messages_to_prompt` function to transform message dicts into the model's prompt format: https://github.com/run-llama/llama_index/blob/0491ab7cd376aaeb45d725c412ff45d7f74c50b9/llama-index-integrations/llms/llama-index-llms-llama-cpp/llama_index/llms/llama_cpp/base.py#L256
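
As a rough sketch of what such a function might look like for a Llama-3.1-style chat template (the template string and model path below are illustrative, not taken from this repo; verify the template against the model card for your GGUF build):

```python
from typing import Sequence

from llama_index.core.llms import ChatMessage
from llama_index.llms.llama_cpp import LlamaCPP


def messages_to_prompt(messages: Sequence[ChatMessage]) -> str:
    # Hand-rolled Llama 3.1 chat template; check it against the model card.
    prompt = "<|begin_of_text|>"
    for message in messages:
        prompt += (
            f"<|start_header_id|>{message.role.value}<|end_header_id|>\n\n"
            f"{message.content}<|eot_id|>"
        )
    # Leave the assistant header open so the model generates the reply.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


llm = LlamaCPP(
    model_path="./llama-3.1-8b-instruct.Q4_K_M.gguf",  # placeholder local path
    messages_to_prompt=messages_to_prompt,
)
```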

RakeshReddyKondeti commented 2 months ago

Thanks for the explanation @logan-markewich,

I am using `ContextPlusCondenseChatEngine` from #16017 and LlamaCPP to initialize the LLM.

I noticed that in the current implementation of the `_extend_messages` method, the system prompt is appended to the chat message list even if a system message already exists. I am not sure whether this behavior is intended or a glitch, but it can lead to redundant system prompts in the message list, causing issues in subsequent processing, especially in the `messages_to_prompt` method.

Suggested improvement: the method should check whether a system message already exists in the message list. If it does, it should update the existing system prompt instead of appending a new one, something like the following:

```python
def _extend_messages(self, messages: List[ChatMessage]) -> List[ChatMessage]:
    """Add the system prompt to the chat message list, at most once."""
    system_message_found = False

    for message in messages:
        if isinstance(message, ChatMessage) and message.role == MessageRole.SYSTEM:
            # A system message already exists: fold the engine's system prompt
            # into it instead of appending a second system message.
            if self.system_prompt:
                message.content = f"{message.content or ''}\n{self.system_prompt}"
            system_message_found = True
            break

    if not system_message_found and self.system_prompt:
        messages = [
            ChatMessage(role=MessageRole.SYSTEM, content=self.system_prompt),
            *messages,
        ]

    return messages
```

This modified method ensures that the system prompt is added only once, preventing redundant system prompts and issues in subsequent message processing.