run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: Does using a ContextChatEngine with `stream_chat` always ignore the system prompt? #11777

Closed · ckrapu closed this issue 7 months ago

ckrapu commented 7 months ago

Question

When chatting with a ContextChatEngine, it looks like the system prompt is always discarded and only the prefix messages are used. Is this intended?

What is the ideal way to supply instructions to this type of chat engine?

dosubot[bot] commented 7 months ago

Based on the information you've provided and the context from the LlamaIndex repository, the behavior you're observing is the intended behavior of the ContextChatEngine class. The system prompt is handled differently from the prefix messages: if a system prompt is present, it is first separated from the prefix messages and then prepended to the retrieved context string. The prefix messages, together with this dynamically generated context string, form the input to the LLM chat method.

This design allows the context to be generated dynamically for the message being processed and included alongside the prefix messages on each chat interaction, as can be seen in the implementation of the `_get_prefix_messages_with_context` method in the ContextChatEngine class:

def _get_prefix_messages_with_context(self, context_str: str) -> List[ChatMessage]:
    """Get the prefix messages with context."""
    # ensure we grab the user-configured system prompt
    system_prompt = ""
    prefix_messages = self._prefix_messages
    if (
        len(self._prefix_messages) != 0
        and self._prefix_messages[0].role == MessageRole.SYSTEM
    ):
        system_prompt = str(self._prefix_messages[0].content)
        prefix_messages = self._prefix_messages[1:]

    context_str_w_sys_prompt = system_prompt.strip() + "\n" + context_str
    return [
        ChatMessage(
            content=context_str_w_sys_prompt, role=self._llm.metadata.system_role
        ),
        *prefix_messages,
    ]

This method shows that the system prompt is handled separately from the prefix messages: it is folded into the system-role message that carries the retrieved context, while any remaining prefix messages are passed through unchanged. The primary mechanism for supplying instructions to this chat engine is therefore the system prompt (delivered inside the context message) together with the prefix messages.
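To make the effect concrete, here is a minimal sketch of the system-role message this method builds when a user-configured system prompt is present. The system prompt, context text, and variable names below are illustrative, not taken from the issue:

from llama_index.core.llms import ChatMessage, MessageRole

# Hypothetical inputs for illustration only.
system_prompt = "Talk like a pirate in every response."
context_str = "Context information is below.\n----------\n<retrieved node text>"

# Mirrors the combination step in _get_prefix_messages_with_context: the
# user-configured system prompt is prepended to the retrieved context and
# carried in a single system-role message (the engine actually uses the LLM's
# configured system role; MessageRole.SYSTEM is used here for simplicity).
context_str_w_sys_prompt = system_prompt.strip() + "\n" + context_str
prefix_with_context = [
    ChatMessage(content=context_str_w_sys_prompt, role=MessageRole.SYSTEM),
]

So the system prompt is not dropped; it travels inside the same system message that delivers the retrieved context, ahead of any remaining prefix messages and the chat history.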

There have been similar issues in the past, such as "[Bug]: ContextChatEngine user supplied system prompt is concatenated below the context, is it misplaced?" and "[Bug]: chat_engine - context and system prompt is not added to the first chat", but these have been resolved and the current behavior is as designed.

I hope this clarifies your question. If you have further queries, feel free to ask.

logan-markewich commented 7 months ago

@ckrapu it's not discarded.

The context chat engine uses the system prompt to supply retrieved context (you'll notice there is a context template at the top of the file).

The system prompt provided by the user is added to the prefix messages: https://github.com/run-llama/llama_index/blob/55229321245a81a9574cdfa571c61fb6c30aee9d/llama-index-core/llama_index/core/chat_engine/context.py#L89

and from there it is merged into the same system message as the retrieved context.

This provides an easy way to give a system prompt without modifying the entire context template.

For example, this works:

>>> from llama_index.core import Document, VectorStoreIndex
>>> index = VectorStoreIndex.from_documents([Document.example()])
>>> chat_engine = index.as_chat_engine(chat_mode="context", system_prompt="Talk like a pirate in every response.")
>>> resp = chat_engine.stream_chat("What do you know about LLMs?")
>>> for r in resp.response_gen:
...     print(r, end="", flush=True)
... 
Arrr matey! LLMs be a mighty fine piece o' technology fer knowledge generation and reasonin'. They be pre-trained on vast amounts o' publicly available data, makin' 'em smart as a savvy pirate on the high seas! If ye be lookin' to augment LLMs with yer own private data, ye be needin' a toolkit like LlamaIndex to help ye on yer quest. Ye can trust me on that, savvy? Arrr!
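If you need to change how the retrieved context itself is framed (rather than just layering a system prompt on top of it), the template can also be overridden. A minimal sketch, continuing the session above and assuming the context_template keyword (with its {context_str} placeholder) that the context chat engine accepts:

>>> custom_template = (
...     "You are a helpful docs assistant. Use only the context below.\n"
...     "---------------------\n"
...     "{context_str}\n"
...     "---------------------\n"
... )
>>> chat_engine = index.as_chat_engine(
...     chat_mode="context",
...     context_template=custom_template,
...     system_prompt="Talk like a pirate in every response.",
... )
>>> resp = chat_engine.stream_chat("What do you know about LLMs?")

Based on the method quoted earlier, the system prompt is still combined with whatever the template produces, so the two can be used together.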