run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: Chat engine not taking string as input #12430

Closed BlueKiji77 closed 2 months ago

BlueKiji77 commented 5 months ago

Bug Description

I am trying the Chat Engine - Context Mode tutorial from the docs, and passing a string to the chat engine results in an error that seems to require a user/assistant/user/assistant format as input. I dug through the code but can't figure out what is causing the issue.

Version

llama_index-0.10.25

Steps to Reproduce

# Setting and loading up the index
import os

import qdrant_client
from llama_index.core import StorageContext, load_index_from_storage
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.core.storage.index_store import SimpleIndexStore
from llama_index.vector_stores.qdrant import QdrantVectorStore

index_store_path = "/kaggle/working/storytelling-gamedesign-qdrantindexstore"
client = qdrant_client.QdrantClient(path=index_store_path)
qdrant_vector_store = QdrantVectorStore(client=client, collection_name="game_design_qdrant_store")
qdrant_vector_store._collection_exists("game_design_qdrant_store")

output_dir = "/kaggle/input/"
index_persist_dir = os.path.join(output_dir)
persist_dir = "/kaggle/working/storytelling-gamedesign-qdrantindexstore/storage"
storage_context = StorageContext.from_defaults(
    docstore=SimpleDocumentStore.from_persist_dir(persist_dir=persist_dir),
    vector_store=qdrant_vector_store,
    index_store=SimpleIndexStore.from_persist_dir(persist_dir=persist_dir),
)

index_name = "game_design_qdrant_index"
index = load_index_from_storage(storage_context)

memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

chat_engine = index.as_chat_engine(
    chat_mode="context",
    memory=memory,
    system_prompt=(
        "You are a chatbot, able to have normal interactions, as well as talk "
        "design in video games."
    ),
)

response = chat_engine.chat("What are the first programs Paul Graham tried writing?")
print(response)

Relevant Logs/Tracebacks

TemplateError                             Traceback (most recent call last)
Cell In[27], line 1
----> 1 response = chat_engine.chat(
      2     "What are the first programs Paul Graham tried writing?"
      3 )
      4 print(response)

File /opt/conda/lib/python3.10/site-packages/llama_index/core/callbacks/utils.py:41, in trace_method.<locals>.decorator.<locals>.wrapper(self, *args, **kwargs)
     39 callback_manager = cast(CallbackManager, callback_manager)
     40 with callback_manager.as_trace(trace_id):
---> 41     return func(self, *args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/llama_index/core/chat_engine/context.py:172, in ContextChatEngine.chat(self, message, chat_history)
    164 prefix_messages_token_count = len(
    165     self._memory.tokenizer_fn(
    166         " ".join([(m.content or "") for m in prefix_messages])
    167     )
    168 )
    169 all_messages = prefix_messages + self._memory.get(
    170     initial_token_count=prefix_messages_token_count
    171 )
--> 172 chat_response = self._llm.chat(all_messages)
    173 ai_message = chat_response.message
    174 self._memory.put(ai_message)

File /opt/conda/lib/python3.10/site-packages/llama_index/core/llms/callbacks.py:130, in llm_chat_callback.<locals>.wrap.<locals>.wrapped_llm_chat(_self, messages, **kwargs)
    115 dispatcher.event(
    116     LLMChatStartEvent(
    117         model_dict=_self.to_dict(),
   (...)
    120     )
    121 )
    122 event_id = callback_manager.on_event_start(
    123     CBEventType.LLM,
    124     payload={
   (...)
    128     },
    129 )
--> 130 f_return_val = f(_self, messages, **kwargs)
    132 if isinstance(f_return_val, Generator):
    133     # intercept the generator and add a callback to the end
    134     def wrapped_gen() -> ChatResponseGen:

File /opt/conda/lib/python3.10/site-packages/llama_index/llms/huggingface/base.py:341, in HuggingFaceLLM.chat(self, messages, **kwargs)
    339 @llm_chat_callback()
    340 def chat(self, messages: Sequence[ChatMessage], **kwargs: Any) -> ChatResponse:
--> 341     prompt = self.messages_to_prompt(messages)
    342     completion_response = self.complete(prompt, formatted=True, **kwargs)
    343     return completion_response_to_chat_response(completion_response)

File /opt/conda/lib/python3.10/site-packages/llama_index/llms/huggingface/base.py:256, in HuggingFaceLLM._tokenizer_messages_to_prompt(self, messages)
    251 if hasattr(self._tokenizer, "apply_chat_template"):
    252     messages_dict = [
    253         {"role": message.role.value, "content": message.content}
    254         for message in messages
    255     ]
--> 256     tokens = self._tokenizer.apply_chat_template(messages_dict)
    257     return self._tokenizer.decode(tokens)
    259 return generic_messages_to_prompt(messages)

File /opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1745, in PreTrainedTokenizerBase.apply_chat_template(self, conversation, chat_template, add_generation_prompt, tokenize, padding, truncation, max_length, return_tensors, return_dict, **tokenizer_kwargs)
   1742 # Compilation function uses a cache to avoid recompiling the same template
   1743 compiled_template = self._compile_jinja_template(chat_template)
-> 1745 rendered = compiled_template.render(
   1746     messages=conversation, add_generation_prompt=add_generation_prompt, **self.special_tokens_map
   1747 )
   1749 if padding is True:
   1750     padding = "max_length"  # There's only one sequence here, so "longest" makes no sense

File /opt/conda/lib/python3.10/site-packages/jinja2/environment.py:1301, in Template.render(self, *args, **kwargs)
   1299     return self.environment.concat(self.root_render_func(ctx))  # type: ignore
   1300 except Exception:
-> 1301     self.environment.handle_exception()

File /opt/conda/lib/python3.10/site-packages/jinja2/environment.py:936, in Environment.handle_exception(self, source)
    931 """Exception handling helper.  This is used internally to either raise
    932 rewritten exceptions or return a rendered traceback for the template.
    933 """
    934 from .debug import rewrite_traceback_stack
--> 936 raise rewrite_traceback_stack(source=source)

File <template>:1, in top-level template code()

File /opt/conda/lib/python3.10/site-packages/jinja2/sandbox.py:393, in SandboxedEnvironment.call(_SandboxedEnvironment__self, _SandboxedEnvironment__context, _SandboxedEnvironment__obj, *args, **kwargs)
    391 if not __self.is_safe_callable(__obj):
    392     raise SecurityError(f"{__obj!r} is not safely callable")
--> 393 return __context.call(__obj, *args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1790, in PreTrainedTokenizerBase._compile_jinja_template.<locals>.raise_exception(message)
   1789 def raise_exception(message):
-> 1790     raise TemplateError(message)

TemplateError: Conversation roles must alternate user/assistant/user/assistant/...
dosubot[bot] commented 5 months ago

Hey @BlueKiji77! Good to see you back in the labyrinth of code. Ready for another adventure? 🌀

Based on the error message you're encountering, it seems like the issue is related to the format of the input you're passing to the chat engine. The error message suggests that the chat engine expects an alternating pattern of user and assistant roles in the conversation history.

From the steps you've followed, it appears that you're passing a string directly to the chat method. However, the chat method expects a string representing the user's message and an optional list of ChatMessage objects representing the chat history. Each ChatMessage object should have a content attribute representing the text content of the message and a role attribute representing the role of the message, which can be either "user" or "assistant".

Here's an example of how you can structure your input:

from llama_index.core.chat_engine import ChatMessage

chat_history = [
    ChatMessage(content="Hi, I'm a virtual assistant. How can I help you today?", role="assistant"),
    ChatMessage(content="Can you tell me a joke?", role="user"),
]

user_message = "What are the first programs Paul Graham tried writing?"

response = chat_engine.chat(message=user_message, chat_history=chat_history)

In this example, the chat_history list contains ChatMessage objects that alternate between "assistant" and "user" roles, and the user_message string represents the user's current message.

If you're not providing a chat history and are only passing a user message to the chat method, the chat engine will treat it as a new conversation. In this case, the user message should be the first message in the conversation.

If you're still encountering issues after trying this, please provide more details about your implementation, and I'll be happy to help further. You can refer to the ContextChatEngine class in the LlamaIndex repository for more information about how the chat engine works.

BlueKiji77 commented 5 months ago

I had tried that before; using your recommendation below still results in the same error: TemplateError: Conversation roles must alternate user/assistant/user/assistant/...

from llama_index.core.base.llms.types import ChatMessage

chat_history = [
    ChatMessage(content="Hi, I'm a virtual assistant. How can I help you today?", role="assistant"),
    ChatMessage(content="Can you tell me a joke?", role="user"),
]

user_message = "What are the first programs Paul Graham tried writing?"

response = chat_engine.chat(message=user_message, chat_history=chat_history)
logan-markewich commented 5 months ago

This is coming from the LLM you are using. Its tokenizer's chat template does not support system messages, but this chat engine uses a system message.
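For example, you can check this directly against the tokenizer. This is only a rough sketch; the model name below is a placeholder for whichever model you actually loaded into HuggingFaceLLM:

# Sketch: check whether the model's chat template accepts a system role.
# "mistralai/Mistral-7B-Instruct-v0.1" is only an example model name.
from jinja2.exceptions import TemplateError
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
try:
    tokenizer.apply_chat_template(
        [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello"},
        ],
        tokenize=False,
    )
    print("Chat template accepts a system message.")
except TemplateError as e:
    print("Chat template rejects system messages:", e)

If this raises the same "Conversation roles must alternate" error, the problem is the template, not the chat engine.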

I would provide a messages_to_prompt callable to the LLM constructor to handle system messages and format the prompt properly. The exact format largely depends on your LLM.
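As a rough sketch, it could look something like the following. This assumes a Mistral/Llama-2 style [INST] prompt format, which may not match your model, and the model name is again just an example:

from llama_index.core import Settings
from llama_index.core.llms import MessageRole
from llama_index.llms.huggingface import HuggingFaceLLM

def messages_to_prompt(messages):
    # Fold any system message into the next user turn so the prompt only
    # ever contains alternating user/assistant content.
    prompt = ""
    pending_system = ""
    for message in messages:
        content = message.content or ""
        if message.role == MessageRole.SYSTEM:
            pending_system += content + "\n"
        elif message.role == MessageRole.USER:
            prompt += f"<s>[INST] {pending_system}{content} [/INST]"
            pending_system = ""
        else:  # assistant
            prompt += f" {content}</s>"
    return prompt

llm = HuggingFaceLLM(
    model_name="mistralai/Mistral-7B-Instruct-v0.1",      # example model name
    tokenizer_name="mistralai/Mistral-7B-Instruct-v0.1",  # example model name
    messages_to_prompt=messages_to_prompt,
)

Settings.llm = llm  # so index.as_chat_engine(...) picks it up

With the LLM set on Settings (or passed to the chat engine), the system prompt should go through your custom callable instead of the tokenizer's chat template, which avoids the TemplateError.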