run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: The ChatMemoryBuffer.to_string() and ChatMemoryBuffer.to_dict() methods are broken. #11537

Closed gich2009 closed 7 months ago

gich2009 commented 7 months ago

Bug Description

The ChatMemoryBuffer.to_string() and ChatMemoryBuffer.to_dict() methods are broken.

I suspect this is because of the tokenizer_fn, which is a lambda and therefore not JSON serializable. That used to be tolerable because you could work around it by pickling the ChatMemoryBuffer object with the pickle library, but that is no longer possible.

Version

v0.10.14

Steps to Reproduce

Convert any ChatMemoryBuffer object to a dict or a string using the methods llama_index exposes: ChatMemoryBuffer.to_string() or ChatMemoryBuffer.to_dict().
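
A minimal repro sketch, assuming the v0.10.x import path (from_defaults is the documented factory constructor):

from llama_index.core.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults(token_limit=1500)
memory.to_string()  # raises TypeError: Object of type function is not JSON serializable
memory.to_dict()    # fails the same way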

Relevant Logs/Tracebacks

lib/bots/staticContextCompareChatbot.py:123: in _static_comparison_chatbot
    string_memory = memory.to_string()
venv/lib/python3.10/site-packages/llama_index/core/memory/chat_memory_buffer.py:76: in to_string
    return self.json()
venv/lib/python3.10/site-packages/llama_index/core/schema.py:57: in json
    return self.to_json(**kwargs)
venv/lib/python3.10/site-packages/llama_index/core/schema.py:102: in to_json
    return json.dumps(data)
/usr/lib/python3.10/json/__init__.py:231: in dumps
    return _default_encoder.encode(obj)
/usr/lib/python3.10/json/encoder.py:199: in encode
    chunks = self.iterencode(o, _one_shot=True)
/usr/lib/python3.10/json/encoder.py:257: in iterencode
    return _iterencode(o, 0)

self = <json.encoder.JSONEncoder object at 0x7f0eef332230>, o = <function BaseBot.<lambda> at 0x7f0ebed79480>

    def default(self, o):
        """Implement this method in a subclass such that it returns
        a serializable object for ``o``, or calls the base implementation
        (to raise a ``TypeError``).

        For example, to support arbitrary iterators, you could
        implement default like this::

            def default(self, o):
                try:
                    iterable = iter(o)
                except TypeError:
                    pass
                else:
                    return list(iterable)
                # Let the base class default method raise the TypeError
                return JSONEncoder.default(self, o)

        """
>       raise TypeError(f'Object of type {o.__class__.__name__} '
                        f'is not JSON serializable')
E       TypeError: Object of type function is not JSON serializable

/usr/lib/python3.10/json/encoder.py:179: TypeError
dosubot[bot] commented 7 months ago

Based on the error message you provided, it seems that the issue is indeed related to the tokenizer_fn lambda function not being JSON serializable. This is because JSON does not support function serialization directly.
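
To see the constraint in isolation, here is a stdlib-only illustration (independent of llama_index):

import json

# JSON has no encoding for function objects; this reproduces the
# TypeError from the traceback above with nothing but the stdlib.
try:
    json.dumps(lambda text: text.split())
except TypeError as err:
    print(err)  # Object of type function is not JSON serializable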

To resolve this issue, you can modify the to_dict and to_string methods in the ChatMemoryBuffer class to exclude the tokenizer_fn from the serialization process. Here's how you can do it:

# Methods on ChatMemoryBuffer; requires `import json` and `from typing import Any`

def to_dict(self, **kwargs: Any) -> dict:
    """Convert memory to dict, excluding non-serializable fields."""
    return super().dict(exclude={"tokenizer_fn"}, **kwargs)

def to_string(self) -> str:
    """Convert memory to string, excluding non-serializable fields."""
    return json.dumps(self.to_dict())

In to_dict, super().dict(exclude={'tokenizer_fn'}, **kwargs) generates a dictionary representation of the instance while explicitly omitting the tokenizer_fn field, so the resulting dictionary never contains the non-serializable lambda.

In to_string, json.dumps(self.to_dict()) first converts the instance to a dictionary (excluding tokenizer_fn, as above) and then serializes that dictionary to a JSON string.
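
For context, a generic illustration of pydantic's exclude mechanism (pydantic v1 style, matching the super().dict() call above; the model and field names here are hypothetical):

from typing import Callable

from pydantic import BaseModel

class MemoryExample(BaseModel):
    token_limit: int = 1500
    tokenizer_fn: Callable = str.split  # a callable field, like the lambda in the report

print(MemoryExample().dict(exclude={"tokenizer_fn"}))
# {'token_limit': 1500} -- the callable never reaches the JSON layer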

These modifications should resolve the issue you're facing. Please try this solution and let me know if you encounter any further issues.

gich2009 commented 7 months ago

@dosu-bot I have made the change and I am still getting this serialization error:

    string_memory = memory.to_string()
  File "/home/gich2009/Work/BAYESNET/venv/lib/python3.10/site-packages/llama_index/core/memory/chat_memory_buffer.py", line 84, in to_string
    return json.dumps(self.to_dict())
  File "/usr/lib/python3.10/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.10/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.10/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type function is not JSON serializable

dosubot[bot] commented 7 months ago

Based on the error message, there is still a non-serializable object within your ChatMemoryBuffer instance: a function object is reaching the JSON encoder, which suggests some callable attribute is not being excluded from the serialization process.

Even though you've excluded the tokenizer_fn from the serialization process, there might be other function or method attributes in the ChatMemoryBuffer or in one of its attributes' classes.

One potential source of this issue could be the chat_store attribute. If the chat_store is an instance of a class that has function or method attributes, those would also need to be excluded from the serialization process.

To make ChatMemoryBuffer fully JSON serializable, every attribute (and every attribute of those attributes) must either be inherently serializable or provide its own serialization hook.

Here's a simplified approach to ensure BaseChatStore instances are serializable:

from typing import Any

class BaseChatStore:
    # Your existing implementation

    def to_dict(self) -> dict:
        """Serialization hook; each subclass implements its own logic."""
        raise NotImplementedError

class SimpleChatStore(BaseChatStore):
    # Your existing implementation

    def to_dict(self) -> dict:
        """Serialize only JSON-compatible state."""
        return {
            # Example serialization logic
            "messages": self.messages,
        }

class ChatMemoryBuffer(BaseMemory):
    # Your existing implementation

    def to_dict(self, **kwargs: Any) -> dict:
        """Convert memory to dict."""
        data = super().dict(**kwargs)
        # Replace the raw chat_store object with its serializable form
        data["chat_store"] = self.chat_store.to_dict()
        return data

This approach should help resolve the serialization error by ensuring that the entire state of ChatMemoryBuffer can be converted to a JSON-compatible format.

If you're still encountering issues after trying this, could you please provide more information about the ChatMemoryBuffer instance that you're trying to serialize? Specifically, it would be helpful to know the types and values of all its attributes, as well as the types and values of the attributes of any objects that those attributes might contain. This will help identify any other potential sources of non-serializable objects.

gich2009 commented 7 months ago

I've tried pretty much everything I can think of: passing in a chat_history instead, or manipulating the chat_store. Most approaches lead either to the lambda error or, when I use a chat_messages approach, to this error:

The chat messages are [{'role': <MessageRole.SYSTEM: 'system'>, 'content': '\n - You are a helpful AI assistant.\n - Please provide the user with any assistance they may require.\n '}]

Traceback (most recent call last):
  File "/home/gich2009/Work/BAYESNET/lib_tests/bots/manual_noneContextChatbot.py", line 56, in <module>
    test_unit()
  File "/home/gich2009/Work/BAYESNET/lib_tests/bots/manual_noneContextChatbot.py", line 52, in test_unit
    qualitative_none_context_chat_test(input=input)
  File "/home/gich2009/Work/BAYESNET/lib_tests/bots/../../lib/tests/manual_chatbot_tests.py", line 281, in qualitative_none_context_chat_test
  File "/home/gich2009/Work/BAYESNET/lib_tests/bots/../../lib/bots/noneContextChatbot.py", line 260, in chat
    response = self._static_context_chatbot(input_text=prompt)
  File "/home/gich2009/Work/BAYESNET/lib_tests/bots/../../lib/bots/noneContextChatbot.py", line 231, in _static_context_chatbot
    response: AgentChatResponse = chat_engine.chat(message=input_text)
  File "/home/gich2009/Work/BAYESNET/venv/lib/python3.10/site-packages/llama_index/core/callbacks/utils.py", line 41, in wrapper
    return func(self, *args, **kwargs)
  File "/home/gich2009/Work/BAYESNET/venv/lib/python3.10/site-packages/llama_index/core/chat_engine/simple.py", line 92, in chat
    all_messages = self._prefix_messages + self._memory.get(
  File "/home/gich2009/Work/BAYESNET/venv/lib/python3.10/site-packages/llama_index/core/memory/chat_memory_buffer.py", line 113, in get
    self._token_count_for_message_count(message_count) + initial_token_count
  File "/home/gich2009/Work/BAYESNET/venv/lib/python3.10/site-packages/llama_index/core/memory/chat_memory_buffer.py", line 155, in _token_count_for_message_count
    msg_str = " ".join(str(m.content) for m in chat_history[-message_count:])
  File "/home/gich2009/Work/BAYESNET/venv/lib/python3.10/site-packages/llama_index/core/memory/chat_memory_buffer.py", line 155, in <genexpr>
    msg_str = " ".join(str(m.content) for m in chat_history[-message_count:])
AttributeError: 'dict' object has no attribute 'content'
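
The AttributeError here indicates the chat history entries were plain dicts rather than ChatMessage objects: ChatMemoryBuffer iterates message.content, so the history has to hold ChatMessage instances. A minimal sketch of the expected shape (v0.10.x import paths assumed):

from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.memory import ChatMemoryBuffer

history = [
    # ChatMessage objects, not {'role': ..., 'content': ...} dicts
    ChatMessage(role=MessageRole.SYSTEM, content="You are a helpful AI assistant."),
]
memory = ChatMemoryBuffer.from_defaults(chat_history=history)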

gich2009 commented 7 months ago

Just found out that the chat_store is pickleable, which tentatively solves my problem:

import pickle

from llama_index.core.llms import ChatMessage
from llama_index.core.memory.types import DEFAULT_CHAT_STORE_KEY  # assumed import path
from llama_index.core.storage.chat_store import SimpleChatStore

user_message = ChatMessage(role="user", content="hello")
chat_store = SimpleChatStore(store={DEFAULT_CHAT_STORE_KEY: [user_message]})
pickled_chat_store = pickle.dumps(chat_store)
unpickled_chat_store = pickle.loads(pickled_chat_store)

print(unpickled_chat_store)

gich2009 commented 7 months ago

Hi @logan-markewich. After thoroughly investigating this, I have realised that the issue was actually on my side. I was constructing a system_message with a lambda and initializing it in my pydantic class as a Field(default=) parameter instead of a Field(default_factory=) parameter. Apologies for the alarm bells; the lambda error was originating from my code.
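
For illustration, a minimal sketch of that misconfiguration (pydantic v1 style; BaseBot matches the class name in the traceback above, but the helper name and message content are hypothetical):

from pydantic import BaseModel, Field

from llama_index.core.llms import ChatMessage

def make_system_message() -> ChatMessage:
    return ChatMessage(role="system", content="You are a helpful AI assistant.")

class BaseBot(BaseModel):
    # Broken: default= stores the lambda itself as the field value, which
    # later fails json serialization with
    # "Object of type function is not JSON serializable":
    #   system_message: ChatMessage = Field(default=lambda: make_system_message())

    # Fixed: default_factory= is called at instantiation time,
    # so the field holds a ChatMessage, not a function:
    system_message: ChatMessage = Field(default_factory=make_system_message)

print(BaseBot().system_message)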

Still, it may be worth checking that the .to_string() and .to_dict() methods work as expected. I will do this and comment here.

logan-markewich commented 7 months ago

@gich2009 ah good to know! I was just about to debug this haha. If to_dict() and to_string() aren't working, do let me know

gich2009 commented 7 months ago

I just confirmed that they are working fine. If I observe a problem, I will reopen this.