Closed gich2009 closed 7 months ago
Based on the error message you provided, the issue is indeed related to the `tokenizer_fn` lambda not being JSON serializable: JSON does not support serializing functions directly.

To resolve this, you can modify the `to_dict` and `to_string` methods in the `ChatMemoryBuffer` class to exclude `tokenizer_fn` from the serialization process. Here's how you can do it:
```python
import json
from typing import Any

# Inside the ChatMemoryBuffer class:
def to_dict(self, **kwargs: Any) -> dict:
    """Convert memory to dict, excluding non-serializable fields."""
    return super().dict(exclude={"tokenizer_fn"}, **kwargs)

def to_string(self) -> str:
    """Convert memory to string, excluding non-serializable fields."""
    return json.dumps(self.to_dict())
```
In the `to_dict` method, `super().dict(exclude={'tokenizer_fn'}, **kwargs)` generates a dictionary representation of the instance while explicitly excluding the `tokenizer_fn` field, so the resulting dictionary never contains it and the serialization issue tied to that field is avoided.

In the `to_string` method, `json.dumps(self.to_dict())` converts the instance to a JSON string: it first builds the dictionary (with `tokenizer_fn` excluded, as above) and then serializes that dictionary to JSON.
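For a self-contained illustration of this exclusion pattern, here is a minimal stdlib-only sketch. `MemorySketch` is a hypothetical stand-in for a buffer with a callable field, not the real llama_index class:

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Callable

@dataclass
class MemorySketch:
    """Hypothetical stand-in for a memory buffer holding a callable."""
    token_limit: int = 3000
    # A tokenizer function: valid as an attribute, invalid in JSON.
    tokenizer_fn: Callable = field(default_factory=lambda: str.split)

    def to_dict(self) -> dict:
        # Drop any field whose value is callable before serializing.
        return {k: v for k, v in asdict(self).items() if not callable(v)}

    def to_string(self) -> str:
        return json.dumps(self.to_dict())

print(MemorySketch().to_string())  # → {"token_limit": 3000}
```

Without the `callable` filter, `json.dumps` would raise the same `TypeError: Object of type function is not JSON serializable` seen in this issue.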
These modifications should resolve the issue you're facing. Please try this solution and let me know if you encounter any further issues.
@dosubot I have made the change and I am still getting this serialization error:

```
    string_memory = memory.to_string()
  File "/home/gich2009/Work/BAYESNET/venv/lib/python3.10/site-packages/llama_index/core/memory/chat_memory_buffer.py", line 84, in to_string
    return json.dumps(self.to_dict())
  File "/usr/lib/python3.10/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.10/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.10/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type function is not JSON serializable
```
Based on the error message, it seems there is still a non-serializable object within your `ChatMemoryBuffer` instance. The traceback shows that a function object is not JSON serializable, which suggests a function or method attribute somewhere in your `ChatMemoryBuffer` is not being excluded from the serialization process.

Even though you've excluded `tokenizer_fn`, there might be other function or method attributes in the `ChatMemoryBuffer` or in one of its attributes' classes.

One potential source of this issue is the `chat_store` attribute. If `chat_store` is an instance of a class that has function or method attributes, those would also need to be excluded from the serialization process.
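To hunt down which attribute is the culprit, a small stdlib-only diagnostic can help. This is a sketch; `Store` is a hypothetical stand-in, not the real llama_index chat store:

```python
import json

def find_unserializable(obj, prefix="obj"):
    """Report attribute names whose values json.dumps rejects."""
    bad = []
    attrs = vars(obj) if hasattr(obj, "__dict__") else {}
    for name, value in attrs.items():
        try:
            json.dumps(value)
        except TypeError:
            bad.append(f"{prefix}.{name}")
    return bad

class Store:
    """Hypothetical object mixing plain data and a callable."""
    def __init__(self):
        self.messages = ["hello"]
        self.tokenizer_fn = lambda s: s.split()  # not JSON serializable

print(find_unserializable(Store()))  # → ['obj.tokenizer_fn']
```

Running this against each nested object narrows the error down to the exact field instead of guessing.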
To ensure that the `ChatMemoryBuffer` class serializes to JSON, every remaining field must be inherently serializable. Here's a simplified approach to make `BaseChatStore` instances serializable:
```python
class BaseChatStore:
    # Your existing implementation

    def to_dict(self) -> dict:
        # Implement serialization logic for the chat store
        pass


class SimpleChatStore(BaseChatStore):
    # Your existing implementation

    def to_dict(self) -> dict:
        # Implement serialization logic specific to SimpleChatStore
        return {
            # Example serialization logic
            "messages": self.messages,
        }


class ChatMemoryBuffer(BaseMemory):
    # Your existing implementation

    def to_dict(self, **kwargs: Any) -> dict:
        """Convert memory to dict."""
        data = super().dict(**kwargs)
        # Ensure chat_store is serialized properly
        data["chat_store"] = self.chat_store.to_dict()
        return data
```
This approach should resolve the serialization error by ensuring that the entire state of `ChatMemoryBuffer` can be converted to a JSON-compatible format.

If you're still encountering issues after trying this, could you please provide more information about the `ChatMemoryBuffer` instance you're trying to serialize? Specifically, the types and values of all its attributes, and of the attributes of any objects they contain, would help identify any other sources of non-serializable objects.
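As a self-contained illustration of the delegation pattern described above, here is a runnable sketch. Both classes are hypothetical stand-ins, not the real llama_index classes:

```python
import json

class SimpleStoreSketch:
    """Toy chat store whose state is plain, JSON-friendly data."""
    def __init__(self):
        self.store = {"chat_history": [{"role": "user", "content": "hello"}]}

    def to_dict(self) -> dict:
        return {"store": self.store}

class BufferSketch:
    """Toy memory buffer that delegates chat_store serialization."""
    def __init__(self):
        self.token_limit = 3000
        self.chat_store = SimpleStoreSketch()

    def to_dict(self) -> dict:
        return {
            "token_limit": self.token_limit,
            # Delegate, so the nested object controls its own JSON form.
            "chat_store": self.chat_store.to_dict(),
        }

print(json.dumps(BufferSketch().to_dict()))
```

The key point is that each object in the chain returns only plain data from `to_dict`, so the top-level `json.dumps` never encounters a function or custom object.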
I've tried pretty much everything I can think of, such as passing in a chat_history instead or manipulating the chat_store. Most approaches lead either to a lambda error or to this error when I use a chat_messages approach:
```
The chat messages are [{'role': <MessageRole.SYSTEM: 'system'>, 'content': '\n - You are a helpful AI assistant.\n - Please provide the user with any assistance they may require.\n '}]
Traceback (most recent call last):
  File "/home/gich2009/Work/BAYESNET/lib_tests/bots/manual_noneContextChatbot.py", line 56, in
```
Just found out that the chat_store is pickleable, so that tentatively solves my problem:

```python
user_message = ChatMessage(role="user", content="hello")
chat_store = SimpleChatStore(store={DEFAULT_CHAT_STORE_KEY: [user_message]})
pickled_chat_store = pickle.dumps(chat_store)
unpickled_chat_store = pickle.loads(pickled_chat_store)
print(unpickled_chat_store)
```
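For reference, the pickle round-trip works whenever the stored state is plain data, but a lambda in that state breaks `pickle` just as it breaks `json`, since lambdas are pickled by name lookup and `<lambda>` can never be resolved. A stdlib-only sketch (the dict shape here is illustrative, not the real chat store layout):

```python
import pickle

# Chat-store-like state as plain data: message dicts keyed by store key.
chat_store_state = {"chat_history": [{"role": "user", "content": "hello"}]}

restored = pickle.loads(pickle.dumps(chat_store_state))
print(restored)  # round-trips intact

# A lambda in that state would break pickling too:
try:
    pickle.dumps({"tokenizer_fn": lambda s: s.split()})
    print("pickled ok")
except Exception as exc:
    print("cannot pickle:", type(exc).__name__)
```

This is why pickling the whole `ChatMemoryBuffer` (with its `tokenizer_fn` lambda) fails, while pickling just the chat store succeeds.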
Hi @logan-markewich. After thoroughly investigating this, I have realised that the issue was actually on my side. I was constructing a system_message with a lambda and initializing it in my pydantic class as a `Field(default=...)` parameter instead of a `Field(default_factory=...)` parameter. Apologies for the alarm bells; the lambda error was originating from my code.
Still, it may be worth checking that the `.to_string()` and `.to_dict()` methods work as expected. I will do this and comment here.
@gich2009 ah good to know! I was just about to debug this haha. If to_dict() and to_string() aren't working, do let me know
I just confirmed that they are working fine. If I observe a problem, I will reopen this.
Bug Description
The ChatMemoryBuffer.to_string() and ChatMemoryBuffer.to_dict() methods are broken.
I suspect this is because of the `tokenizer_fn`, which is a lambda. Not being JSON serializable used to be tolerable, because you could work around it by pickling the `ChatMemoryBuffer` object with the `pickle` library, but that is not possible anymore.
Version
v0.10.14
Steps to Reproduce
Just try converting any `ChatMemoryBuffer` object to a dict or a string using the behaviour that llama_index exposes: `ChatMemoryBuffer.to_string()` or `ChatMemoryBuffer.to_dict()`.
Relevant Logs/Tracebacks