langchain-ai / chat-langchain

https://chat.langchain.com
MIT License

MongoDB connected: Type is not JSON serializable: ObjectId #294

Open adrianruchti opened 3 months ago

adrianruchti commented 3 months ago

Hello everyone. First, thank you for the great app; I have already learnt a lot about LCEL from it. I am trying to use MongoDB instead of Weaviate as the vector store. The ingest part works fine, but retrieval (chain.py) fails with this error:

"data": self._serializer.dumps(data).decode("utf-8"),
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/Library/Caches/pypoetry/virtualenvs/chat-langchain-LJAK__Vy-py3.11/lib/python3.11/site-packages/langserve/serialization.py", line 171, in dumps
    return orjson.dumps(obj, default=default)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Type is not JSON serializable: ObjectId

A fix that worked was to modify langserve/serialization.py like this:

from bson import ObjectId  # ships with pymongo
from langchain_core.documents import Document


def default_serializer(obj):
    # ObjectId has no JSON representation, so fall back to its hex string.
    if isinstance(obj, ObjectId):
        return str(obj)
    elif isinstance(obj, Document):
        # Simplistic conversion of a Document to a serializable structure;
        # adjust it to the actual fields your Document objects carry.
        return {
            # Assumes `metadata` is serializable as-is; any ObjectId nested
            # inside it is passed back through this function by the serializer.
            "metadata": obj.metadata,
            "page_content": obj.page_content,  # Assumes `page_content` is a string
            # Add other fields as necessary
        }
    raise TypeError("Type is not JSON serializable: " + type(obj).__name__)

Since modifying the library like this is not ideal, I was wondering whether there is a way to serialise the ObjectId in the chain code itself.

I am using this in the chain.py:

def get_retriever() -> BaseRetriever:
    MONGO_URI = os.environ["MONGO_URI"]
    DB_NAME = "langchain_chatbot"
    COLLECTION_NAME = "data"
    ATLAS_VECTOR_SEARCH_INDEX_NAME = "vector_index"
    client = MongoClient(MONGO_URI)
    db = client[DB_NAME]
    collection = client[DB_NAME][COLLECTION_NAME]
    mongo_client = MongoDBAtlasVectorSearch.from_connection_string(
        connection_string=MONGO_URI,
        namespace=DB_NAME + "." + COLLECTION_NAME,
        embedding=get_embeddings_model(),
        index_name=ATLAS_VECTOR_SEARCH_INDEX_NAME,
        text_key="text",
    )

    return mongo_client.as_retriever(search_type="similarity", search_kwargs={"k": 5})
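
One possible way to keep the conversion in chain.py rather than in langserve is to clean each retrieved document's metadata before it reaches the serializer. The sketch below is only an illustration under assumptions: it assumes the ObjectId arrives inside the document metadata (for example under `_id`), the helper name `sanitize_metadata` is hypothetical, and `FakeObjectId` is a stand-in for `bson.ObjectId` so the snippet runs without pymongo installed.

```python
import json
from typing import Any, Tuple


def sanitize_metadata(value: Any, id_types: Tuple[type, ...]) -> Any:
    """Recursively replace instances of `id_types` with their string form.

    Hypothetical helper: in chain.py, `id_types` would be `(ObjectId,)`.
    """
    if isinstance(value, id_types):
        return str(value)
    if isinstance(value, dict):
        return {key: sanitize_metadata(item, id_types) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        # Tuples become lists, which is what JSON would produce anyway.
        return [sanitize_metadata(item, id_types) for item in value]
    return value


class FakeObjectId:
    """Stand-in for bson.ObjectId (assumption: str() yields the hex id)."""

    def __init__(self, hex_id: str) -> None:
        self._hex_id = hex_id

    def __str__(self) -> str:
        return self._hex_id


# Metadata shaped like what a MongoDB-backed retriever might return (assumed).
metadata = {
    "_id": FakeObjectId("65f0c0ffee0123456789abcd"),
    "source": "docs",
    "refs": [FakeObjectId("65f0c0ffee0123456789abce")],
}

clean = sanitize_metadata(metadata, (FakeObjectId,))
print(json.dumps(clean))  # serializes without a TypeError
```

In chain.py this could be applied to the retriever output, for instance by piping the retriever into a `RunnableLambda` that sets `doc.metadata = sanitize_metadata(doc.metadata, (ObjectId,))` for every returned document. Whether that covers every place an ObjectId can appear depends on the vector store version, so treat it as a starting point.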