langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

AzureCosmosDBVectorSearch filter not working #23963

Open GuidoK1 opened 4 months ago

GuidoK1 commented 4 months ago


Example Code


# Filtering pipeline that works in pymongo; filters on a list of file_ids
query_embedding = self.embedding_client.embed_query(query)
pipeline = [
    {
        "$search": {
            "cosmosSearch": {
                "vector": query_embedding,
                "path": "vectorContent",
                "k": 5,
                # "efSearch": 40,  # optional, HNSW only
                "filter": {"fileId": {"$in": file_ids}},
            },
            "returnStoredSource": True,
        }
    },
    {
        "$project": {
            "similarityScore": {"$meta": "searchScore"},
            "document": "$$ROOT",
        }
    },
]
docs = self.mongo_collection.aggregate(pipeline)

Current implementation

def _get_pipeline_vector_ivf(
        self, embeddings: List[float], k: int = 4
    ) -> List[dict[str, Any]]:
        pipeline: List[dict[str, Any]] = [
            {
                "$search": {
                    "cosmosSearch": {
                        "vector": embeddings,
                        "path": self._embedding_key,
                        "k": k,
                    },
                    "returnStoredSource": True,
                }
            },
            {
                "$project": {
                    "similarityScore": {"$meta": "searchScore"},
                    "document": "$$ROOT",
                }
            },
        ]
        return pipeline

def _get_pipeline_vector_hnsw(
        self, embeddings: List[float], k: int = 4, ef_search: int = 40
    ) -> List[dict[str, Any]]:
        pipeline: List[dict[str, Any]] = [
            {
                "$search": {
                    "cosmosSearch": {
                        "vector": embeddings,
                        "path": self._embedding_key,
                        "k": k,
                        "efSearch": ef_search,
                    },
                }
            },
            {
                "$project": {
                    "similarityScore": {"$meta": "searchScore"},
                    "document": "$$ROOT",
                }
            },
        ]
        return pipeline

Error Message and Stack Trace (if applicable)

No response

Description

According to the LangChain documentation, filtering in Azure Cosmos DB Mongo vCore should be supported: https://python.langchain.com/v0.2/docs/integrations/vectorstores/azure_cosmos_db/

Filtering works when I run my MongoDB query directly with pymongo, as shown in the example. Through LangChain, however, the same filter is not applied. I tried the filter, pre_filter, search_kwargs and kwargs parameters, all to no avail.

docs = self.vectorstore.similarity_search(
    query,
    k=5,
    pre_filter={"fileId": {"$in": ["31c283c2-ac31-4260-a8d0-864f444c33ee"]}},
)

Looking at the source code, the cosmosSearch query dictionary built by the pipeline methods has no filter key, and neither kwargs nor search_kwargs is passed into them, which could be the reason. https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/vectorstores/azure_cosmos_db.py
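
To make the missing piece concrete, here is a minimal sketch of an IVF pipeline builder that does forward a filter. It is not the library's code or its eventual fix: build_ivf_pipeline, its embedding_key default and its pre_filter parameter are hypothetical names, and the "filter" placement simply mirrors the working pymongo pipeline above.

```python
from typing import Any, Dict, List, Optional


def build_ivf_pipeline(
    embeddings: List[float],
    k: int = 4,
    embedding_key: str = "vectorContent",  # assumed field name, taken from the pymongo example
    pre_filter: Optional[Dict[str, Any]] = None,  # hypothetical parameter
) -> List[Dict[str, Any]]:
    """Build a cosmosSearch aggregation pipeline, optionally with a filter."""
    cosmos_search: Dict[str, Any] = {
        "vector": embeddings,
        "path": embedding_key,
        "k": k,
    }
    # Only add the "filter" key when a non-empty filter was supplied,
    # so the unfiltered pipeline stays identical to the current one.
    if pre_filter:
        cosmos_search["filter"] = pre_filter
    return [
        {"$search": {"cosmosSearch": cosmos_search, "returnStoredSource": True}},
        {
            "$project": {
                "similarityScore": {"$meta": "searchScore"},
                "document": "$$ROOT",
            }
        },
    ]
```

Calling build_ivf_pipeline(query_embedding, k=5, pre_filter={"fileId": {"$in": file_ids}}) would then produce the same pipeline as the pymongo example.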

Any input on this issue?

System Info

System Information

OS: Windows
OS Version: 10.0.22631
Python Version: 3.11.4 (tags/v3.11.4:d2340ef, Jun 7 2023, 05:45:37) [MSC v.1934 64 bit (AMD64)]

Package Information

langchain_core: 0.2.10
langchain: 0.2.6
langchain_community: 0.2.6
langsmith: 0.1.82
langchain_openai: 0.1.13
langchain_text_splitters: 0.2.2

Packages not installed (Not Necessarily a Problem)

The following packages were not found:

langgraph
langserve

keenborder786 commented 4 months ago

You are right. I will create a fix.

jimcost commented 3 months ago

Any updates on the fix?

aglpy commented 1 month ago

How is this fix going?

leonardobaggio commented 6 days ago

I was able to filter using search_kwargs={'pre_filter': ''}:

users_authorized = [8646]

# filter only certain users and specific customers in a database
user_permissions = {"metadata.authorizedUsers": {"$in": users_authorized}}
customer_filter = {"metadata.customerId": 1}

combined_filter = {
    "$and": [
        user_permissions,
        customer_filter
    ]
}

retriever = vector_store.as_retriever(
    search_kwargs={'pre_filter': combined_filter}
)

There is in fact nothing in the docs describing how to use the filter; I had to debug to sort this out.
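
If the set of conditions varies per request, the $and combination above can be wrapped in a small helper. This is only a sketch: combine_filters is a hypothetical name, not part of LangChain or pymongo, and it assumes each condition is a plain MongoDB-style filter dictionary.

```python
from typing import Any, Dict, Optional


def combine_filters(*filters: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Combine MongoDB-style filter dicts with $and, skipping empty ones."""
    active = [f for f in filters if f]
    if not active:
        return None  # nothing to filter on
    if len(active) == 1:
        return active[0]  # a single condition needs no $and wrapper
    return {"$and": active}
```

With the variables above, combine_filters(user_permissions, customer_filter) yields the same dictionary as combined_filter.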

langchain==0.3.7
langchain-community==0.3.5
langchain-core==0.3.15
langchain-openai==0.2.6

yyueda commented 2 days ago

pymongo.errors.OperationFailure: $filter is not supported for vector search yet., full error: {'ok': 0.0, 'errmsg': '$filter is not supported for vector search yet.', 'code': 115, 'codeName': 'CommandNotSupported'}

I received this error. May I know how you solved it? Did you have to enable pre-filtering for Azure Cosmos DB?

From Azure's website (https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/vector-search): "You can now execute vector searches with any supported query filter such as $lt, $lte, $eq, $neq, $gte, $gt, $in, $nin, and $regex. Enable the "filtering vector search" feature in the "Preview Features" tab of your Azure Subscription."