langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Cannot filter with metadata with azure_cosmos_db_no_sql #23089

Open Hela-Masri-shift opened 1 week ago

Hela-Masri-shift commented 1 week ago

### Example Code

I have inserted documents into a Cosmos DB with the NoSQL API, and the insertion works well. The documents contain metadata (one of the fields is claim_id). I want to run a search on a subset of documents by filtering on claim_id. Here is the code, but it doesn't seem to work: it always returns results without taking the filter into account, and k also keeps its default value of 4.

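```python
from langchain.chains import RetrievalQA

# The opening line of this call was cut off in the post; `vector_store` is an
# assumed name for the Cosmos DB NoSQL vector store instance.
retriever = vector_store.as_retriever(
    search_type='similarity',
    search_kwargs={
        'k': 3,
        'filter': {"claim_id": 1}
    }
)

qa_stuff = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    verbose=True,
    return_source_documents=True,
)
query = "what is prompt engineering?"

response = qa_stuff.invoke(query)
print(response)
```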

### Error Message and Stack Trace (if applicable)

No error, but unexpected behavior.

### Description

I want to query only the documents whose metadata has claim_id=1.
The returned results show that the filter does not work; it seems to be ignored.
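
One way to confirm that the filter is being ignored is to inspect the metadata of the returned source documents (with `return_source_documents=True`, `RetrievalQA` puts them under the `source_documents` key of the response). The snippet below is only a sketch: `Doc` and `docs_violating_filter` are hypothetical stand-ins written for illustration, and the sample list takes the place of `response["source_documents"]`.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document; only .metadata is used."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def docs_violating_filter(docs, key, expected):
    """Return the documents whose metadata[key] differs from `expected`."""
    return [d for d in docs if d.metadata.get(key) != expected]


# Sample data standing in for response["source_documents"].
source_documents = [
    Doc("about prompts", {"claim_id": 1}),
    Doc("unrelated claim", {"claim_id": 2}),
]

leaked = docs_violating_filter(source_documents, "claim_id", 1)
for doc in leaked:
    # Any document printed here was returned despite the filter.
    print(doc.metadata)
```

If `leaked` is non-empty, the retriever returned documents outside the requested claim_id.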

### System Info

ai21==2.6.0
ai21-tokenizer==0.10.0
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.4.0
asttokens==2.4.1
attrs==23.2.0
azure-core==1.30.2
azure-cosmos==4.7.0
certifi==2024.6.2
charset-normalizer==3.3.2
colorama==0.4.6
comm==0.2.2
dataclasses-json==0.6.7
debugpy==1.8.1
decorator==5.1.1
distro==1.9.0
executing==2.0.1
filelock==3.15.1
frozenlist==1.4.1
fsspec==2024.6.0
greenlet==3.0.3
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
huggingface-hub==0.23.4
idna==3.7
ipykernel==6.29.4
ipython==8.25.0
jedi==0.19.1
jsonpatch==1.33
jsonpointer==3.0.0
jupyter_client==8.6.2
jupyter_core==5.7.2
langchain==0.2.5
langchain-community==0.2.5
langchain-core==0.2.7
langchain-openai==0.1.8
langchain-text-splitters==0.2.1
langsmith==0.1.77
marshmallow==3.21.3
matplotlib-inline==0.1.7
multidict==6.0.5
mypy-extensions==1.0.0
nest-asyncio==1.6.0
numpy==1.26.4
openai==1.34.0
orjson==3.10.5
packaging==24.1
parso==0.8.4
platformdirs==4.2.2
prompt_toolkit==3.0.47
psutil==5.9.8
pure-eval==0.2.2
pydantic==2.7.4
pydantic_core==2.18.4
Pygments==2.18.0
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pywin32==306
PyYAML==6.0.1
pyzmq==26.0.3
regex==2024.5.15
requests==2.32.3
sentencepiece==0.2.0
six==1.16.0
sniffio==1.3.1
SQLAlchemy==2.0.30
stack-data==0.6.3
tenacity==8.4.1
tiktoken==0.7.0
tokenizers==0.19.1
tornado==6.4.1
tqdm==4.66.4
traitlets==5.14.3
typing-inspect==0.9.0
typing_extensions==4.12.2
urllib3==2.2.2
wcwidth==0.2.13
yarl==1.9.4

JulienFdez commented 1 week ago

I had the same issue. Replacing 'filter' with 'filters' in search_kwargs fixed it for me. The documentation appears to be wrong on this point.
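
A possible explanation for why the wrong key produces no error: when a search method accepts `**kwargs`, an unrecognized keyword is swallowed silently instead of raising. The sketch below illustrates that mechanism with a hypothetical `ToyStore`; it is not the real Cosmos DB vector store, and its signature is an assumption made purely for illustration.

```python
class ToyStore:
    """Hypothetical in-memory store; NOT the real Cosmos DB vector store."""

    def __init__(self, docs):
        self.docs = docs  # list of metadata dicts

    def similarity_search(self, query, k=4, filters=None, **kwargs):
        # Unknown keyword arguments (such as a misspelled 'filter') land in
        # **kwargs and are never read, so no error is raised.
        hits = self.docs
        if filters:
            hits = [
                d for d in hits
                if all(d.get(name) == value for name, value in filters.items())
            ]
        return hits[:k]


store = ToyStore([{"claim_id": 1}, {"claim_id": 2}, {"claim_id": 1}])

# 'filter' is silently dropped, so every document comes back.
ignored = store.similarity_search("q", k=3, filter={"claim_id": 1})

# 'filters' matches the parameter name, so the restriction is applied.
applied = store.similarity_search("q", k=3, filters={"claim_id": 1})

print(len(ignored), len(applied))
```

In the original snippet, the corresponding change would be renaming the `'filter'` key inside `search_kwargs` to `'filters'`, as reported above.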