langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

`WeaviateHybridSearchRetriever` isn't working with Weaviate client v4 #21147

Closed by elieobeid7 2 months ago

elieobeid7 commented 2 months ago

Example Code

import weaviate

from langchain_community.retrievers import (
    WeaviateHybridSearchRetriever,
)
from langchain_core.documents import Document

from config import OPENAI_API_KEY, WEAVIATE_HOST, WEAVIATE_PORT

headers = {
    "X-Openai-Api-Key": OPENAI_API_KEY,
}

client = weaviate.connect_to_local(headers=headers)

retriever = WeaviateHybridSearchRetriever(
    client=client,
    index_name="LangChain",
    text_key="text",
    attributes=[],
    create_schema_if_missing=True,
)
docs = [
    Document(
        metadata={
            "title": "Embracing The Future: AI Unveiled",
            "author": "Dr. Rebecca Simmons",
        },
        page_content="A comprehensive analysis of the evolution of artificial intelligence, from its inception to its future prospects. Dr. Simmons covers ethical considerations, potentials, and threats posed by AI.",
    )
]

retriever.add_documents(docs)

answer = retriever.invoke("the ethical implications of AI")
print(answer)

Error Message and Stack Trace (if applicable)

Traceback (most recent call last):
  File "main.py", line 17, in <module>
    retriever = WeaviateHybridSearchRetriever(
  File "venv\lib\site-packages\langchain_core\load\serializable.py", line 120, in __init__
    super().__init__(**kwargs)
  File "venv\lib\site-packages\pydantic\v1\main.py", line 341, in __init__
    raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for WeaviateHybridSearchRetriever
__root__
  client should be an instance of weaviate.Client, got <class 'weaviate.client.WeaviateClient'> (type=value_error)
sys:1: ResourceWarning: unclosed <socket.socket fd=880, family=AddressFamily.AF_INET6, type=SocketKind.SOCK_STREAM, proto=0, laddr=('::1', 64509, 0, 0), raddr=('::1', 8080, 0, 0)>

Description

I'm on Windows 11 and trying to use WeaviateHybridSearchRetriever with Weaviate client v4, since v3 is deprecated. Constructing the retriever fails with the validation error shown above.
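The validation error names two classes: the retriever expects `weaviate.Client`, while `weaviate.connect_to_local()` returned a `weaviate.client.WeaviateClient`. As a rough sketch, a script could check which kind of client it holds before building the retriever. The helper name `weaviate_client_generation` is hypothetical and not part of any library; it only inspects the class name taken from the error message.

```python
def weaviate_client_generation(client: object) -> str:
    """Guess the client generation from its class name.

    Assumption: the class names are the ones printed in the traceback,
    "Client" for the v3-style client and "WeaviateClient" for v4.
    """
    name = type(client).__name__
    if name == "WeaviateClient":
        return "v4"
    if name == "Client":
        return "v3"
    return "unknown"
```

With a check like this, a script can fail early with a clearer message, or switch to the langchain-weaviate package, when it detects a v4 client.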

System Info

aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.6.0
anyio==4.3.0
async-timeout==4.0.3
attrs==23.2.0
Authlib==1.3.0
certifi==2024.2.2
cffi==1.16.0
charset-normalizer==3.3.2
cryptography==42.0.5
dataclasses-json==0.6.5
exceptiongroup==1.2.1
frozenlist==1.4.1
greenlet==3.0.3
grpcio==1.63.0
grpcio-health-checking==1.63.0
grpcio-tools==1.63.0
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
idna==3.7
jsonpatch==1.33
jsonpointer==2.4
langchain==0.1.17
langchain-community==0.0.36
langchain-core==0.1.48
langchain-text-splitters==0.0.1
langsmith==0.1.52
marshmallow==3.21.1
multidict==6.0.5
mypy-extensions==1.0.0
numpy==1.26.4
orjson==3.10.2
packaging==23.2
protobuf==5.26.1
pycparser==2.22
pydantic==2.7.1
pydantic_core==2.18.2
PyYAML==6.0.1
requests==2.31.0
sniffio==1.3.1
SQLAlchemy==2.0.29
tenacity==8.2.3
typing-inspect==0.9.0
typing_extensions==4.11.0
urllib3==2.2.1
validators==0.28.1
weaviate-client==4.5.7
yarl==1.9.4

Sachin-Bhat commented 2 months ago

Hey @elieobeid7, maybe you should try the Python package langchain-weaviate. It may help solve the issue.

elieobeid7 commented 2 months ago

@Sachin-Bhat I just inspected the source code of that package: https://github.com/langchain-ai/langchain-weaviate/tree/main/libs/weaviate

So I can't use it. I'd rather stick with the Weaviate v3 client and follow the official docs than spend time trying to understand how that package works. In any case, it doesn't even have hybrid search, as I previously said.

StreetLamb commented 2 months ago

Hi @elieobeid7, I took a look at the source code and there's a reference to hybrid search here. The LangChain docs also state that similarity_search uses Weaviate hybrid search, as can be seen here. Hope this helps.

hsm207 commented 2 months ago

Hi @elieobeid7,

I'm the maintainer of the langchain-weaviate integration and can confirm what @StreetLamb said.

Hybrid search is supported in v4, just not through the WeaviateHybridSearchRetriever class.

It has been consolidated into the similarity_search function. By default, it does 50:50 bm25 and vector search. Users can pass the arg alpha to it such that 0 means pure BM25 search, and 1 means pure vector search.
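A minimal sketch of the `alpha` usage described above. It assumes `store` is a vector store from the langchain-weaviate package (for example a `WeaviateVectorStore` built on a v4 client) whose `similarity_search` forwards extra keyword arguments to the client's `hybrid()` call. The helper names `hybrid_search` and `compare_modes` are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, List


def hybrid_search(store: Any, query: str, alpha: float = 0.5, k: int = 4) -> List[Any]:
    """Run a hybrid query through store.similarity_search.

    alpha=0 means pure BM25 search, alpha=1 means pure vector search,
    and 0.5 weights the two equally (as described in this thread).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    # Assumption: extra kwargs such as alpha are passed on to hybrid().
    return store.similarity_search(query, k=k, alpha=alpha)


def compare_modes(store: Any, query: str, k: int = 4) -> Dict[str, List[Any]]:
    """Run the same query at the two extremes and at the midpoint."""
    return {
        "bm25_only": hybrid_search(store, query, alpha=0.0, k=k),
        "balanced": hybrid_search(store, query, alpha=0.5, k=k),
        "vector_only": hybrid_search(store, query, alpha=1.0, k=k),
    }
```

Calling `compare_modes(store, "the ethical implications of AI")` on a populated store is one way to see how the ranking shifts between keyword and vector matching.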

nick-youngblut commented 1 week ago

> It has been consolidated into the similarity_search function. By default, it does 50:50 bm25 and vector search. Users can pass the arg alpha to it such that 0 means pure BM25 search, and 1 means pure vector search.

@hsm207

It would be helpful if that were made clear in the Weaviate Hybrid Search docs.

Also, I don't see "By default, it does 50:50 bm25 and vector search" in the similarity_search function docs:

        """Return docs most similar to query.

        Args:
            query: Text to look up documents similar to.
            k: Number of Documents to return. Defaults to 4.
            **kwargs: Additional keyword arguments will be passed to the `hybrid()`
                function of the weaviate client.

        Returns:
            List of Documents most similar to the query.