Closed morrizon closed 1 month ago
Hey @morrizon, I'm here to help you with any bugs, questions, or contributions you have. Let's tackle this issue together!
The issue you're encountering with the Qdrant Vector Store not using advanced configuration fields when ingesting documents is due to the Qdrant.from_documents method not correctly passing these parameters to the QdrantClient. To fix this, you need to modify the build_vector_store method in the QdrantVectorStoreComponent class to ensure that all relevant parameters are passed.
Here's the corrected code:
class QdrantVectorStoreComponent(LCVectorStoreComponent):
    display_name = "Qdrant"
    description = "Qdrant Vector Store with search capabilities"
    documentation = "https://python.langchain.com/docs/modules/data_connection/vectorstores/integrations/qdrant"
    icon = "Qdrant"

    inputs = [
        StrInput(name="collection_name", display_name="Collection Name", required=True),
        StrInput(name="host", display_name="Host", value="localhost", advanced=True),
        IntInput(name="port", display_name="Port", value=6333, advanced=True),
        IntInput(name="grpc_port", display_name="gRPC Port", value=6334, advanced=True),
        SecretStrInput(name="api_key", display_name="API Key", advanced=True),
        StrInput(name="prefix", display_name="Prefix", advanced=True),
        IntInput(name="timeout", display_name="Timeout", advanced=True),
        StrInput(name="path", display_name="Path", advanced=True),
        StrInput(name="url", display_name="URL", advanced=True),
        DropdownInput(
            name="distance_func",
            display_name="Distance Function",
            options=["Cosine", "Euclidean", "Dot Product"],
            value="Cosine",
            advanced=True,
        ),
        StrInput(name="content_payload_key", display_name="Content Payload Key", value="page_content", advanced=True),
        StrInput(name="metadata_payload_key", display_name="Metadata Payload Key", value="metadata", advanced=True),
        MultilineInput(name="search_query", display_name="Search Query"),
        DataInput(
            name="ingest_data",
            display_name="Ingest Data",
            is_list=True,
        ),
        HandleInput(name="embedding", display_name="Embedding", input_types=["Embeddings"]),
        IntInput(
            name="number_of_results",
            display_name="Number of Results",
            info="Number of results to return.",
            value=4,
            advanced=True,
        ),
    ]

    @check_cached_vector_store
    def build_vector_store(self) -> Qdrant:
        qdrant_kwargs = {
            "collection_name": self.collection_name,
            "content_payload_key": self.content_payload_key,
            "metadata_payload_key": self.metadata_payload_key,
        }

        server_kwargs = {
            "host": self.host if self.host else None,
            "port": int(self.port),  # Ensure port is an integer
            "grpc_port": int(self.grpc_port),  # Ensure grpc_port is an integer
            "api_key": self.api_key,
            "prefix": self.prefix,
            "timeout": int(self.timeout) if self.timeout else None,  # Ensure timeout is an integer
            "path": self.path if self.path else None,
            "url": self.url if self.url else None,
        }

        server_kwargs = {k: v for k, v in server_kwargs.items() if v is not None}

        documents = []
        for _input in self.ingest_data or []:
            if isinstance(_input, Data):
                documents.append(_input.to_lc_document())
            else:
                documents.append(_input)

        if not isinstance(self.embedding, Embeddings):
            raise ValueError("Invalid embedding object")

        if documents:
            qdrant = Qdrant.from_documents(documents, embedding=self.embedding, **qdrant_kwargs, **server_kwargs)
        else:
            from qdrant_client import QdrantClient

            client = QdrantClient(**server_kwargs)
            qdrant = Qdrant(embeddings=self.embedding, client=client, **qdrant_kwargs)

        return qdrant

    def search_documents(self) -> List[Data]:
        vector_store = self.build_vector_store()

        if self.search_query and isinstance(self.search_query, str) and self.search_query.strip():
            docs = vector_store.similarity_search(
                query=self.search_query,
                k=self.number_of_results,
            )

            data = docs_to_data(docs)
            self.status = data
            return data
        else:
            return []
This modification ensures that the url, host, and api_key fields, among others, are correctly passed to the QdrantClient, allowing the advanced configuration fields to be used when ingesting documents [1][2].
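As a rough illustration of the filtering step in build_vector_store, the sketch below collects connection settings from plain values and drops the unset ones. The name build_server_kwargs is hypothetical and not part of Langflow. Unlike the code above, it also drops empty strings, on the assumption that an empty text field in the UI means "not set".

```python
from typing import Any, Optional


def build_server_kwargs(
    host: Optional[str] = None,
    port: Optional[int] = 6333,
    url: Optional[str] = None,
    api_key: Optional[str] = None,
    timeout: Optional[int] = None,
) -> dict[str, Any]:
    """Collect connection settings, skipping anything left unset.

    Hypothetical helper for illustration only; not part of Langflow.
    """
    raw = {
        "host": host,
        "port": int(port) if port else None,
        "url": url,
        "api_key": api_key,
        "timeout": int(timeout) if timeout else None,
    }
    # Drop None and empty strings so the client only sees explicit settings.
    return {k: v for k, v in raw.items() if v is not None and v != ""}


# Remote instance addressed by URL and API key (placeholder values).
remote = build_server_kwargs(url="https://qdrant.example.com", api_key="secret", port=None)
# Local instance addressed by host and port.
local = build_server_kwargs(host="localhost")

print(remote)
print(local)
```

One thing to watch, as far as I know: QdrantClient accepts only one of url, host, or path at a time, so when a URL is set it is probably safest to clear the default host value rather than send both.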
Bug Description
Qdrant Vector Store does not use the configuration fields when ingesting documents, so it only works in the default case (Qdrant service running on localhost, port 6333). Using a different host, or using url and API Key, does not change the behaviour.
Reproduction
The test was performed in Docker, and we tried 2 different Qdrant configurations:
In both cases the component gave error #99 [2].
After debugging, we saw that the error was triggered on line 92 of the Qdrant implementation: https://github.com/langflow-ai/langflow/blob/96ca71dab855639f82492c225f044d1a212bcdaa/src/backend/base/langflow/components/vectorstores/Qdrant.py#L92
The configuration fields are not part of the arguments passed on that line; they are only used when there are no documents [3].
To fix the issue we modified the previous line:
After the change it worked like a charm in both cases (using another host, or using url and API key).
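The behaviour described here can be sketched with stand-in functions. Nothing below is Langflow or LangChain code: from_documents_stub, build_store, and the forward_server_kwargs flag are hypothetical names, used only to show why the connection settings were lost on the ingestion path and what forwarding them changes.

```python
from typing import Any


def from_documents_stub(documents: list[str], **kwargs: Any) -> dict[str, Any]:
    """Stand-in for the ingestion path: records the settings it receives."""
    return {"documents": documents, "settings": kwargs}


def build_store(
    documents: list[str],
    qdrant_kwargs: dict[str, Any],
    server_kwargs: dict[str, Any],
    forward_server_kwargs: bool,
) -> dict[str, Any]:
    """Toy version of the two branches: ingesting documents vs. no documents."""
    if documents:
        # Reported behaviour: only the collection settings reach this call
        # unless the connection settings are forwarded as well.
        extra = server_kwargs if forward_server_kwargs else {}
        return from_documents_stub(documents, **qdrant_kwargs, **extra)
    # The no-documents branch always used the connection settings.
    return {"documents": [], "settings": {**qdrant_kwargs, **server_kwargs}}


qdrant_kwargs = {"collection_name": "demo"}
server_kwargs = {"url": "https://qdrant.example.com", "api_key": "secret"}

before = build_store(["doc"], qdrant_kwargs, server_kwargs, forward_server_kwargs=False)
after = build_store(["doc"], qdrant_kwargs, server_kwargs, forward_server_kwargs=True)

print("url" in before["settings"])  # False: ingestion falls back to defaults
print("url" in after["settings"])   # True: custom url and api_key are used
```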
[1] Simplified Docker Compose file. Note that we didn't add the env variables, volumes, or network. In our test we used Traefik as a load balancer with Let's Encrypt certificates.
[2] Qdrant error
[3] The fields are used in the variable server_kwargs, which is only used when there are no documents.
Expected behavior
Qdrant should also use the advanced fields when ingesting documents. A possible solution is described in the Reproduction section.
Who can help?
I'm mentioning @nicoloboschi because they were the last person to work specifically on this file. If I see no activity, I will look into making the fix myself in the future.
Operating System
Debian 12.5 (bookworm)
Langflow Version
1.0.17
Python Version
3.12
Screenshot
No response
Flow File
No response