Closed: Q-Ot closed this 1 month ago
Hello @Q-Ot! I'm here to help with any bugs, questions, or contributions you have for the Dify repository. Let me know how I can assist you!
Milvus and Zilliz do not support full-text search. This is indicated by the comment in the search_by_full_text method:

    def search_by_full_text(self, query: str, **kwargs: Any) -> list[Document]:
        # milvus/zilliz doesn't support bm25 search
        return []

Additionally, Milvus does not support full-text searching in versions below 2.3.x:

    def search_by_full_text(self):
        # milvus does not support full text searching yet in < 2.3.x
        hits_by_full_text = self.vector.search_by_full_text(query=get_example_text())
        assert len(hits_by_full_text) == 0

Weaviate does support full-text search using BM25F, as shown in the search_by_full_text method:

    def search_by_full_text(self, query: str, **kwargs: Any) -> list[Document]:
        """Return docs using BM25F.
        Args:
            query: Text to look up documents similar to.
            k: Number of Documents to return. Defaults to 4.
        Returns:
            List of Documents most similar to the query.
        """
        collection_name = self._collection_name
        content: dict[str, Any] = {"concepts": [query]}
        properties = self._attributes
        properties.append(Field.TEXT_KEY.value)
        if kwargs.get("search_distance"):
            content["certainty"] = kwargs.get("search_distance")
        query_obj = self._client.query.get(collection_name, properties)
        if kwargs.get("where_filter"):
            query_obj = query_obj.with_where(kwargs.get("where_filter"))
        if kwargs.get("additional"):
            query_obj = query_obj.with_additional(kwargs.get("additional"))
        properties = ['text']
        result = query_obj.with_bm25(query=query, properties=properties).with_limit(kwargs.get('top_k', 2)).do()
        if "errors" in result:
            raise ValueError(f"Error during query: {result['errors']}")
        docs = []
        for res in result["data"]["Get"][collection_name]:
            text = res.pop(Field.TEXT_KEY.value)
            docs.append(Document(page_content=text, metadata=res))
        return docs

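For readers unfamiliar with what a BM25-style keyword search computes, here is a minimal sketch of plain Okapi BM25 in pure Python. It is not Weaviate's BM25F implementation (which also weights fields) and not Dify code; the `tokenize`, `bm25_scores`, and `top_k` helpers, the tokenizer, and the `k1`/`b` defaults are assumptions chosen for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each document against the query with plain Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = set(tokenize(query))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer documents need more matches.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def top_k(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return up to k documents with a non-zero score, best first."""
    ranked = sorted(zip(bm25_scores(query, docs), docs), key=lambda p: p[0], reverse=True)
    return [doc for score, doc in ranked[:k] if score > 0]
```

With a ranking function like this, a query copied verbatim from a stored document should normally score that document above zero, so a consistently empty result suggests looking at the query path or the indexed property rather than at the ranking itself.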
Qdrant supports full-text search, implemented in Dify using the search_by_full_text method. This method utilizes the Qdrant client's filtering capabilities to match text within the page_content field of documents:

    def search_by_full_text(self, query: str, **kwargs: Any) -> list[Document]:
        """Return docs most similar by bm25.
        Returns:
            List of documents most similar to the query text and distance for each.
        """
        from qdrant_client.http import models
        scroll_filter = models.Filter(
            must=[
                models.FieldCondition(
                    key="group_id",
                    match=models.MatchValue(value=self._group_id),
                ),
                models.FieldCondition(
                    key="page_content",
                    match=models.MatchText(text=query),
                )
            ]
        )
        response = self._client.scroll(
            collection_name=self._collection_name,
            scroll_filter=scroll_filter,
            limit=kwargs.get('top_k', 2),
            with_payload=True,
            with_vectors=True
        )
        results = response[0]
        documents = []
        for result in results:
            if result:
                documents.append(self._document_from_scored_point(
                    result, Field.CONTENT_KEY.value, Field.METADATA_KEY.value
                ))
        return documents

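Note that the Qdrant method above filters rather than ranks: scroll returns points that satisfy both conditions, up to the limit, without a relevance score. The following is a rough pure-Python sketch of that behavior. It assumes that a text match requires every query token to appear in the field, which is how Qdrant documents MatchText when a full-text index exists on the field. The `tokenize` and `match_text_filter` helpers and the dict-based points are hypothetical stand-ins, not Qdrant or Dify code.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word tokens (a simplification of a real tokenizer)."""
    return set(re.findall(r"\w+", text.lower()))


def match_text_filter(points: list[dict], group_id: str, query: str, top_k: int = 2) -> list[dict]:
    """Return up to top_k points in the group whose text contains all query tokens.

    Results keep storage order; there is no relevance ranking.
    """
    query_tokens = tokenize(query)
    hits: list[dict] = []
    for point in points:
        if len(hits) >= top_k:
            break
        if point["group_id"] != group_id:
            continue
        # Every query token must be present in the document text.
        if query_tokens <= tokenize(point["page_content"]):
            hits.append(point)
    return hits
```

Under this assumption, a multi-word query matches only documents that contain all of its words, so a query that mixes terms from different documents returns nothing, whereas a BM25-style search would still return partial matches.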
MyScale supports full-text search, implemented as one of the retrieval options in high-quality indexing mode. The full-text search indexes all terms in the document, allowing users to query any terms and return text fragments containing those terms [1].
TiDB_Vector supports full-text search, implemented in Dify through the search_by_full_text method in the TencentVector class:

    def search_by_full_text(self):
        hits_by_full_text = self.vector.search_by_full_text(query=get_example_text())
        assert len(hits_by_full_text) == 0

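In summary, according to the code above: Milvus and Zilliz return no full-text results in Dify, while Weaviate, Qdrant, MyScale, and TiDB_Vector are described as supporting full-text search.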
Qdrant does support full-text search.
Try searching for search_by_full_text in the code repo.
@dosu @crazywoola Thank you for your response. I've searched for search_by_full_text in the code repo. May I assume that only Qdrant, MyScale, OpenSearch, Analyticdb_vector, oraclevector, and Weaviate support full-text search? Additionally, in my tests with Weaviate, the search_by_full_text method always returns an empty list.
+1. The returned list is empty even when the query is a sentence copied directly from a document in the corresponding dataset.
Self Checks
1. Is this request related to a challenge you're experiencing? Tell me about your story.
Hello, I am currently using Weaviate as a vector database and have encountered an issue where the full-text search portion consistently returns an empty list when performing full-text or hybrid searches. In issue #6508 regarding full-text search, I noticed that Dify contributors pointed out that Weaviate does not support full-text search and recommended switching to Qdrant.
In the Dify Chinese documentation, I found that Dify supports vector databases such as Weaviate, Qdrant, Milvus, Zilliz, MyScale, and TiDB_Vector (by the way, TiDB_Vector is not mentioned in the English documentation). I would like to know which of these vector databases support full-text search. Additionally, is the lack of full-text search support in Weaviate due to the vector database itself not providing such functionality?
2. Additional context or comments
No response
3. Can you help us with this feature?