dudeperf3ct opened this issue 3 months ago
Hey @dudeperf3ct! I'm here to help you with any bugs, questions, or contributions you have related to this repository. Let me know how I can assist you!
The error `ParamError: (code=1, message=invalid input for sparse float vector)` is most likely caused by the sparse embedding field not being configured correctly, or by the sparse embedding function not being provided. To fix this, ensure that the `enable_sparse` flag is set to `True` and that a valid `sparse_embedding_function` is provided. Additionally, make sure that the collection schema includes the `sparse_embedding_field` with the correct data type (`SPARSE_FLOAT_VECTOR`).

Here's an example of how to correctly set up `MilvusVectorStore` with sparse embeddings:
```python
from llama_index.vector_stores.milvus import MilvusVectorStore
from FlagEmbedding import BGEM3FlagModel
from typing import List
from llama_index.vector_stores.milvus.utils import BaseSparseEmbeddingFunction


class ExampleEmbeddingFunction(BaseSparseEmbeddingFunction):
    def __init__(self):
        # BGE-M3 produces lexical (sparse) weights alongside its dense vectors
        self.model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=False)

    def encode_queries(self, queries: List[str]):
        outputs = self.model.encode(
            queries,
            return_dense=False,
            return_sparse=True,
            return_colbert_vecs=False,
        )["lexical_weights"]
        return [self._to_standard_dict(output) for output in outputs]

    def encode_documents(self, documents: List[str]):
        outputs = self.model.encode(
            documents,
            return_dense=False,
            return_sparse=True,
            return_colbert_vecs=False,
        )["lexical_weights"]
        return [self._to_standard_dict(output) for output in outputs]

    def _to_standard_dict(self, raw_output):
        # Milvus expects sparse vectors as {int token_id: float weight}
        result = {}
        for k in raw_output:
            result[int(k)] = raw_output[k]
        return result


# Setup MilvusVectorStore with sparse embeddings enabled
vector_store = MilvusVectorStore(
    dim=1536,
    collection_name="your_collection_name",
    uri="http://milvus_address:port",
    token="your_milvus_token_here",
    overwrite=True,
    enable_sparse=True,
    sparse_embedding_function=ExampleEmbeddingFunction(),  # Provide your sparse embedding function
    hybrid_ranker="WeightedRanker",  # or "RRFRanker"
    hybrid_ranker_params={"weights": [1.0, 1.0]},  # or {"k": 60} for RRFRanker
)
```
Ensure that the `ExampleEmbeddingFunction` class implements the methods needed to convert text into a sparse embedding. This setup should resolve the `ParamError` and allow hybrid search to work correctly [1][2].
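Once the store is configured, hybrid retrieval is requested at query time. Below is a minimal sketch of that step, assuming a list of `documents` already exists and `vector_store` is the instance defined above; the query text and `similarity_top_k` value are placeholders:

```python
from llama_index.core import StorageContext, VectorStoreIndex

# Wrap the Milvus store in a storage context and build the index over it
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# "hybrid" mode combines the dense scores with the sparse (BGE-M3 lexical) scores
query_engine = index.as_query_engine(
    vector_store_query_mode="hybrid",
    similarity_top_k=5,
)
response = query_engine.query("What does the document say about hybrid search?")
print(response)
```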
To continue talking to Dosu, mention @dosu.
Same here. I solved this by not using the splitter function:
```python
milvus_simple_vector_index = VectorStoreIndex.from_documents(
    documents=[
        Document(
            text=source_docx_text,
            metadata={"file_name": docx_client.docx_file_path},
        )
    ],
    storage_context=storage_context,
)
```
But I wonder why it fails with nodes or documents (split text).
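For comparison, the chunked path that triggers the error is presumably something like the sketch below. This is a guess at the failing setup, since the original splitter code isn't shown; `SentenceSplitter` and its chunk sizes are assumptions:

```python
from llama_index.core import Document, StorageContext, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Split the document into nodes before inserting them into Milvus
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)

milvus_chunked_index = VectorStoreIndex.from_documents(
    documents=[
        Document(
            text=source_docx_text,
            metadata={"file_name": docx_client.docx_file_path},
        )
    ],
    transformations=[splitter],
    storage_context=storage_context,
)
```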
I am experiencing the same with split documents, but using LangChain rather than LlamaIndex, so this is most likely an issue with the native Milvus SDK. Did you find a workaround for chunked documents? The full documents are just a bit too long, I'm afraid.
Bug Description
The hybrid search for Milvus vector store is not working.
Version
0.10.58
Steps to Reproduce
Here's the code that I am using.
Data ingestion
Dense Milvus vector store
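(The original snippets did not survive in this issue text; the following is a representative sketch of a dense-only ingestion rather than the reporter's exact code. Collection name, URI, data directory, and embedding dimension are placeholders.)

```python
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.milvus import MilvusVectorStore

documents = SimpleDirectoryReader("./data").load_data()

# Dense-only store: no sparse field, no hybrid ranker
vector_store = MilvusVectorStore(
    uri="http://milvus_address:port",
    collection_name="dense_collection",
    dim=1536,
    overwrite=True,
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```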
This dense-only setup works correctly, and I can retrieve the documents closest to the query.
Hybrid/sparse Milvus vector store
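(Again a representative sketch rather than the original code, reusing `documents` and the imports from the dense example: the same ingestion, but with `enable_sparse=True` and no `sparse_embedding_function` passed, so the built-in default sparse embedding function is used.)

```python
# Same ingestion, but with sparse/hybrid search enabled
hybrid_store = MilvusVectorStore(
    uri="http://milvus_address:port",
    collection_name="hybrid_collection",
    dim=1536,
    overwrite=True,
    enable_sparse=True,  # no sparse_embedding_function -> default is used
    hybrid_ranker="RRFRanker",
    hybrid_ranker_params={"k": 60},
)
storage_context = StorageContext.from_defaults(vector_store=hybrid_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```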
This fails with the `ParamError: (code=1, message=invalid input for sparse float vector)` error shown above.
I tested the example code for Milvus hybrid vector search, and it works for that dataset.
Not sure why the default sparse embedding function is not working.
Relevant Logs/Tracebacks
No response