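Fix: BGEM3EmbeddingFunction encode() parameter mismatch
Issue
When using the BGEM3EmbeddingFunction class, users encounter an error. This occurs because the _encode() method passes sentences as the keyword argument when calling self.model.encode(), but the underlying BGEM3FlagModel expects the parameter to be named queries.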
Solution
Changed the keyword argument in the _encode() method from sentences to queries to match the parameter name the underlying model expects:
# Before
output = self.model.encode(sentences=texts, **self._encode_config)
# After
output = self.model.encode(queries=texts, **self._encode_config)
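The failure mode can be reproduced without loading the real model. The sketch below is only an illustration: StubModel is a hypothetical stand-in whose encode() takes a parameter named queries, as this PR describes for the underlying model. It is not the real BGEM3FlagModel, and the placeholder output it returns is invented for the demo.

```python
class StubModel:
    """Hypothetical stand-in for the underlying model; not the real BGEM3FlagModel."""

    def encode(self, queries, **kwargs):
        # Return one placeholder "embedding" (the text length) per input.
        return {"dense_vecs": [[float(len(q))] for q in queries]}


model = StubModel()
texts = ["hello", "milvus"]

# Before: the keyword does not match the signature, so the required
# `queries` argument is never bound and Python raises TypeError.
try:
    model.encode(sentences=texts)
    mismatch_error = None
except TypeError as exc:
    mismatch_error = exc

# After: the keyword matches the signature and the call succeeds.
output = model.encode(queries=texts)
```

Passing texts positionally would also avoid depending on the keyword name, assuming the text list remains the first parameter of encode(); this PR keeps the explicit keyword form.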
Testing
Tested the fix by creating an instance of BGEM3EmbeddingFunction and successfully encoding both documents and queries
Verified that all return types (dense, sparse, colbert_vecs) work as expected
Confirmed the error no longer occurs
Created a demonstration notebook showing the working implementation: Google Colab Notebook
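The checks above could be approximated offline along the following lines. This is a sketch under stated assumptions, not the notebook's code: FakeM3Model and SketchEmbeddingFunction are hypothetical names, the return_* flags and the output keys (dense_vecs, lexical_weights, colbert_vecs) are assumed for illustration, and the wrapper mirrors only the fixed _encode() call rather than the real BGEM3EmbeddingFunction.

```python
from typing import Dict, List


class FakeM3Model:
    """Hypothetical stand-in for the underlying model (assumed interface)."""

    def encode(self, queries: List[str], return_dense: bool = True,
               return_sparse: bool = True,
               return_colbert_vecs: bool = True) -> Dict[str, list]:
        # Build placeholder outputs, one entry per input text.
        out: Dict[str, list] = {}
        if return_dense:
            out["dense_vecs"] = [[0.0, 0.0] for _ in queries]
        if return_sparse:
            out["lexical_weights"] = [{} for _ in queries]
        if return_colbert_vecs:
            out["colbert_vecs"] = [[[0.0, 0.0]] for _ in queries]
        return out


class SketchEmbeddingFunction:
    """Hypothetical wrapper mirroring only the fixed _encode() call."""

    def __init__(self, model, **encode_config):
        self.model = model
        self._encode_config = encode_config

    def _encode(self, texts: List[str]) -> Dict[str, list]:
        # The fixed call from this PR: the keyword is `queries`.
        return self.model.encode(queries=texts, **self._encode_config)


ef = SketchEmbeddingFunction(FakeM3Model(), return_dense=True,
                             return_sparse=True, return_colbert_vecs=True)
result = ef._encode(["what is milvus?", "milvus is a vector database"])
```

With the fake model, result holds one entry per input text under each of the three keys, which is the shape the checks above exercise.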
Additional Context
This fix aligns with the FlagEmbedding implementation, which uses queries as the parameter name in its encode() method.
Related Issues
Resolves the error reported in various contexts where users attempt to use the BGE-M3 model with Milvus.