Closed bwanglzu closed 2 months ago
Is there any doc about the jina v3 embedding? What kinds of `task_type` can it use?
hi @wxywb yes i'm working on another PR to milvus-docs with detailed documentation, the tasks are:
+ **retrieval.query**: Used to encode user queries or questions in retrieval tasks.
+ **retrieval.passage**: Used to encode large documents in retrieval tasks at indexing time.
+ **classification**: Used to encode text for text classification tasks.
+ **text-matching**: Used to encode text for similarity matching, such as measuring similarity between two sentences.
+ **separation**: Used for clustering or reranking tasks.
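As a rough sketch of how these task names could be validated before being sent along with a request: the helper `build_payload` below is hypothetical, and the payload field names (`model`, `task`, `input`) are assumptions rather than something confirmed in this thread. No network call is made.

```python
from typing import Dict, List

# Task names copied from the list above; descriptions shortened.
JINA_V3_TASKS: Dict[str, str] = {
    "retrieval.query": "encode user queries in retrieval tasks",
    "retrieval.passage": "encode documents at indexing time",
    "classification": "encode text for classification",
    "text-matching": "encode text for similarity matching",
    "separation": "clustering or reranking",
}


def build_payload(texts: List[str], task: str,
                  model: str = "jina-embeddings-v3") -> dict:
    """Hypothetical helper: build a request body for a given task.

    The field names "model", "task" and "input" are assumed here.
    """
    if task not in JINA_V3_TASKS:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(JINA_V3_TASKS)}"
        )
    return {"model": model, "task": task, "input": list(texts)}
```

Rejecting unknown task names early keeps a typo such as `retrieval.querry` from silently reaching the embedding service.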
i need to figure out a good way to do this in this PR. my problem is that we have an `encode_query` and an `encode_document` function built into each embedding function, such as `JinaEmbeddingFunction`. i guess these two functions should use the `retrieval.query` and `retrieval.passage` tasks respectively.
For the other tasks, i'm not sure what the best way to pass this `task_type` is, so i overrode the `__call__` function (not sure if that is desired).
Can you give me some suggestions?
Bo
(btw if you are in the `jina_ai-x-milvus` joint slack channel, i can give you more detailed information :)
The current design puts `task_type` in the init method, and the call method will use this setting. If someone needs to change it, they can create a new embedding function.
You can see an example in cohere.
https://github.com/milvus-io/milvus-model/blob/main/milvus_model/dense/cohere.py
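A minimal sketch of that design, for illustration only: the class `SketchJinaEmbeddingFunction`, the `backend` parameter, and `fake_backend` are all hypothetical stand-ins, not the actual milvus-model or cohere implementation. `task_type` is fixed in `__init__` and used by `__call__`, while `encode_query` and `encode_document` (named as in the question above) always use the two retrieval tasks.

```python
from typing import Callable, List, Optional

# Hypothetical backend signature: (texts, task) -> one vector per text.
Backend = Callable[[List[str], Optional[str]], List[List[float]]]


def fake_backend(texts: List[str], task: Optional[str]) -> List[List[float]]:
    # Stand-in for the real embedding API call; returns tiny dummy vectors.
    return [[float(len(t)), float(len(task or ""))] for t in texts]


class SketchJinaEmbeddingFunction:
    """Illustrative only; not the real JinaEmbeddingFunction."""

    def __init__(self, task_type: Optional[str] = None,
                 backend: Backend = fake_backend):
        # task_type is chosen once, at construction time.
        self.task_type = task_type
        self._backend = backend

    def __call__(self, texts: List[str]) -> List[List[float]]:
        # The generic call path uses whatever task was set in __init__.
        return self._backend(list(texts), self.task_type)

    def encode_query(self, queries: List[str]) -> List[List[float]]:
        # Queries always use the retrieval.query task.
        return self._backend(list(queries), "retrieval.query")

    def encode_document(self, documents: List[str]) -> List[List[float]]:
        # Documents always use the retrieval.passage task.
        return self._backend(list(documents), "retrieval.passage")


# A different task means a new instance, as suggested above.
classifier_ef = SketchJinaEmbeddingFunction(task_type="classification")
matching_ef = SketchJinaEmbeddingFunction(task_type="text-matching")
```

With this shape there is no need to pass a task into `__call__` per request; the trade-off is one embedding function object per task.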
This PR should allow `pymilvus` to support `jina-embeddings-v3` with 2 additional fields: