qdrant / fastembed

Fast, Accurate, Lightweight Python library to make State of the Art Embedding
https://qdrant.github.io/fastembed/
Apache License 2.0
1.38k stars, 99 forks

Add model sentencetransformers/multi-qa-MiniLM-L6-cos-v1 #41

Open msj121 opened 11 months ago

msj121 commented 11 months ago

Hi, is there a way to support "sentencetransformers/multi-qa-MiniLM-L6-cos-v1"? It is similar to sentencetransformers/AllMiniLML6V2.

Hopefully nothing needs to change to support this version. It was just trained on questions and answers, so it might be more relevant for my databases, where I am trying to match specific queries that are submitted as questions.

NirantK commented 10 months ago

Will consider adding the model. Thanks for the suggestion!

Since you're interested in question-answer aware embeddings: the default embedding model is also aware of the question/answer split, which it picks up from prefixes. Here is how you'd do that:

Add "query:" for questions:

"query: Who is Maharaja Shivaji?"

Add "passage:" to the documents that you index:

"passage: This is an example passage."

This is also the behaviour in Qdrant's integration with FastEmbed. To the best of my knowledge, this model does much better on retrieval than the multi-qa model; you can check the latest numbers on the MTEB leaderboard.
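A minimal sketch of the "query:" / "passage:" prefix convention described above, in plain Python. The helpers `as_query` and `as_passages` are hypothetical names for illustration only; they are not part of FastEmbed's API. They simply prepend the prefix unless it is already present, so the resulting strings can be handed to whatever embedding call you use.

```python
# Hypothetical helpers (NOT part of FastEmbed's API) that apply the
# "query: " / "passage: " prefix convention before text is embedded.
QUERY_PREFIX = "query: "
PASSAGE_PREFIX = "passage: "


def as_query(text: str) -> str:
    """Prefix a question so the model treats it as a search query."""
    # Skip texts that already carry the prefix to avoid "query: query: ...".
    return text if text.startswith(QUERY_PREFIX) else QUERY_PREFIX + text


def as_passages(docs: list[str]) -> list[str]:
    """Prefix every document that will be indexed."""
    return [d if d.startswith(PASSAGE_PREFIX) else PASSAGE_PREFIX + d for d in docs]


query = as_query("Who is Maharaja Shivaji?")
passages = as_passages(["This is an example passage."])
print(query)     # query: Who is Maharaja Shivaji?
print(passages)  # ['passage: This is an example passage.']
```

Recent FastEmbed releases also expose `query_embed` and `passage_embed` methods that may apply model-specific prefixes for you, so check the documentation for your version before prefixing manually, to avoid adding the prefix twice.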