qdrant / fastembed

Fast, Accurate, Lightweight Python library to make State of the Art Embeddings
https://qdrant.github.io/fastembed/
Apache License 2.0
1.36k stars, 99 forks

Difference between passage_embed and embed in Embedding Class #56

Closed. TheSeriousProgrammer closed this issue 10 months ago.

TheSeriousProgrammer commented 10 months ago

I see that the code base has two methods, passage_embed and embed, but on inspecting the code, they look essentially the same to me. Is there any difference between them, or is passage_embed intended for future features?

NirantK commented 10 months ago

The passage_embed function adds the prefix "passage:" to each input text. This is the recommended usage when you're indexing a corpus, for instance into a vector store.

Several embedding models trained with contrastive losses expect such a prefix. For instance, a prefix is recommended for the BGE family of embedding models, and BGE is also the default embedding model in fastembed.
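To make the difference concrete, here is a minimal sketch of how a passage_embed wrapper can sit on top of a single embed call. It is not fastembed's actual code: the `embed` below is a hypothetical placeholder that encodes only the text length and the number of colons, and the exact prefix string (including the space after the colon) is an assumption.

```python
from typing import Iterable, Iterator, List

Vector = List[float]


def embed(texts: Iterable[str]) -> Iterator[Vector]:
    """Hypothetical stand-in for the model call.

    A real model returns dense semantic vectors; this placeholder encodes only
    the text length and colon count, which is enough to show that a prefix
    changes what the model sees.
    """
    for text in texts:
        yield [float(len(text)), float(text.count(":"))]


def passage_embed(texts: Iterable[str]) -> Iterator[Vector]:
    # Document side: prepend the (assumed) "passage: " prefix, then delegate
    # to the very same embed() used for plain inputs.
    return embed("passage: " + t for t in texts)
```

With this sketch, `next(embed(["vector search"]))` gives `[13.0, 0.0]`, while `next(passage_embed(["vector search"]))` gives `[22.0, 1.0]`: the same model, fed a different input string, produces a different vector. That is why the two methods look nearly identical in code yet are not interchangeable for models trained with the prefix.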

Similarly, the query_embed function adds the "query:" prefix for use at query time, when you want to retrieve or rank documents using vector similarity.
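The query-time flow can be sketched the same way: index passages with the passage prefix, embed the query with the query prefix, then rank by cosine similarity. Everything here is hypothetical and only illustrates the flow: `VOCAB`, the word-count `embed`, `cosine` and `rank` are illustrative stand-ins, and unlike a real model trained with these prefixes, this toy `embed` simply ignores them.

```python
import math
from typing import Iterable, Iterator, List

Vector = List[float]

# Hypothetical toy vocabulary; a real model learns dense semantics instead.
VOCAB = ["qdrant", "vector", "database", "python", "library", "cat"]


def embed(texts: Iterable[str]) -> Iterator[Vector]:
    """Toy stand-in for a model: counts vocabulary words per text.

    Words outside VOCAB (including the "query:"/"passage:" prefixes) are ignored.
    """
    for text in texts:
        words = text.lower().split()
        yield [float(words.count(w)) for w in VOCAB]


def passage_embed(texts: Iterable[str]) -> Iterator[Vector]:
    # Indexing side: assumed "passage: " prefix.
    return embed("passage: " + t for t in texts)


def query_embed(query: str) -> Vector:
    # Query side: assumed "query: " prefix, one vector per query.
    return next(embed(["query: " + query]))


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank(query: str, passages: List[str]) -> List[str]:
    """Return passages sorted by cosine similarity to the query, best first."""
    q = query_embed(query)
    scored = zip(passages, passage_embed(passages))
    ordered = sorted(scored, key=lambda pv: cosine(q, pv[1]), reverse=True)
    return [p for p, _ in ordered]
```

For example, `rank("vector database", ["Qdrant is a vector database", "fastembed is a Python library", "the cat sat"])` puts the Qdrant passage first. Passages and queries are embedded by different wrappers but compared in the same vector space, which is the split passage_embed and query_embed are meant to support.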

I hope this answers your question 🤞🏼

If it does not, please do re-open the issue!