qdrant / fastembed

Fast, Accurate, Lightweight Python library to make State of the Art Embedding
https://qdrant.github.io/fastembed/
Apache License 2.0
1.55k stars, 113 forks

Wrap and support Sparse Vector Creation #12

Closed: NirantK closed this issue 8 months ago

NirantK commented 1 year ago

FastEmbed could support sparse vector creation based on bag-of-words methods, e.g. TF-IDF and Okapi BM25. We can launch with existing Python implementations, e.g. https://pypi.org/project/rank-bm25/
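To show how BM25 maps onto sparse vectors, here is a minimal, dependency-free sketch. It is not the rank-bm25 API or FastEmbed's API. `BM25SparseEncoder`, `tokenize` and `sparse_dot` are hypothetical names, the tokenizer is a naive lowercase split, k1=1.2 / b=0.75 are common defaults, and the IDF uses the Lucene-style "+1" variant (an assumption) so that all weights stay non-negative. Documents get BM25 term weights; queries get term counts, so a sparse dot product reproduces the BM25 score.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (assumption): real pipelines add stemming, stopwords, etc.
    return text.lower().split()


class BM25SparseEncoder:
    """Hypothetical encoder producing BM25 sparse vectors as {index: weight} dicts."""

    def __init__(self, corpus: list[str], k1: float = 1.2, b: float = 0.75):
        self.k1 = k1
        self.b = b
        docs = [tokenize(d) for d in corpus]
        self.n_docs = len(docs)
        self.avgdl = sum(len(d) for d in docs) / self.n_docs
        self.vocab: dict[str, int] = {}  # term -> sparse vector index
        df: Counter = Counter()          # term -> number of docs containing it
        for doc in docs:
            for term in dict.fromkeys(doc):  # unique terms, in first-seen order
                df[term] += 1
                self.vocab.setdefault(term, len(self.vocab))
        # Lucene-style IDF (assumption): the +1 keeps every IDF positive
        self.idf = {
            t: math.log(1 + (self.n_docs - n + 0.5) / (n + 0.5)) for t, n in df.items()
        }

    def encode_document(self, text: str) -> dict[int, float]:
        tokens = tokenize(text)
        tf = Counter(tokens)
        # Document-length normalisation term of BM25
        norm = self.k1 * (1 - self.b + self.b * len(tokens) / self.avgdl)
        return {
            self.vocab[t]: self.idf[t] * f * (self.k1 + 1) / (f + norm)
            for t, f in tf.items()
            if t in self.vocab  # out-of-vocabulary terms are dropped
        }

    def encode_query(self, text: str) -> dict[int, float]:
        # Query side is just term counts; the dot product then sums BM25 term scores
        counts = Counter(t for t in tokenize(text) if t in self.vocab)
        return {self.vocab[t]: float(c) for t, c in counts.items()}


def sparse_dot(a: dict[int, float], b: dict[int, float]) -> float:
    # Iterate over the shorter vector; only shared indices contribute
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b[i] for i, w in a.items() if i in b)


corpus = [
    "sparse vectors for keyword search",
    "dense vectors capture semantics",
    "bm25 is a classic keyword ranking function",
]
enc = BM25SparseEncoder(corpus)
doc_vecs = [enc.encode_document(d) for d in corpus]
query = enc.encode_query("keyword search")
scores = [sparse_dot(query, d) for d in doc_vecs]
```

Splitting the weighting this way means the expensive part (document weights) is computed once at indexing time, and the query vector stays trivial.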

This would help adoption of sparse vectors within the Qdrant ecosystem itself, since we could recommend FastEmbed as the canonical place to create sparse vectors.

generall commented 1 year ago

As an alternative to BM25, we could consider running SPLADE-like models. Last time I tried them, inference speed was my top concern.
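For context on where a SPLADE-like model's sparse vector comes from, here is a minimal sketch of the pooling step only. It assumes you already have per-token logits over the vocabulary from a BERT-style masked-language-model head (that forward pass is the slow part discussed here); the logits below are toy numbers and `splade_pool` is a hypothetical name. It uses the max-pooling variant from SPLADE v2 (the original SPLADE summed over tokens), and assumes padding positions have already been removed.

```python
import math


def splade_pool(logits: list[list[float]]) -> dict[int, float]:
    """Turn per-token logits (seq_len x vocab_size) into a sparse {index: weight} vector.

    Weight for vocabulary entry j: max over token positions i of log(1 + relu(logits[i][j])).
    """
    vocab_size = len(logits[0])
    vec = {}
    for j in range(vocab_size):
        # Log-saturated ReLU, max-pooled over token positions
        w = max(math.log1p(max(row[j], 0.0)) for row in logits)
        if w > 0.0:  # zero entries are dropped, which is what makes the output sparse
            vec[j] = w
    return vec


# Toy logits: 2 token positions over a 4-entry vocabulary (assumption, not real model output)
toy_logits = [
    [2.0, -1.0, 0.0, 0.5],
    [-3.0, 1.0, -0.5, 3.0],
]
vec = splade_pool(toy_logits)
```

Entries whose logits are non-positive at every position disappear, so the output has the same index/weight shape as the BM25 vectors, just with learned weights and possible expansion terms.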

NirantK commented 1 year ago

They are still slow, and I don't know of an obvious way to run them with onnxruntime yet. I'll keep an eye on SPLADE, though.

Anush008 commented 8 months ago

#149 New models to follow.