Goals

The aim is to bulk index both the data and the associated vectors (sentence embeddings, generated with sentence-transformers) into Qdrant, a vector database, so that we can perform similarity search on phrases.
Unlike keyword-based search, similarity search requires vectors that come from an NLP model (typically a transformer).
multi-qa-distilbert-cos-v1 is the model used. As per the docs, "This model was tuned for semantic search: Given a query/question, it can find relevant passages. It was trained on a large and diverse set of (question, answer) pairs."
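To make the goal concrete, here is a minimal sketch of the idea (not the project's actual indexing script): encode a small batch of phrases with the model above, upsert the vectors into a local Qdrant collection, and run a similarity query against it. The collection name, example phrases, and payload fields are illustrative placeholders.

```python
# Minimal sketch: encode phrases and bulk upsert them into Qdrant.
# Collection name, phrases, and payload schema are placeholders, not the project's data.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("multi-qa-distilbert-cos-v1")

phrases = [
    "How do I reset my password?",
    "Steps to recover a forgotten account password",
    "Best hiking trails near the city",
]

# Encode the whole batch in one call; this model produces 768-dimensional vectors
# suited to cosine similarity
vectors = model.encode(phrases, batch_size=64, show_progress_bar=True)

client = QdrantClient(host="localhost", port=6333)
client.recreate_collection(
    collection_name="phrases",
    vectors_config=VectorParams(size=vectors.shape[1], distance=Distance.COSINE),
)
client.upsert(
    collection_name="phrases",
    points=[
        PointStruct(id=i, vector=vec.tolist(), payload={"text": text})
        for i, (text, vec) in enumerate(zip(phrases, vectors))
    ],
)

# Similarity search: embed the query with the same model and ask Qdrant for neighbours
hits = client.search(
    collection_name="phrases",
    query_vector=model.encode("I forgot my password").tolist(),
    limit=3,
)
```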
Generating sentence embeddings for a large batch of text is quite slow on a CPU, so the aim is also to explore how to generate an ONNX-optimized version of the model, so that we can both generate and index the vectors into the database more rapidly without a GPU.
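One way to get an ONNX-optimized encoder, sketched below, is to export the model through Hugging Face Optimum's ONNX Runtime integration and then reproduce the mean pooling and normalization that sentence-transformers applies on top of the raw token embeddings. This is a hedged example of the general approach rather than the exact pipeline used here; the `encode` helper is an illustrative name.

```python
# Sketch: export multi-qa-distilbert-cos-v1 to ONNX via Optimum and run CPU inference.
# The pooling/normalization below mirrors the sentence-transformers model; the helper
# name and example text are assumptions for illustration.
import numpy as np
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

model_id = "sentence-transformers/multi-qa-distilbert-cos-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# export=True converts the PyTorch weights to ONNX on the fly
onnx_model = ORTModelForFeatureExtraction.from_pretrained(model_id, export=True)

def encode(texts: list[str]) -> np.ndarray:
    """Mean-pool token embeddings and L2-normalize, as the original model does."""
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    token_embeddings = onnx_model(**inputs).last_hidden_state  # (batch, seq, 768)
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    pooled = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
    vectors = pooled / pooled.norm(dim=1, keepdim=True)
    return vectors.detach().numpy()

print(encode(["How fast is ONNX inference on a CPU?"]).shape)  # (1, 768)
```

From there, dynamic quantization of the exported model (for example with Optimum's ORTQuantizer) is a common further step for CPU speedups, at a small cost in accuracy; it is not shown here.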