prrao87 / db-hub-fastapi

Async bulk data ingestion and querying in various document, graph and vector databases via their Python clients
MIT License

Qdrant refactor #20

Closed prrao87 closed 1 year ago

prrao87 commented 1 year ago

Purpose of this PR

This PR refactors the Qdrant code base to offer better performance and a structure that lets the user decide how to run the workload, depending on the available hardware and Python version.

prrao87 commented 1 year ago

Notes on ONNX performance

It looks like ONNX does utilize all available CPU cores when processing the text and generating the embeddings (the image below was generated on an AWS EC2 T2 Ubuntu instance with a single 4-core CPU).

(Image: CPU utilization across all four cores of the EC2 instance during embedding generation)
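As an aside, ONNX Runtime's intra-op thread pool is what drives this full-core utilization, and it can be capped through `SessionOptions` if the embedding workload needs to share the machine. A minimal sketch (the model path and thread count are placeholders, not values hard-coded in this repo):

```python
import onnxruntime as ort

# By default onnxruntime spreads intra-op work across all physical cores;
# setting intra_op_num_threads caps that explicitly.
sess_options = ort.SessionOptions()
sess_options.intra_op_num_threads = 4  # e.g. match the 4-core EC2 instance
sess_options.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL

# The model path here is illustrative, not the repo's actual layout
session = ort.InferenceSession(
    "onnx_model/model_optimized_quantized.onnx",
    sess_options=sess_options,
    providers=["CPUExecutionProvider"],
)
print(session.get_providers())
```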

On average, the entire wine reviews dataset of 129,971 reviews is vectorized and ingested into Qdrant in 34 minutes via the quantized ONNX model, as opposed to more than an hour for the regular sbert model downloaded from the sentence-transformers repo. The quantized ONNX model is also ~33% smaller in size than the original model.
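The ingestion itself goes through the Qdrant Python client. The sketch below shows the general shape of a batched upsert of pre-computed embeddings; the collection name, host/port and payload fields are illustrative assumptions, and the repo's actual code does this asynchronously and in bulk.

```python
from qdrant_client import QdrantClient
from qdrant_client.http import models

# Host/port and collection name are assumptions for illustration
client = QdrantClient(host="localhost", port=6333)
client.recreate_collection(
    collection_name="wine_reviews",
    vectors_config=models.VectorParams(size=384, distance=models.Distance.COSINE),
)

# One small batch of pre-computed 384-dim embeddings (MiniLM-L6 output size)
# with their payloads; in practice the 129,971 reviews are embedded and
# upserted in batches
ids = [1, 2]
vectors = [[0.01] * 384, [0.02] * 384]
payloads = [{"title": "Review A"}, {"title": "Review B"}]
client.upsert(
    collection_name="wine_reviews",
    points=models.Batch(ids=ids, vectors=vectors, payloads=payloads),
)
```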

This amounts to a roughly 1.8x reduction in indexing time, with a ~26% smaller (quantized) model that loads and runs inference faster. To verify that the embeddings from the quantized model are of similar quality, some example cosine similarities are shown below.
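For reference, the export and dynamic quantization can be done with Hugging Face `optimum`'s ONNX Runtime tooling. The snippet below is a rough sketch, not necessarily the exact configuration used in this repo: the output directory and the AVX-512 VNNI quantization config are assumptions, and older `optimum` versions use `from_transformers=True` instead of `export=True`.

```python
from optimum.onnxruntime import ORTModelForFeatureExtraction, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

model_id = "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"

# Export the PyTorch checkpoint to ONNX
ort_model = ORTModelForFeatureExtraction.from_pretrained(model_id, export=True)
ort_model.save_pretrained("onnx_model")

# Dynamic (weight-only) int8 quantization, which is what shrinks the model
# on disk and speeds up CPU inference
quantizer = ORTQuantizer.from_pretrained(ort_model)
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="onnx_model", quantization_config=qconfig)
```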

Example results:

The following results are for the sentence-transformers/multi-qa-MiniLM-L6-cos-v1 model that was built for semantic similarity tasks.

Vanilla model

```
---
Loading vanilla sentence transformer model
---
Similarity between 'I'm very happy' and 'I am so glad': [0.74601071]
Similarity between 'I'm very happy' and 'I'm so sad': [0.6456476]
Similarity between 'I'm very happy' and 'My dog is missing': [0.09541589]
Similarity between 'I'm very happy' and 'The universe is so vast!': [0.27607652]
```
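These vanilla numbers come from encoding the sentences and taking cosine similarities; a minimal way to reproduce them with `sentence-transformers` (not necessarily the exact script in this repo) is:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/multi-qa-MiniLM-L6-cos-v1")

query = "I'm very happy"
candidates = [
    "I am so glad",
    "I'm so sad",
    "My dog is missing",
    "The universe is so vast!",
]

# Encode the query and candidates, then compare with cosine similarity
query_emb = model.encode(query, convert_to_tensor=True)
cand_embs = model.encode(candidates, convert_to_tensor=True)
scores = util.cos_sim(query_emb, cand_embs)

for sentence, score in zip(candidates, scores[0]):
    print(f"Similarity between '{query}' and '{sentence}': {score.item():.8f}")
```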

Quantized ONNX model

```
---
Loading quantized ONNX model
---
The ONNX file model_optimized_quantized.onnx is not a regular name used in optimum.onnxruntime, the ORTModel might not behave as expected.
Similarity between 'I'm very happy' and 'I am so glad': [0.74153285]
Similarity between 'I'm very happy' and 'I'm so sad': [0.65299551]
Similarity between 'I'm very happy' and 'My dog is missing': [0.09312761]
Similarity between 'I'm very happy' and 'The universe is so vast!': [0.26112114]
```
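The quantized run loads the ONNX file via `optimum.onnxruntime` (hence the harmless warning about the non-standard file name) and applies mean pooling plus normalization, mirroring the sentence-transformers recipe. A hedged sketch, with the directory and file names assumed from the warning above:

```python
import torch
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

# Directory and file name are assumptions based on the warning printed above
model = ORTModelForFeatureExtraction.from_pretrained(
    "onnx_model", file_name="model_optimized_quantized.onnx"
)
tokenizer = AutoTokenizer.from_pretrained("onnx_model")


def embed(sentences: list[str]) -> torch.Tensor:
    inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    outputs = model(**inputs)
    # Mean-pool token embeddings (masking out padding), then L2-normalize,
    # matching the pooling used by the sentence-transformers checkpoint
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    summed = (outputs.last_hidden_state * mask).sum(dim=1)
    pooled = summed / mask.sum(dim=1).clamp(min=1e-9)
    return torch.nn.functional.normalize(pooled, p=2, dim=1)


emb = embed(["I'm very happy", "I am so glad"])
print(torch.matmul(emb[0], emb[1]))  # cosine similarity of normalized vectors
```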

As can be seen, the similarity scores are very close to those of the vanilla model, but the quantized model is ~26% smaller and processes the sentences much faster on the same CPU.