prrao87 / db-hub-fastapi

Async bulk data ingestion and querying in various document, graph and vector databases via their Python clients
MIT License
33 stars 3 forks

Improve docs and remove ONNX #39

prrao87 closed 1 year ago

prrao87 commented 1 year ago

ONNX models are cumbersome and hard to maintain, and the APIs for them in Hugging Face optimum and transformers keep changing. ONNX also hasn't been supported on Python 3.11 for a while now, so it doesn't make sense to rely on it long term. In most cases it becomes necessary to run sbert on multiple GPUs anyway, and in a production scenario that workload can be scaled up much more effectively via Ray. It's better to focus efforts there than on ONNX and quantization for speedups.
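For reference, a rough sketch of what multi-GPU sbert encoding via Ray could look like. This is not code from this repo: `shard`, `encode_on_gpus`, and the `Encoder` actor are hypothetical names, and the batch size and one-actor-per-GPU layout are assumptions made for illustration. It assumes `ray` and `sentence-transformers` are installed and that the machine has at least `num_gpus` GPUs.

```python
from typing import List, Sequence


def shard(items: Sequence[str], num_shards: int) -> List[List[str]]:
    """Split items into at most num_shards contiguous, near-equal chunks."""
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    size, extra = divmod(len(items), num_shards)
    chunks, start = [], 0
    for i in range(num_shards):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when there are fewer items than shards
            chunks.append(list(items[start:end]))
        start = end
    return chunks


def encode_on_gpus(texts: Sequence[str], model_name: str, num_gpus: int):
    """Encode texts with one Ray actor per GPU.

    Returns a list of per-shard embedding arrays, in input order.
    """
    # Imported lazily so this module can be imported without Ray or
    # sentence-transformers installed.
    import ray
    from sentence_transformers import SentenceTransformer

    @ray.remote(num_gpus=1)
    class Encoder:  # hypothetical actor, one instance per GPU
        def __init__(self, name: str):
            # Ray sets CUDA_VISIBLE_DEVICES per actor, so "cuda" resolves
            # to the GPU assigned to this actor.
            self.model = SentenceTransformer(name, device="cuda")

        def encode(self, batch: List[str]):
            return self.model.encode(batch, batch_size=64)  # assumed batch size

    ray.init(ignore_reinit_error=True)
    chunks = shard(texts, num_gpus)
    actors = [Encoder.remote(model_name) for _ in chunks]
    futures = [actor.encode.remote(chunk) for actor, chunk in zip(actors, chunks)]
    return ray.get(futures)  # results come back in the order futures were submitted
```

Something like `encode_on_gpus(texts, "<sbert model name>", num_gpus=2)` would return one embedding array per shard. Keeping `shard` as a plain function means the splitting logic can be checked without any GPUs.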