michaelfeil / infinity

Infinity is a high-throughput, low-latency REST API for serving text-embeddings, reranking models and clip
https://michaelfeil.github.io/infinity/
MIT License

Add Installation Option to Depend Only on ONNX, Excluding New Torch and CUDA Packages #332

Open · bash99 opened 1 month ago

bash99 commented 1 month ago

Feature request

An installation option that allows users to install the project with dependencies limited to ONNX only, excluding newer versions of Torch and CUDA, particularly CUDA 12. This option would enable users to run the project in environments with older CUDA versions (e.g., 11.7), pure CPU setups, or other ONNX-compatible platforms like ONNX-DirectML.

Several popular embedding and reranker models already ship ONNX binaries, such as youdao/bce-embedding and jina-embeddings-v2-base-zh.
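To illustrate the request: a minimal sketch of what an ONNX-only install path could look like, assuming a hypothetical `onnxruntime` extra were added to the package (the extra name and the exact package layout here are assumptions, not the current API):

```shell
# Hypothetical: install infinity with only the ONNX runtime backend,
# so no torch / CUDA 12 wheels get pulled in.
pip install "infinity_emb[onnxruntime]"

# For DirectML or other execution providers, the matching
# onnxruntime package would be installed separately, e.g.:
pip install onnxruntime-directml
```

This mirrors how other serving projects expose backend-specific extras so that the default install stays small.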

Motivation

Current installation requirements can create barriers for users with legacy hardware or specific configurations. By providing an option to install without the latest Torch and CUDA dependencies, we can enhance accessibility and flexibility, allowing more users to effectively utilize the project in a broader range of environments.

Your contribution

I can help test, or fix a few simple bugs.

michaelfeil commented 1 month ago

@bash99 I think this issue blocks currently: https://github.com/huggingface/optimum/issues/526

You could make it pytorch+cpu compatible. I am not sure I want to support a legacy CUDA version; I don't think I have the capacity to develop for and maintain that.
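For reference, the pytorch+cpu route mentioned above is possible today with PyTorch's official CPU-only wheel index, which avoids pulling in any CUDA packages (commands below are the standard PyTorch install flow, not something infinity-specific):

```shell
# Install CPU-only torch wheels from PyTorch's official index,
# then install infinity on top; no CUDA runtime is downloaded.
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install infinity_emb
```

The CPU wheels are much smaller than the CUDA 12 ones, which also helps the deployment-size concern raised below.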

talw-nym commented 1 month ago

@michaelfeil I agree that the issue you mentioned is a blocker. However, removing the hard dependency on torch would let people choose which framework to use, instead of forcing us to install torch and its specific dependencies. This is really important from a production standpoint and for the community offering: separating training from inference keeps deployments small. The current setup costs a lot of money and time, and hurts runtime and startup performance.

michaelfeil commented 1 month ago

@talw-nym I hear you, but your point is not actionable here: Optimum requires torch to be installed, and infinity depends on optimum for ONNX inference. Please solve the issue in optimum and I'll pull it in here.