michaelfeil / infinity

Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
https://michaelfeil.github.io/infinity/
MIT License

Update dependencies #96

Closed: NirantK closed this 5 months ago

NirantK commented 5 months ago

Hello Michael!

Excellent work with Infinity!

This PR upgrades FastEmbed to the latest version, which has many more models and some minor API changes that prepare it for sparse, image, and other modalities.

This also upgrades Ruff, which gives speed improvements.

michaelfeil commented 5 months ago

@NirantK Thanks for your PR, I am very happy to see it.

I have also started to support HF Optimum; maybe you can learn something from the integration (how to embed concurrent requests, ONNX O4 optimization). I am worried it might not be that easy, as I split tokenization, inference, and post-processing into three steps (to keep the device that runs .forward(), whether GPU, AVX2, or MPS, busy).

Let me try to fix that quickly.
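
For readers unfamiliar with the three-step split mentioned above, here is a minimal sketch of the general idea, not Infinity's actual implementation. Each step runs in its own thread and hands work to the next through a queue, so the inference step can stay busy while other items are being tokenized or post-processed. The functions `tokenize`, `forward`, `postprocess`, `stage`, and `embed_pipeline` are hypothetical stand-ins invented for this illustration; a real pipeline would call a tokenizer and a model instead.

```python
import math
import queue
import threading

_STOP = object()  # sentinel that tells a stage to shut down


def tokenize(text):
    # Hypothetical stand-in tokenizer; a real one would produce model input ids.
    return text.lower().split()


def forward(tokens):
    # Hypothetical stand-in for the model's .forward(): a toy 2-dimensional vector.
    return [float(len(tokens)), float(sum(len(t) for t in tokens))]


def postprocess(vector):
    # L2-normalise the vector; fall back to a norm of 1.0 for the zero vector.
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def stage(fn, inbox, outbox):
    # Apply fn to every (index, payload) item from inbox and forward the result.
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        idx, payload = item
        outbox.put((idx, fn(payload)))


def embed_pipeline(texts):
    # Four queues connect the producer, the three stages, and the consumer.
    q_in, q_tok, q_inf, q_out = (queue.Queue() for _ in range(4))
    workers = [
        threading.Thread(target=stage, args=(tokenize, q_in, q_tok)),
        threading.Thread(target=stage, args=(forward, q_tok, q_inf)),
        threading.Thread(target=stage, args=(postprocess, q_inf, q_out)),
    ]
    for w in workers:
        w.start()
    for i, text in enumerate(texts):
        q_in.put((i, text))
    q_in.put(_STOP)

    # Collect results by index until the sentinel reaches the final queue.
    results = [None] * len(texts)
    while True:
        item = q_out.get()
        if item is _STOP:
            break
        idx, vec = item
        results[idx] = vec
    for w in workers:
        w.join()
    return results
```

Calling `embed_pipeline(["hello world", "a b c"])` returns one unit-length vector per input, in input order. The difficulty raised in the comment is that a library exposing a single all-in-one embed call cannot easily be slotted into separate stages like these.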

codecov-commenter commented 5 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (0ef322b) 86.28% compared to head (15f44a7) 86.49%.


Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #96      +/-   ##
==========================================
+ Coverage   86.28%   86.49%   +0.20%
==========================================
  Files          27       27
  Lines        1269     1266       -3
==========================================
  Hits         1095     1095
+ Misses        174      171       -3
```


NirantK commented 5 months ago

Thanks for the suggestion! Yes, we're working towards adding export flows (e.g. Optimum) as well. In all likelihood, we'll make it an optional extra, as you've done.