Closed: NirantK closed this pull request 5 months ago.
@NirantK Thanks for your PR, I am very happy to see it.
I've also started supporting HF Optimum - maybe you can learn something from that integration (e.g. how to handle concurrent requests and ONNX O4 optimization).
I am worried it might not be that easy, as I split tokenization, inference, and post-processing into three separate steps (to keep the device that runs .forward(), whether GPU, AVX2, or MPS, busy).
Let me try to fix that quickly.
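The three-step split described above can be sketched as a producer/consumer pipeline: tokenization, inference, and post-processing each run in their own thread, connected by queues, so the device running .forward() is never idle while the CPU tokenizes the next batch. This is only an illustrative sketch of the idea, not Infinity's actual implementation; all names and stage bodies here are hypothetical stand-ins.

```python
import queue
import threading

SENTINEL = object()  # signals end-of-stream to the downstream stage

def tokenize(texts, out_q):
    # Stage 1: CPU-bound tokenization (stand-in: whitespace split).
    for text in texts:
        out_q.put(text.split())
    out_q.put(SENTINEL)

def infer(in_q, out_q):
    # Stage 2: stand-in for model.forward() on GPU/AVX2/MPS.
    while (tokens := in_q.get()) is not SENTINEL:
        out_q.put([len(t) for t in tokens])
    out_q.put(SENTINEL)

def postprocess(in_q, results):
    # Stage 3: stand-in post-processing (normalize each vector).
    while (vec := in_q.get()) is not SENTINEL:
        total = sum(vec) or 1
        results.append([v / total for v in vec])

def embed(texts):
    # Wire the three stages together with queues and run them concurrently.
    q1, q2, results = queue.Queue(), queue.Queue(), []
    stages = [
        threading.Thread(target=tokenize, args=(texts, q1)),
        threading.Thread(target=infer, args=(q1, q2)),
        threading.Thread(target=postprocess, args=(q2, results)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return results

print(embed(["hello world", "pipelined embedding inference"]))
```

Because each stage blocks only on its own queue, batch N can be tokenized while batch N-1 is in inference, which is the point of keeping the stages separate.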
All modified and coverable lines are covered by tests :white_check_mark:
Comparison is base (0ef322b) 86.28% compared to head (15f44a7) 86.49%.
Thanks for the suggestion! Yes, we're working towards adding export flows (e.g. Optimum) as well. In all likelihood we'll make it an optional extra, like you've done.
Hello Michael!
Excellent work with Infinity!
This PR upgrades FastEmbed to the latest version, which adds many more models and makes some minor API changes to prepare for sparse, image, and other modalities.
This also upgrades Ruff, which gives speed improvements.