michaelfeil / infinity

Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
https://michaelfeil.eu/infinity/
MIT License

How to run or access infinity on a hf space? #161

Closed ffreemt closed 3 months ago

ffreemt commented 3 months ago

Hi. Thanks for the wonderful project.

Is it possible to directly deploy infinity on a hf space?

I guess it's possible to do it via gradio. But all I need is just embeddings. So I wonder whether I can simply run something like infinity_emb --model-name-or-path sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 --port 7860 in a hf space and access the API.
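A hypothetical sketch of that idea: Hugging Face Spaces with the Docker SDK expect the app to listen on port 7860, so a Dockerfile along these lines might work (the `--host 0.0.0.0` flag and the `infinity-emb[all]` extra are assumptions on my part, not something confirmed in this thread):

```dockerfile
# Hypothetical Dockerfile for a Hugging Face Space (Docker SDK).
# Spaces route traffic to port 7860 by default.
FROM python:3.10-slim
RUN pip install --no-cache-dir "infinity-emb[all]"
EXPOSE 7860
CMD ["infinity_emb", \
     "--model-name-or-path", "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2", \
     "--port", "7860", "--host", "0.0.0.0"]
```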

I tried to deploy infinity on a hf space https://huggingface.co/spaces/mikeee/emb384. It seems to be running, but I cannot figure out how to make a request to the API. There isn't anything at https://huggingface.co/spaces/mikeee/emb384/docs or https://huggingface.co/spaces/mikeee/emb384:7860/docs.
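For what it's worth, once the server is reachable, a request against infinity's OpenAI-style /embeddings route would look roughly like this. The base URL below is a guess at the usual `<user>-<space>.hf.space` pattern that Spaces are served from (not `huggingface.co/spaces/...:7860`), so treat it as a placeholder:

```python
import json
import urllib.request

# Guessed public URL: Spaces are typically exposed at <user>-<space>.hf.space.
BASE_URL = "https://mikeee-emb384.hf.space"

# OpenAI-compatible embeddings payload.
payload = {
    "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    "input": ["Embed this sentence via Infinity."],
}

request = urllib.request.Request(
    f"{BASE_URL}/embeddings",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the Space is confirmed to be up:
# with urllib.request.urlopen(request) as response:
#     embedding = json.load(response)["data"][0]["embedding"]
```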

michaelfeil commented 3 months ago

Love the idea! I'm not sure how well you can expose a REST API on Hugging Face Spaces. I would follow this guide; effectively, you need to use Gradio rather than FastAPI (my guess): https://www.tomsoderlund.com/ai/building-ai-powered-rest-api

I would default to the Python API (example below), then add a REST API later.

import asyncio

from infinity_emb import AsyncEmbeddingEngine, EngineArgs

engine = AsyncEmbeddingEngine.from_args(
    EngineArgs(model_name_or_path="BAAI/bge-small-en-v1.5", engine="torch")
)

async def main(sentences=("Embed this sentence via Infinity.", "Paris is in France.")):
    async with engine:  # engine starts with engine.astart()
        embeddings, usage = await engine.embed(sentences=sentences)
    # engine stops with engine.astop()
    return embeddings

# call main() from any async function, or from a sync context via asyncio.run()
embeddings = asyncio.run(main())