microsoft / DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Apache License 2.0
1.76k stars 163 forks source link

How can i use this library with langchain or llama_index? #450

Open risedangel opened 3 months ago

risedangel commented 3 months ago

Hello, I have a RAG application that i want to use with fastgen. Is it possible to achieve such thing? Or ıs there any way i can "serve" the model and lllama_index can query the model through api ?

risedangel commented 3 months ago

I got it working through running it eith openai model serve and https://docs.llamaindex.ai/en/v0.9.48/api_reference/llms/openai_like.html

regybean commented 1 month ago

@risedangel Could you share your implementation?