microsoft / DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Apache License 2.0

What is the recommended way of bringing up mii as a service #318

Open flexwang opened 12 months ago

flexwang commented 12 months ago

My understanding is that we have to build a FastAPI wrapper: during the initialization phase we call `client = mii.client("mistralai/Mistral-7B-v0.1")`, and then we implement a handler that calls `client.generate`.
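
The approach described above could be sketched roughly as follows. This is only a sketch, not an officially recommended pattern: it assumes a persistent MII deployment has already been started (e.g. via `mii.serve`) in a separate process, and the exact keyword arguments accepted by `client.generate` and the shape of its response objects should be checked against the MII docs for your version.

```python
# Sketch: FastAPI wrapper around an MII client.
# Assumes `mii.serve("mistralai/Mistral-7B-v0.1")` was already run in another
# process, so mii.client() can connect to the existing deployment.
from fastapi import FastAPI
from pydantic import BaseModel
import mii

app = FastAPI()

# Connect once at startup and reuse the client across requests.
client = mii.client("mistralai/Mistral-7B-v0.1")


class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 128  # parameter name assumed; verify against MII docs


@app.post("/generate")
def generate(req: GenerateRequest):
    # client.generate returns a list of response objects; the
    # .generated_text attribute is assumed from MII's examples.
    responses = client.generate(req.prompt, max_new_tokens=req.max_new_tokens)
    return {"text": responses[0].generated_text}
```

Run with `uvicorn app:app` and POST JSON like `{"prompt": "DeepSpeed is"}` to `/generate`.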

PawanOsman commented 11 months ago

You can use the RESTful API.

Also, I opened PR #317 (still in progress) to implement an OpenAI-compatible RESTful API.
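
For reference, the built-in RESTful API is enabled when starting the deployment. The flag names, route, and request fields below are assumptions based on my reading of the MII README and may differ in your version, so verify before relying on them:

```shell
# Start a persistent MII deployment with the RESTful API enabled
# (enable_restful_api / restful_api_port are assumed flag names):
python -c "import mii; mii.serve('mistralai/Mistral-7B-v0.1', \
    deployment_name='mistral-deployment', \
    enable_restful_api=True, restful_api_port=28080)"

# Query it over HTTP; the /mii/<deployment_name> route and the
# 'prompts'/'max_length' fields are assumed from the MII examples:
curl -X POST http://localhost:28080/mii/mistral-deployment \
     -H "Content-Type: application/json" \
     -d '{"prompts": ["DeepSpeed is"], "max_length": 64}'
```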