Hello :)
First of all, thanks for this wonderful performance project — we tested it and saw a tremendous improvement in our model inference service.
One thing I'd like to know is: are there any plans to evolve DeepSpeed-FastGen into a production-ready serving framework?
For instance, I think a few features would be needed to meet production expectations.
For the RESTful API server, an OpenAI-compatible API should be provided; vLLM and TGI already support this.
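To illustrate what OpenAI compatibility would mean in practice, here is a minimal sketch of the request an OpenAI-compatible `/v1/chat/completions` endpoint accepts. The server URL and model name below are placeholder assumptions for illustration, not an existing FastGen API:

```python
import json
from urllib.request import Request

# Request body in the OpenAI chat-completions format.
# The model id is a placeholder; a compatible server would map it
# to whatever model it is serving.
payload = {
    "model": "my-served-model",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
    "stream": False,
}

# Assumed local endpoint; existing OpenAI SDKs could then be pointed
# at this base URL without client-side changes.
req = Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(req.get_method())  # urllib infers POST when data is set
```

Because the request shape matches OpenAI's, clients built against the OpenAI SDK would work against the framework with only a base-URL change — that is the main value of providing this API.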
Observability and logging should be built into the framework, with metrics collected during serving.
For easy deployment, Docker images could be published for this project. The API should also expose health-check endpoints for readiness/liveness probes.
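As a concrete sketch of the probe endpoints requested above, here is a minimal stdlib-only server exposing `/health` (liveness) and `/ready` (readiness). The paths and response shapes are assumptions for illustration, not an existing FastGen API:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

class ProbeHandler(BaseHTTPRequestHandler):
    # Hypothetical flag: a real framework would flip this to True
    # once model weights are loaded and the engine can serve.
    model_ready = True

    def do_GET(self):
        if self.path == "/health":            # liveness: process is up
            self._reply(200, {"status": "alive"})
        elif self.path == "/ready":           # readiness: model can serve
            code = 200 if self.model_ready else 503
            self._reply(code, {"ready": self.model_ready})
        else:
            self._reply(404, {"error": "not found"})

    def _reply(self, code, body):
        payload = json.dumps(body).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # silence per-request logging
        pass

# Bind to an ephemeral port and serve from a background thread.
server = HTTPServer(("127.0.0.1", 0), ProbeHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

health_status = urlopen(f"http://127.0.0.1:{port}/health").status
ready_status = urlopen(f"http://127.0.0.1:{port}/ready").status
print(health_status, ready_status)
server.shutdown()
```

With endpoints like these, a Kubernetes deployment could point its `livenessProbe` at `/health` and its `readinessProbe` at `/ready`, keeping traffic away from a replica that is still loading the model.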
Maybe some other features could be added as well. If you have plans to improve this project in these directions, we could contribute, but a roadmap would need to be laid out first.
I'd like to hear your opinion. Thank you.