sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.
https://sgl-project.github.io/

Triton support #35

Closed: TheodoreGalanos closed this issue 1 day ago

TheodoreGalanos commented 9 months ago

Hello, curious whether we can already use SGLang as a backend for NVIDIA's Triton Inference Server.

Amazing work with the library btw, love it!

isaac-vidas commented 9 months ago

+1

Would it be similar to how vLLM is used as a backend for Triton?

From this tutorial on deploying vLLM on Triton.
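
For anyone unfamiliar, the rough shape of that setup is a Triton model directory containing a `config.pbtxt` plus a Python-backend `model.py`. A hypothetical config sketch (tensor names, shapes, and the decoupled streaming flag are guesses, not the tutorial's exact files):

```
# Hypothetical config.pbtxt for a Python-backend LLM model.
# Tensor names and the decoupled flag are assumptions, not the
# tutorial's exact configuration.
name: "vllm_model"
backend: "python"
max_batch_size: 0

# Decoupled mode lets the backend stream multiple responses per request.
model_transaction_policy { decoupled: true }

input [
  { name: "text_input", data_type: TYPE_STRING, dims: [ 1 ] }
]
output [
  { name: "text_output", data_type: TYPE_STRING, dims: [ -1 ] }
]

instance_group [ { count: 1, kind: KIND_MODEL } ]
```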

TheodoreGalanos commented 9 months ago

I would think so, yeah. At least that's the tutorial I was looking at, although I don't really know how to do it :)

amirarsalan90 commented 8 months ago

It would be awesome if support for a Triton Inference Server backend were added ...

amirarsalan90 commented 8 months ago

Added a minimal example of serving SGLang with Triton Inference Server in this pull request: https://github.com/sgl-project/sglang/pull/242

zhyncs commented 3 months ago

> Added a minimal example of serving SGLang with Triton Inference Server in this pull request: #242

@amirarsalan90 @TheodoreGalanos Interesting work. That said, this implementation is currently not SOTA. Previously, my colleague and I added Triton Python Backend support to LMDeploy; its performance is comparable to, or even slightly better than, the API server. For more details, you can refer to https://github.com/InternLM/lmdeploy/pull/1329. I'm not sure if you're interested in implementing this in SGLang. Thanks.
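
For illustration, here is a minimal sketch of what such a Triton Python backend (`model.py`) could look like when wrapping an LLM engine. The `pb_utils` calls are Triton's actual Python-backend API; the `sgl.Engine` usage, model path, and sampling parameters are assumptions, not the implementation from the LMDeploy PR:

```python
# Sketch of a Triton Python backend wrapping an SGLang engine.
# The sgl.Engine calls, model path, and sampling parameters are
# assumptions for illustration only.
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Lazy import so the module can be inspected without sglang installed.
        import sglang as sgl

        # The model path is a placeholder; a real backend would read it
        # from the model config's parameters.
        self.engine = sgl.Engine(model_path="meta-llama/Llama-3.1-8B-Instruct")

    def execute(self, requests):
        responses = []
        for request in requests:
            # Triton delivers string tensors as numpy arrays of bytes objects.
            prompt = pb_utils.get_input_tensor_by_name(
                request, "text_input"
            ).as_numpy()[0].decode("utf-8")

            # Synchronous, non-streaming generation for simplicity; a
            # production backend would use decoupled mode for streaming.
            result = self.engine.generate(
                prompt, {"temperature": 0.8, "max_new_tokens": 128}
            )

            out = pb_utils.Tensor(
                "text_output",
                np.array([result["text"].encode("utf-8")], dtype=object),
            )
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses

    def finalize(self):
        self.engine.shutdown()
```

A production backend would additionally need decoupled (streaming) responses and proper batching, which is what makes matching the API server's performance non-trivial.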

amirarsalan90 commented 3 months ago

@zhyncs Thanks for the suggestion. Unfortunately I'm currently busy and don't have much time to work on this.

zhyncs commented 1 month ago

I'll add support this week using Triton Server's ensemble mode.
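
For context, ensemble mode chains several Triton models (e.g. prompt preprocessing, generation, postprocessing) into a single callable pipeline defined in a `config.pbtxt`. A hypothetical sketch, with every model and tensor name a placeholder:

```
# Hypothetical ensemble config.pbtxt; all model and tensor names are
# placeholders, not an actual SGLang integration.
name: "sglang_ensemble"
platform: "ensemble"
max_batch_size: 0
input [ { name: "text_input", data_type: TYPE_STRING, dims: [ 1 ] } ]
output [ { name: "text_output", data_type: TYPE_STRING, dims: [ 1 ] } ]
ensemble_scheduling {
  step [
    {
      # Placeholder preprocessing model (e.g. prompt templating).
      model_name: "preprocessing"
      model_version: -1
      input_map { key: "raw_text" value: "text_input" }
      output_map { key: "prompt" value: "formatted_prompt" }
    },
    {
      # Placeholder generation model backed by the LLM engine.
      model_name: "sglang_generate"
      model_version: -1
      input_map { key: "text_input" value: "formatted_prompt" }
      output_map { key: "text_output" value: "text_output" }
    }
  ]
}
```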

zhyncs commented 1 day ago

In the past month or so, I haven't had much time to address this issue, and in the meantime some of my thoughts have changed. First, SGLang currently provides an engine API, which makes it very easy to integrate with Triton Server. At the same time, though, introducing Triton Server is not really necessary unless a company's existing deployment platform is already built on it, so I don't highly recommend it. I will close this issue for now; if others in the community later have a strong demand for Triton Server, I will reconsider. Thank you.

BTW, @ispobock and I previously did similar work on LMDeploy. Those interested can refer to https://github.com/InternLM/lmdeploy/tree/main/lmdeploy/serve/turbomind/triton_python_backend
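
For reference, a minimal sketch of the engine API mentioned above; the model path and sampling parameters are placeholders:

```python
# Minimal sketch of SGLang's offline engine API; the model path and
# sampling parameters are placeholders.
import sglang as sgl

if __name__ == "__main__":
    llm = sgl.Engine(model_path="meta-llama/Llama-3.1-8B-Instruct")
    prompts = ["The capital of France is"]
    sampling_params = {"temperature": 0.8, "max_new_tokens": 32}

    # generate() accepts a list of prompts and returns one result per prompt.
    outputs = llm.generate(prompts, sampling_params)
    for prompt, output in zip(prompts, outputs):
        print(prompt, "->", output["text"])

    llm.shutdown()
```

Wrapping a call like this inside a Triton Python backend's `execute()` would be the most direct integration path, for anyone who does still need Triton Server.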