+1
Would it be similar to how vLLM is used as a backend for Triton?
I would think so, yeah. At least that's the approach I was looking at, although I don't really know how to do it :)
It would be awesome if support for a Triton Inference Server backend were added ...
Added a minimal example of serving SGLang with Triton Inference Server in this pull request: https://github.com/sgl-project/sglang/pull/242
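For a rough sense of what such an integration looks like, below is a minimal, untested sketch of a Triton Python backend `model.py` that delegates generation to SGLang. The tensor names (`PROMPT`, `COMPLETION`), model path, and sampling parameters are illustrative placeholders, not taken from the PR, and the sketch assumes SGLang's offline engine API (`sgl.Engine` with `generate` and `shutdown`):

```python
# model.py -- minimal sketch of a Triton Python backend wrapping SGLang.
# Placeholder names throughout; not the actual code from PR #242.
import numpy as np
import triton_python_backend_utils as pb_utils

import sglang as sgl


class TritonPythonModel:
    def initialize(self, args):
        # Load the SGLang engine once per model instance.
        # Hypothetical model path; substitute whatever checkpoint you serve.
        self.engine = sgl.Engine(model_path="meta-llama/Llama-3.1-8B-Instruct")

    def execute(self, requests):
        responses = []
        for request in requests:
            # Triton delivers string inputs as numpy bytes arrays.
            prompt_tensor = pb_utils.get_input_tensor_by_name(request, "PROMPT")
            prompt = prompt_tensor.as_numpy()[0].decode("utf-8")

            # Blocking, non-streaming generation for simplicity; a real
            # backend would likely use Triton's decoupled mode to stream.
            result = self.engine.generate(
                prompt,
                sampling_params={"temperature": 0.7, "max_new_tokens": 128},
            )

            out = pb_utils.Tensor(
                "COMPLETION",
                np.array([result["text"].encode("utf-8")], dtype=object),
            )
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses

    def finalize(self):
        # Release engine resources when Triton unloads the model.
        self.engine.shutdown()
```

A matching `config.pbtxt` would declare `PROMPT` and `COMPLETION` as `TYPE_STRING` tensors.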
@amirarsalan90 @TheodoreGalanos Interesting work. That said, this implementation is currently not SOTA. Previously, my colleague and I supported the Triton Python Backend on LMDeploy; its performance is comparable to, or even slightly better than, that of the API Server. For more details, you can refer to https://github.com/InternLM/lmdeploy/pull/1329. I'm not sure if you're interested in implementing it in SGLang. Thanks.
@zhyncs Thanks for the suggestion. Unfortunately I'm currently busy and don't have much time to work on this.
I'll add support this week, using Triton Server's ensemble mode.
In the past month or so, I haven't had much time to address this issue, and in the meantime some of my thinking has changed. Firstly, SGLang currently provides an engine API (see the sketch below), which makes it very easy to integrate with Triton Server. At the same time, however, introducing Triton Server is not really necessary unless your existing deployment platform is built on it, so I don't particularly recommend it. I will close this issue for now; if others in the community later have a strong need for Triton Server, I will reconsider. Thank you.

BTW, @ispobock and I previously did similar work on LMDeploy; those interested can refer to https://github.com/InternLM/lmdeploy/tree/main/lmdeploy/serve/turbomind/triton_python_backend
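For reference, here is a minimal sketch of the engine API mentioned above, following SGLang's public offline-engine examples; the model path is a placeholder, and exact names should be verified against your installed version:

```python
import sglang as sgl

# Placeholder model path; substitute whatever checkpoint you serve.
llm = sgl.Engine(model_path="meta-llama/Llama-3.1-8B-Instruct")

# Batch generation with sampling parameters passed as a plain dict.
outputs = llm.generate(
    ["Hello, my name is", "The capital of France is"],
    {"temperature": 0.7, "max_new_tokens": 32},
)
for out in outputs:
    print(out["text"])

llm.shutdown()
```

Because the engine runs in-process, wrapping it in any serving framework (Triton included) mostly reduces to calling `generate` from that framework's request handler, as in the backend sketch earlier in the thread.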
Hello, curious whether we can already use SGLang as a backend for NVIDIA's Triton Inference Server.
Amazing work with the library btw, love it!