sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.
https://sglang.readthedocs.io/en/latest/
Apache License 2.0

Will Triton kernels support CUDA graph? #1097

Closed AlvL1225 closed 4 weeks ago

AlvL1225 commented 2 months ago

Motivation

Only FlashInfer supports CUDA graph, but the Triton backend (which has more CPU overhead) might gain more from CUDA graph.

Related resources

No response

merrymercy commented 4 weeks ago

supported in v0.3.1.post1 https://github.com/sgl-project/sglang/pull/1401

AlvL1225 commented 4 weeks ago

> supported in v0.3.1.post1 #1401

Nice work! Custom Triton ops will reach better performance!