Will Triton kernels support CUDA graph?

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

https://sglang.readthedocs.io/en/latest/

Apache License 2.0

Closed AlvL1225 closed 4 weeks ago

AlvL1225 commented 2 months ago

Only FlashInfer supports CUDA graph, but Triton (with more CPU overhead) might gain more from CUDA graph.

merrymercy commented 4 weeks ago

supported in v0.3.1.post1 #1401

AlvL1225 commented 4 weeks ago

nice work! custom triton ops will reach better performance!