AlvL1225 closed this issue 4 weeks ago.
Supported in v0.3.1.post1: https://github.com/sgl-project/sglang/pull/1401
Nice work! Custom Triton ops will reach better performance!
Motivation
Only FlashInfer supports CUDA graph, but Triton, which has more CPU overhead, might gain even more from CUDA graph.
Related resources
No response