sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.
https://sglang.readthedocs.io/en/latest/
Apache License 2.0
5.1k stars, 356 forks

[Feature] add option to use liger triton kernel #1216

Closed by binarycrayon 1 week ago

binarycrayon commented 2 weeks ago

Motivation

Liger Kernel is a set of Triton kernels that can be applied to Hugging Face models with a one-line patch. It provides inference speedup and memory reduction.

Related resources

https://github.com/linkedin/Liger-Kernel

zhyncs commented 2 weeks ago

Whether a library can be integrated depends on whether there is demand for it and whether its performance offers an advantage. Are you willing to verify the performance of this library with PyTorch Benchmark (https://pytorch.org/tutorials/recipes/recipes/benchmark.html)? My current impression is that it probably has no advantage over our existing implementation, and there are many similar libraries, such as https://github.com/AlibabaPAI/FLASHNN and https://github.com/FlagOpen/FlagGems. What is its advantage over those libraries? We are very cautious about introducing unnecessary dependencies.

binarycrayon commented 2 weeks ago

@zhyncs Got it, thanks for the response. I can run PyTorch Benchmark on the library and post the results here. Thank you!

Will a benchmark on an A100 80GB be sufficient?

zhyncs commented 2 weeks ago

It's better to benchmark on both A100 and H100. Thanks!
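For reference, the kind of comparison requested above could be sketched with `torch.utils.benchmark` roughly as follows. This is only a minimal sketch under stated assumptions: RMSNorm is picked arbitrarily as the operator to compare, `rmsnorm_reference`, `rmsnorm_candidate`, and `run_benchmark` are hypothetical names, and the candidate simply calls the reference until the kernel under test is plugged in. It is not taken from SGLang or Liger Kernel.

```python
import torch
import torch.utils.benchmark as benchmark


def rmsnorm_reference(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Plain PyTorch RMSNorm, used here as the baseline."""
    variance = x.pow(2).mean(dim=-1, keepdim=True)
    return x * torch.rsqrt(variance + eps) * weight


def rmsnorm_candidate(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Hypothetical stand-in for the kernel under test.

    Replace the body with a call into the library being evaluated;
    here it just reuses the reference so the sketch runs anywhere.
    """
    return rmsnorm_reference(x, weight, eps)


def run_benchmark(hidden_size: int = 4096, num_tokens: int = 512, min_run_time: float = 0.2):
    """Time both implementations on the same inputs and return the measurements."""
    # Falls back to CPU so the sketch also runs without a GPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(num_tokens, hidden_size, device=device)
    weight = torch.ones(hidden_size, device=device)

    # Check that both implementations agree before timing them.
    torch.testing.assert_close(rmsnorm_candidate(x, weight), rmsnorm_reference(x, weight))

    results = []
    for name, fn in [("reference", rmsnorm_reference), ("candidate", rmsnorm_candidate)]:
        timer = benchmark.Timer(
            stmt="fn(x, weight)",
            globals={"fn": fn, "x": x, "weight": weight},
            label="RMSNorm",
            sub_label=f"{num_tokens}x{hidden_size}",
            description=name,
        )
        # blocked_autorange repeats the statement until min_run_time is reached.
        results.append(timer.blocked_autorange(min_run_time=min_run_time))
    return results


if __name__ == "__main__":
    # Prints a side-by-side table of the two measurements.
    benchmark.Compare(run_benchmark()).print()
```

Running the same script on each GPU type (for example A100 and H100) and posting the printed tables would give a like-for-like comparison. The shapes above are placeholders and should be replaced with ones representative of the serving workload.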

ByronHsu commented 2 weeks ago

Liger Kernel team here! Currently Liger Kernel is optimized for training (for example, a fused linear + cross-entropy layer to slash memory), so I believe other inference-specific libraries might do better than us.

binarycrayon commented 1 week ago

Hi, thanks for the comment, that makes sense. I will close this for now. We can open a new issue if we want to revisit it in the future.