sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.
https://sgl-project.github.io/
Apache License 2.0

[Feature] Is AWQ W4Afp8 supported? #1964

Open vkc1vk opened 2 weeks ago

vkc1vk commented 2 weeks ago

Motivation

AWQ with INT4 weights and FP8 activations / KV cache works fairly well with Llama-3 models and is a useful quantization technique in the high-throughput regime. Is this quantization format supported by SGLang?
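For context, the "W4" half of this scheme amounts to per-group symmetric INT4 quantization of the weights, with FP8 activations handled analogously via per-tensor scales. A minimal NumPy sketch of the weight-quantization arithmetic (illustrative only; function names are made up, and this is not SGLang, AWQ, or TensorRT-LLM code):

```python
import numpy as np

def quantize_int4_groupwise(w, group_size=128):
    """Symmetric per-group INT4 quantization of a flat weight vector
    (AWQ-style grouping; hypothetical helper, not a real library API)."""
    w = w.reshape(-1, group_size)
    # One scale per group, mapping the group's max magnitude to the
    # symmetric INT4 range [-7, 7].
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Reconstruct float weights from INT4 codes and per-group scales."""
    return (q.astype(np.float32) * scale).reshape(-1)

w = np.random.randn(256).astype(np.float32)
q, s = quantize_int4_groupwise(w)
w_hat = dequantize(q, s)
# Round-to-nearest bounds the per-element error by half the group scale.
err = np.abs(w - w_hat).max()
```

At inference time the INT4 codes are dequantized (or consumed directly by a fused kernel) and multiplied against FP8-quantized activations, which is what makes the format attractive for throughput-bound serving.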

Related resources

https://github.com/NVIDIA/TensorRT-LLM/blob/b7868dd1bd1186840e3755b97ea3d3a73ddd76c5/examples/falcon/README.md?plain=1#L311

zhyncs commented 2 weeks ago

Not yet.