Checklist
- [x] 2. Please use English, otherwise it will be closed.
Motivation
AWQ with INT4 weights and FP8 activations / KV cache works fairly well with Llama-3 models and is a useful quantization technique for the high-throughput regime. Is this quantization format supported by SGLang?
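For concreteness, here is a minimal sketch of the launch we would like to work. The flag names below are assumptions based on SGLang's existing server arguments, and the model path is just an example public AWQ checkpoint; whether these options compose into an INT4-weight + FP8-activation/KV-cache path is exactly the question.

```python
import subprocess

# Sketch only: --quantization and --kv-cache-dtype are existing SGLang server
# flags, but it is unclear whether they combine into the INT4 weights + FP8
# activations / FP8 KV cache mode asked about here. The model path is an
# example AWQ checkpoint, not a recommendation.
subprocess.run([
    "python", "-m", "sglang.launch_server",
    "--model-path", "hugging-quants/Meta-Llama-3-8B-Instruct-AWQ-INT4",
    "--quantization", "awq",          # INT4 AWQ weights
    "--kv-cache-dtype", "fp8_e5m2",   # FP8 KV cache
])
```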
Related resources
https://github.com/NVIDIA/TensorRT-LLM/blob/b7868dd1bd1186840e3755b97ea3d3a73ddd76c5/examples/falcon/README.md?plain=1#L311