usyd-fsalab / fp6_llm

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
Apache License 2.0

Has the accuracy and performance been compared with AWQ? #4

Open dingjingzhen opened 5 months ago

dingjingzhen commented 5 months ago

Has the accuracy and performance been compared with AWQ INT4?

catid commented 4 months ago

You should read their GitHub blog post; it performs much better.

Summer-Summer commented 4 months ago

At the time we were writing the research paper, TensorRT-LLM's W4A16 kernel was faster than AWQ's W4A16 kernel, so we compared our kernel performance against TensorRT-LLM's W4A16 kernel. According to this figure, our FP6 kernel achieves performance similar to the fine-grained W4A16 kernel and is slightly slower than the coarse-grained W4A16 kernel. As for accuracy, coarse-grained W4A16 shows significant degradation. We also found that FP6 quantization is more robust than INT4. Please also refer to this paper for more insights on model accuracy.
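
For intuition on the robustness claim, here is a minimal round-trip sketch (not code from this repo) that compares quantization error for FP6 and symmetric per-tensor INT4 on a synthetic weight vector with a few outliers. It assumes an E3M2 FP6 layout with exponent bias 3 and no inf/NaN encodings; the grid construction, helper names, and test distribution are all illustrative assumptions, not the kernel's actual quantizer.

```python
# Illustrative sketch: round-trip error of FP6 (assumed E3M2, bias 3)
# vs. symmetric INT4, both with a single per-tensor scale.
import numpy as np

def fp6_grid(ebits=3, mbits=2, bias=3):
    """Enumerate all representable FP6 values (assumed layout, no inf/NaN)."""
    vals = set()
    for sign in (1.0, -1.0):
        for e in range(2 ** ebits):
            for m in range(2 ** mbits):
                frac = m / 2 ** mbits
                if e == 0:                        # subnormal range
                    v = sign * frac * 2.0 ** (1 - bias)
                else:                             # normal range
                    v = sign * (1.0 + frac) * 2.0 ** (e - bias)
                vals.add(v)
    return np.array(sorted(vals))

def quantize_to_grid(x, grid):
    """Round each element to the nearest grid point."""
    idx = np.abs(x[..., None] - grid).argmin(axis=-1)
    return grid[idx]

def fp6_roundtrip(x):
    grid = fp6_grid()
    scale = np.abs(x).max() / grid.max()          # map max |x| to max FP6 value
    return quantize_to_grid(x / scale, grid) * scale

def int4_roundtrip(x):
    scale = np.abs(x).max() / 7.0                 # symmetric INT4 levels: -7..7
    return np.clip(np.round(x / scale), -7, 7) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=4096)              # weight-like distribution
w[:8] *= 20.0                                     # a few outlier weights

for name, fn in [("FP6", fp6_roundtrip), ("INT4", int4_roundtrip)]:
    rmse = np.sqrt(np.mean((w - fn(w)) ** 2))
    print(f"{name} round-trip RMSE: {rmse:.2e}")
```

Under these assumptions the effect is that the FP6 grid is non-uniform, so a few outliers inflate the scale without flattening the many small weights to zero, whereas uniform INT4 spends its levels evenly and loses most of its resolution near zero; this is one way to read the "FP6 is more robust than INT4" observation above.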