mit-han-lab / llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Support for Qwen models #175

Open Huyueeer opened 6 months ago

Huyueeer commented 6 months ago

Are there any plans to support quantized inference for Qwen models?

YihuaJerry commented 6 months ago

We have applied AWQ to Qwen models, the same as for other LLMs.
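For anyone looking for the exact commands, the standard recipe from the README should carry over; a sketch, assuming the usual entry point works for a Qwen checkpoint (the model path and cache file below are placeholders):

```bash
# Run the AWQ scale search and dump the results (w4, group size 128, as in the README examples)
python -m awq.entry --model_path Qwen/Qwen-7B \
    --w_bit 4 --q_group_size 128 \
    --run_awq --dump_awq awq_cache/qwen-7b-w4-g128.pt
```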

robert-lee2016 commented 6 months ago

> We have applied AWQ to Qwen models, the same as for other LLMs.

Following the same recipe as for other LLMs, we fail to apply AWQ to DeepSeek-V2 because of unsupported modules such as MLA (multi-head latent attention).
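This is plausibly a layer-mapping issue: AWQ's scale search needs to know which linear layers each preceding activation feeds, and MLA replaces the usual q/k/v projections with low-rank ones, so there is no scaling rule for them. A minimal sketch to inspect the MLA layout (module names assumed from the DeepSeek-V2 modeling code on the Hub; this only inspects structure, it is not a fix):

```python
import torch
import torch.nn as nn
from transformers import AutoConfig, AutoModelForCausalLM

cfg = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-V2", trust_remote_code=True)
# Build the module tree on the meta device to avoid allocating real weights;
# drop the context manager if the custom modeling code does not support meta init.
with torch.device("meta"):
    model = AutoModelForCausalLM.from_config(cfg, trust_remote_code=True)

# Expect low-rank projections such as q_a_proj/q_b_proj and
# kv_a_proj_with_mqa/kv_b_proj instead of the q_proj/k_proj/v_proj
# layout that the existing per-architecture scaling rules assume.
attn = model.model.layers[0].self_attn
for name, mod in attn.named_modules():
    if isinstance(mod, nn.Linear):
        print(name, tuple(mod.weight.shape))
```

Supporting DeepSeek-V2 would presumably mean adding a scaling rule for these projections, analogous to the repo's existing per-architecture rules.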

BeeLiu24 commented 2 months ago

> We have applied AWQ to Qwen models, the same as for other LLMs.

Hi, what perplexity (PPL) do you get on WikiText-2?
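If it helps, the README's evaluation path should report this number directly; a sketch, assuming a scale file was dumped during the quantization step above (paths are placeholders):

```bash
# Evaluate WikiText-2 perplexity with simulated (fake) quantization
python -m awq.entry --model_path Qwen/Qwen-7B \
    --tasks wikitext \
    --w_bit 4 --q_group_size 128 \
    --load_awq awq_cache/qwen-7b-w4-g128.pt \
    --q_backend fake
```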