quic / ai-hub-models

The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.
https://aihub.qualcomm.com
BSD 3-Clause "New" or "Revised" License

[Question] What's the groupsize of w4a16 + w8a16 #112

Open xiguadong opened 2 weeks ago

xiguadong commented 2 weeks ago

Hello, the config at https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4/blob/c34a4a91629f09f73a285f32dbd26106b033c654/config.json#L29 mentions that the group size is 128 for both the 4-bit and 8-bit quantization. Could you tell me the group size used for this model?

Also, if I want to deploy the official 4-bit model to QNN, how should I do that?

Thanks!

shreyajn commented 3 days ago

The Qwen model on AI Hub Models is Qwen 2.0, and its block group size is 64.
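For readers unfamiliar with what the block group size controls, here is a minimal, purely illustrative sketch of symmetric 4-bit per-group weight quantization with group size 64. The function name and details are hypothetical and are not the AI Hub Models implementation; the point is only that each contiguous run of 64 weights shares one scale, so a smaller group size tracks the weight distribution more closely at the cost of storing more scales.

```python
import numpy as np

GROUP_SIZE = 64  # block group size, per the answer above

def quantize_w4_per_group(weights: np.ndarray, group_size: int = GROUP_SIZE):
    """Illustrative symmetric 4-bit per-group quantization (not the AI Hub code).

    Returns int4-valued codes (stored in int8) plus one float scale per group.
    """
    flat = weights.reshape(-1, group_size)                   # one row per group
    scales = np.abs(flat).max(axis=1, keepdims=True) / 7.0   # int4 range is [-8, 7]
    scales = np.where(scales == 0, 1.0, scales)              # avoid divide-by-zero
    q = np.clip(np.round(flat / scales), -8, 7).astype(np.int8)
    return q.reshape(weights.shape), scales

def dequantize(q: np.ndarray, scales: np.ndarray, group_size: int = GROUP_SIZE):
    """Reconstruct approximate float weights from codes and per-group scales."""
    return (q.reshape(-1, group_size) * scales).reshape(q.shape)

# Round-trip a random weight matrix and check the reconstruction error,
# which is bounded by half a quantization step per group.
w = np.random.default_rng(0).normal(size=(128, 128)).astype(np.float32)
q, s = quantize_w4_per_group(w)
w_hat = dequantize(q, s)
print(float(np.abs(w - w_hat).max()))
```

With w4a16 the activations stay 16-bit, so only the weight matrices go through a scheme like this; the group size (64 here vs. 128 in the GPTQ config linked above) is what trades reconstruction accuracy against metadata overhead.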

If using our provided model, you can deploy it using the tutorial: https://github.com/quic/ai-hub-apps/tree/main/tutorials/llm_on_genie