pytorch / executorch

On-device AI across mobile, embedded and edge for PyTorch
https://pytorch.org/executorch/

How to Apply Different Quantization Settings Per Layer in ExecuTorch? #6846

Open crinex opened 14 hours ago

crinex commented 14 hours ago

Dear @kimishpatel @jerryzh168 @shewu-quic

I want to split a model (e.g., Llama-3.2-3B) into multiple layers and apply different quantization settings (qnn_8a8w, qnn_16a4w, ...) to each layer. Has such a method been tested in ExecuTorch? If not, could you suggest how this could be achieved?

Thank you
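
For context, here is a minimal sketch of the kind of per-layer configuration being asked about, written against the generic PT2E quantization flow with the XNNPACKQuantizer's `set_module_name` override rather than the QNN quantizer. The module names, shapes, and export entry point are illustrative assumptions; whether `QnnQuantizer` exposes an equivalent per-layer hook for mixing qnn_8a8w and qnn_16a4w is exactly the open question in this issue.

```python
# Illustrative sketch (not the QNN flow): per-module quantization overrides
# using the PT2E quantization APIs. Module names and configs are assumptions.
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

class TwoLayer(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layer0 = torch.nn.Linear(16, 16)
        self.layer1 = torch.nn.Linear(16, 16)

    def forward(self, x):
        return self.layer1(self.layer0(x))

model = TwoLayer().eval()
example_inputs = (torch.randn(1, 16),)

# Export to an ATen graph before annotation; the exact export entry point
# differs across PyTorch versions (export_for_training is the 2.5+ path).
exported = torch.export.export_for_training(model, example_inputs).module()

quantizer = XNNPACKQuantizer()
# Global default: 8-bit per-tensor symmetric quantization.
quantizer.set_global(get_symmetric_quantization_config())
# Override one submodule by its fully qualified name, e.g. give layer1
# per-channel weight quantization instead of the global setting.
quantizer.set_module_name(
    "layer1", get_symmetric_quantization_config(is_per_channel=True)
)

prepared = prepare_pt2e(exported, quantizer)
prepared(*example_inputs)        # calibration pass
quantized = convert_pt2e(prepared)
```

The same pattern (a global config plus per-module overrides keyed on module names) is what a per-layer qnn_8a8w / qnn_16a4w split would need from the QNN quantizer.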

kimishpatel commented 13 hours ago

Is this specific to the QNN backend, or is your question general?