crinex opened this issue 14 hours ago
Dear @kimishpatel @jerryzh168 @shewu-quic
I want to split a model (e.g., Llama-3.2-3B) into multiple layers and apply a different quantization setting (qnn_8a8w, qnn_16a4w, ...) to each layer. Has such a method been tested in ExecuTorch? If not, could you suggest how this could be achieved?
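To illustrate what I mean, here is a minimal sketch of per-layer quantization configs using the generic PT2E quantizer API. The XNNPACKQuantizer and the toy two-block model are stand-ins purely for illustration; I don't know whether the QNN quantizer exposes an equivalent `set_module_name`-style override, which is part of my question:

```python
# Minimal sketch only -- NOT a confirmed ExecuTorch/QNN recipe.
# XNNPACKQuantizer is used here just to show per-module-name configs in the
# PT2E flow; the QNN quantizer's configuration API may differ.
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)


class Block(torch.nn.Module):
    # Toy stand-in for one decoder layer.
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(64, 64)

    def forward(self, x):
        return torch.relu(self.linear(x))


class TinyModel(torch.nn.Module):
    # Toy stand-in for a multi-layer model such as Llama-3.2-3B.
    def __init__(self):
        super().__init__()
        self.layers = torch.nn.ModuleList([Block() for _ in range(2)])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


model = TinyModel().eval()
example_inputs = (torch.randn(1, 64),)

# Export to an FX graph for PT2E quantization
# (older PyTorch versions use capture_pre_autograd_graph instead).
gm = torch.export.export_for_training(model, example_inputs).module()

quantizer = XNNPACKQuantizer()
# Default config applied to the whole model (the "8a8w"-style setting).
quantizer.set_global(get_symmetric_quantization_config())
# Override a specific layer by module name with a different config
# (exact names depend on the model; see model.named_modules()).
quantizer.set_module_name(
    "layers.0", get_symmetric_quantization_config(is_per_channel=True)
)

prepared = prepare_pt2e(gm, quantizer)
prepared(*example_inputs)            # calibration pass
quantized = convert_pt2e(prepared)   # ready for to_edge / to_backend lowering
```

Conceptually I would like to do the same with the QNN configs, e.g. 8a8w for most layers and 16a4w for a few sensitive ones.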
Thank you
Is this specific to the QNN backend, or is your question more general?