quic / aimet

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
https://quic.github.io/aimet-pages/index.html

quantsim.export(path, filename_prefix) could not generate int8 QNN ONNX model #2773

Open · JiliangNi opened this issue 6 months ago

JiliangNi commented 6 months ago

After calling quantsim.export(path, filename_prefix), I could not get an int8 QNN ONNX model. My objective is to obtain an int8 ONNX model through the AIMET quantization toolkit, like the one shown in the attached image below. [image: int8_ONNX_model]

However, by calling quantsim.export(path, filename_prefix), I only get .pth files, encoding files, and one fp32 ONNX model. Did I use the export functionality incorrectly? Or is there a way to convert the encoding files and the fp32 ONNX model into one int8 QNN model?
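
For reference, here is a simplified version of what I am running (resnet18 and the random calibration batch stand in for my real model and data; exact signatures may differ across AIMET versions):

```python
# Simplified repro of the export flow (AIMET 1.x, aimet_torch).
import os
import torch
from torchvision.models import resnet18
from aimet_torch.quantsim import QuantizationSimModel

model = resnet18().eval()
dummy_input = torch.randn(1, 3, 224, 224)

sim = QuantizationSimModel(model, dummy_input=dummy_input)

def calibrate(sim_model, _):
    # Stand-in for running representative calibration data through the model
    sim_model(torch.randn(4, 3, 224, 224))

sim.compute_encodings(forward_pass_callback=calibrate,
                      forward_pass_callback_args=None)

# This produces quantized_model.onnx (FP32), quantized_model.encodings,
# and a .pth file -- but no int8 ONNX model.
os.makedirs('./output', exist_ok=True)
sim.export(path='./output', filename_prefix='quantized_model',
           dummy_input=dummy_input)
```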

quic-mangal commented 5 months ago

You used it correctly. You can take the encodings and the FP32 model to a quantization-enabled target to get a quantized model; AIMET only simulates HW performance.

quic-akinlawo commented 5 months ago

@JiliangNi please use the --keep_quant_nodes option with the QNN converters to get a QNN model with activation quant/dequant nodes. Without this option, quant nodes are stripped from the graph.
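
Roughly like this (the file names are placeholders, and flags other than --keep_quant_nodes may vary across QNN SDK releases, so check qnn-onnx-converter --help for your version):

```
# Convert the AIMET-exported FP32 ONNX model plus its encodings,
# keeping activation quant/dequant nodes in the generated graph.
qnn-onnx-converter \
    --input_network output/quantized_model.onnx \
    --quantization_overrides output/quantized_model.encodings \
    --input_list calibration_inputs.txt \
    --keep_quant_nodes \
    --output_path output/quantized_model.cpp
```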