quic / aimet

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
https://quic.github.io/aimet-pages/index.html

Example using AIMET quantized model and onnxruntime #2880

Open · escorciav opened this issue 4 months ago

escorciav commented 4 months ago

I'm having trouble verifying that a simulated quantized ONNX file offers decent performance.

Issue: after doing PTQ, I cannot use the quantized model in ONNX Runtime (preferably on GPU)!
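For context, this is roughly the check that fails for me (a sketch only: the model path is a placeholder, and the CUDA provider assumes a GPU build of onnxruntime):

```python
# Minimal sketch: load the exported quantized ONNX model and try to run it,
# preferring the GPU provider when available.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "quantized_model.onnx",  # hypothetical path to the AIMET-exported model
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
inp = sess.get_inputs()[0]
# Replace dynamic dimensions with 1 and assume a float32 input for this sketch.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
x = np.random.randn(*shape).astype(np.float32)
out = sess.run(None, {inp.name: x})
print([o.shape for o in out])
```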

escorciav commented 4 months ago

Others have faced similar issues, no?

A potential thing to test. Why? Train of thought: it's a library or binary, and QNN does something similar to emulate/simulate the runtime, as far as I understand.

e-said commented 4 months ago

Hi @escorciav, I'm using aimet_torch, which has a method to convert AIMET custom quantization nodes to native torch QDQ nodes. When I use native torch QDQ nodes and export the ONNX model, I can run it with ONNX Runtime on CPU successfully.

escorciav commented 4 months ago

Thanks for chiming in, @e-said!

Would you mind sharing a simple Python script with a toy ONNX model showcasing that? Sorry in advance if that's too demanding. I'd be happy to leave a ⭐ on a GitHub repo or Gist, and/or endorse it via Twitter :)

e-said commented 4 months ago

Hi @escorciav, I don't have a simple script showing this (my pipeline is quite complex), but I can share some hints to help you create a script to test it; see the sketch after the PS below.

PS: please note that your model should contain only int8 QDQ nodes, otherwise it won't be converted to ONNX.
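To illustrate the idea without AIMET specifics (this is not the exact aimet_torch conversion API; check its docs for that), here's a minimal sketch: torch's native fake-quant op is lowered by the ONNX exporter to a QuantizeLinear/DequantizeLinear pair, and the exported model runs under ONNX Runtime's CPU provider. Names, shapes, and quantization parameters are made up:

```python
# Minimal sketch (not the AIMET pipeline): emulate an int8 QDQ node with
# torch's native fake-quantize op, export to ONNX, and run it with onnxruntime.
import torch
import onnxruntime as ort

class TinyQDQModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(8, 4)

    def forward(self, x):
        # Native torch fake-quant on the activation only (weights left in fp32
        # for brevity). The ONNX exporter lowers this op to a
        # QuantizeLinear/DequantizeLinear pair; the (-128, 127) range gives
        # int8 QDQ nodes, per the note above.
        x = torch.fake_quantize_per_tensor_affine(
            x, scale=0.02, zero_point=0, quant_min=-128, quant_max=127
        )
        return self.fc(x)

model = TinyQDQModel().eval()
dummy = torch.randn(1, 8)
# Opset 13 is needed for the signed int8 (-128, 127) range.
torch.onnx.export(model, dummy, "tiny_qdq.onnx", opset_version=13)

# Verify the exported QDQ model runs under ONNX Runtime on CPU.
sess = ort.InferenceSession("tiny_qdq.onnx", providers=["CPUExecutionProvider"])
out = sess.run(None, {sess.get_inputs()[0].name: dummy.numpy()})
print(out[0].shape)  # (1, 4)
```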

escorciav commented 3 months ago

No worries. I have to do QAT, so I gotta use aimet_torch, as per the Qualcomm AIMET devs' (maintainers') suggestion.