quic / aimet

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
https://quic.github.io/aimet-pages/index.html

Example using AIMET quantized model and onnxruntime #2880

Open · escorciav opened this issue 4 months ago

escorciav commented 4 months ago

I'm having trouble verifying that a simulated quantized ONNX file offers decent performance.

Issue: after doing PTQ, I cannot use the quantized model in ONNX Runtime (preferably on GPU)!
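For context, this is roughly the check that fails for me (a sketch only: the model path is a placeholder, and the CUDA provider assumes a GPU build of onnxruntime):

```python
# Minimal sketch: load the exported quantized ONNX model and try to run it,
# preferring the GPU provider when available.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "quantized_model.onnx",  # hypothetical path to the AIMET-exported model
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
inp = sess.get_inputs()[0]
# Replace dynamic dimensions with 1 and assume a float32 input for this sketch.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
x = np.random.randn(*shape).astype(np.float32)
out = sess.run(None, {inp.name: x})
print([o.shape for o in out])
```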

escorciav commented 4 months ago

Others have faced similar issues, no?

A potential thing to test. Why? Train of thought: it's a library or binary, and QNN does something similar to emulate/simulate the runtime, as far as I understand.

e-said commented 4 months ago

Hi @escorciav, I'm using aimet_torch, which has a method to convert AIMET custom quantization nodes to native torch QDQ nodes. When I use native torch QDQ nodes and export the ONNX model, I can run it with ONNX Runtime on CPU successfully.

escorciav commented 4 months ago

Thanks for chiming in, @e-said!

Would you mind sharing a simple Python script with a toy ONNX model showcasing that? Sorry in advance if that's too demanding. I'd be happy to leave a ⭐ on a GitHub repo or Gist, and/or endorse it via Twitter :)

e-said commented 4 months ago

Hi @escorciav, I don't have a simple script showing this (my pipeline is quite complex), but I can share some hints to help you create a script to test it; see the sketch after the PS below.

PS: please note that your model should contain only int8 QDQ nodes, otherwise it won't be converted to ONNX.
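To illustrate the idea without AIMET specifics (this is not the exact aimet_torch conversion API; check its docs for that), here's a minimal sketch: torch's native fake-quant op is lowered by the ONNX exporter to a QuantizeLinear/DequantizeLinear pair, and the exported model runs under ONNX Runtime's CPU provider. Names, shapes, and quantization parameters are made up:

```python
# Minimal sketch (not the AIMET pipeline): emulate an int8 QDQ node with
# torch's native fake-quantize op, export to ONNX, and run it with onnxruntime.
import torch
import onnxruntime as ort

class TinyQDQModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(8, 4)

    def forward(self, x):
        # Native torch fake-quant on the activation only (weights left in fp32
        # for brevity). The ONNX exporter lowers this op to a
        # QuantizeLinear/DequantizeLinear pair; the (-128, 127) range gives
        # int8 QDQ nodes, per the note above.
        x = torch.fake_quantize_per_tensor_affine(
            x, scale=0.02, zero_point=0, quant_min=-128, quant_max=127
        )
        return self.fc(x)

model = TinyQDQModel().eval()
dummy = torch.randn(1, 8)
# Opset 13 is needed for the signed int8 (-128, 127) range.
torch.onnx.export(model, dummy, "tiny_qdq.onnx", opset_version=13)

# Verify the exported QDQ model runs under ONNX Runtime on CPU.
sess = ort.InferenceSession("tiny_qdq.onnx", providers=["CPUExecutionProvider"])
out = sess.run(None, {sess.get_inputs()[0].name: dummy.numpy()})
print(out[0].shape)  # (1, 4)
```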

escorciav commented 3 months ago

No worries. I have to do QAT, so I gotta use aimet_torch, as per the Qualcomm AIMET devs' (maintainers') suggestion.