escorciav opened 4 months ago
Others have faced similar issues, no?
Potential thing to test. Why? Train of logic: it's a library or binary. QNN does something similar to emulate/simulate the runtime, afaiu.
Hi @escorciav, I'm using aimet_torch, which has a method to convert AIMET custom nodes into torch-native QDQ nodes. When I use native torch QDQ nodes and export the ONNX model, I'm able to run onnxruntime on CPU successfully.
Thanks for chiming in @e-said !
Do you mind sharing a simple Python script with a toy ONNX model showcasing that? Sorry in advance if it's too demanding. Happy to leave a ⭐ on a GitHub repo or Gist and/or endorse it via Twitter :)
Hi @escorciav, I don't have a simple script showing this (my pipeline is quite complex), but I can share some hints to help you create a script to test this:
PS: please note that your model should contain only int8 QDQ nodes; otherwise it won't be converted to ONNX.
No worries. I have to do QAT, so I gotta use aimet_torch, as per the Qualcomm:AIMET dev (maintainers) suggestion.
I'm having trouble verifying that a simulated quantized ONNX file offers decent performance.
Issue: after doing PTQ, I cannot run the quantized model in onnxruntime (preferably on GPU)!