Veccoy opened this issue 2 months ago
I think this is coming from the use of specific PyTorch QAT modules that wrap the weight `FakeQuantize` modules as attributes, while activation `FakeQuantize` modules appear as standalone nodes in the Torch fx graph. This structure is created when the Torch fx graph is prepared and is kept until the model is exported, which generates ONNX files with ATen operations.
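To illustrate, here is a minimal sketch with a toy model, independent of the MMRazor pipeline (the model and module names are placeholders; `prepare_qat_fx` produces the `activation_post_process_*` names by default):

```python
import torch
from torch.ao.quantization import get_default_qat_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_qat_fx

# Toy model used only to inspect the prepared fx graph (placeholder, not the real network).
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3)).train()
prepared = prepare_qat_fx(
    model, get_default_qat_qconfig_mapping(), (torch.randn(1, 3, 32, 32),)
)

# Activation FakeQuantize modules appear as standalone call_module nodes in the graph...
for node in prepared.graph.nodes:
    if node.op == "call_module" and "activation_post_process" in node.target:
        print("activation fake-quant node:", node.target)

# ...while the weight FakeQuantize is only an attribute of the QAT conv module.
qat_conv = dict(prepared.named_modules())["0"]
print("weight fake-quant attribute:", type(qat_conv.weight_fake_quant).__name__)
```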
Checklist
Describe the question you meet
Hi, I'm trying to export a quantized model after a QAT experiment. I have implemented the following hook to export it to ONNX format using the `get_deploy_model` method of `MMArchitectureQuant`. In this built-in method, `post_process_for_deploy` from the `NativeQuantizer` is applied and seems to specifically process the weight `FakeQuantize` modules. Moreover, the end of the `get_deploy_model` method of `MMArchitectureQuant` seems to apply a separate post-processing step to the activation `FakeQuantize` modules, copying them and changing the Torch class that is used (?).

Then, when I visualize my ONNX file in Netron, the translations produced for activation and weight `FakeQuantize` modules are different. This is problematic for execution on specific hardware, because the translation used for the weight `FakeQuantize` modules is not recognized. How can I get the same ONNX QuantizeLinear + DequantizeLinear layers for both activation and weight `FakeQuantize` modules?

Post related information
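For reference, the exported graph can also be checked programmatically rather than in Netron (a minimal sketch; the file name is a placeholder for the file produced by the export hook):

```python
from collections import Counter

import onnx

# "qat_model.onnx" is a placeholder for the ONNX file written by the export hook.
onnx_model = onnx.load("qat_model.onnx")
print(Counter(node.op_type for node in onnx_model.graph.node))
# In the case described above, activations show up as QuantizeLinear/DequantizeLinear
# pairs, while the weight fake-quantize is translated to a different (ATen) operation.
```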
Here is my quantization configuration used with `OpenVINOQuantizer`: