quic / aimet

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
https://quic.github.io/aimet-pages/index.html

issue about the exported onnx model #2581

Open czy2014hust opened 9 months ago

czy2014hust commented 9 months ago

I use the AIMET PTQ to quantize the CLIP text model.

But I encounter this error when using qnn-onnx-converter to convert the ONNX model: [KeyError: 'Graph has no buffer /text_model/encoder/layers.0/layer_norm1/Constant_output_0, referred to as input for text_model.encoder.layers.0.layer_norm1#2']

When I inspect the exported ONNX model with Netron, I find that the second input of the Pow node has disappeared. It was just a Constant node with value 2.0.

[screenshot: Netron view of the exported ONNX graph showing the Pow node missing its constant exponent input]

czy2014hust commented 9 months ago

By the way, the AIMET commit: 35e588226f02 with torch-gpu.

zzh-www commented 9 months ago

+1. I got this error with AIMET 1.29.0 too.

quic-hitameht commented 9 months ago

Hi @czy2014hust Can you please tell us which QNN version you are using to convert ONNX models?

czy2014hust commented 9 months ago

QNN version: 2.16.4.231110

shiyuetianqiang commented 9 months ago

+1

Aaron4Fun commented 9 months ago

+1. Any suggestions will be appreciated.

zzh-www commented 9 months ago

I found that the problem is solved by converting with AIMET 1.25.0.

quic-hitameht commented 9 months ago

Thanks for sharing your QNN version. I was under the impression that this issue is resolved with the QNN version you are using. But let me check and get back to you on this.

Meanwhile, you can use Pytorch 1.9 variant of AIMET 1.29 release. https://github.com/quic/aimet/releases/tag/1.29.0

Background: The root cause of this issue is that PyTorch 1.13 applies a bunch of optimizations to the ONNX graph and produces a more optimized graph when the model is set to EVAL mode. These include constant folding, Conv+BN fusion, etc.

In this case, the optimization found identical values for the LayerNorm parameters and replaced them with a single value, so the ONNX graph ends up with only one initializer, which breaks the QNN conversion step.
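To make the failure mode concrete, here is a minimal pure-Python illustration (not QNN's or PyTorch's actual code) of how merging identical constants can leave a dangling reference: two Pow nodes each reference their own exponent constant 2.0, a folding pass keeps only one buffer per distinct value, and the converter's later lookup fails exactly like the KeyError reported above. All names below are made up for the sketch.

```python
# Two LayerNorm-style Pow nodes, each with its own constant exponent 2.0.
buffers = {
    "layers.0/Constant_output_0": 2.0,   # exponent for layer 0's Pow
    "layers.1/Constant_output_0": 2.0,   # identical value for layer 1
}
nodes = [
    {"op": "Pow", "inputs": ["layers.0/x", "layers.0/Constant_output_0"]},
    {"op": "Pow", "inputs": ["layers.1/x", "layers.1/Constant_output_0"]},
]

def fold_identical_constants(buffers):
    """Keep one buffer per distinct value; prune the duplicates."""
    seen = {}
    pruned = {}
    for name, value in buffers.items():
        if value not in seen:
            seen[value] = name
            pruned[name] = value
    return pruned

pruned = fold_identical_constants(buffers)

# The converter then looks up each node input in the buffer table and
# fails for the constant that was folded away.
missing = [i for n in nodes for i in n["inputs"]
           if i.endswith("Constant_output_0") and i not in pruned]
```

After folding, only `layers.0/Constant_output_0` survives, while layer 1's Pow node still names the pruned buffer as its input, which is the "Graph has no buffer ..." condition.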

czy2014hust commented 9 months ago

Thank you for your reply. I hit the same issue with the latest QNN version, 2.17.0.231124.

czy2014hust commented 9 months ago


Hi, does this issue still exist in AIMET 1.29?

theoctopusride commented 7 months ago

You need to disable the ONNX simplifier: onnx_utils.simplify_onnx_model = False
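For context, the flags mentioned in this thread are module-level attributes of aimet_torch's onnx_utils, so they must be flipped before the export call. Since aimet_torch may not be importable everywhere, the sketch below applies the same idea to a stand-in namespace object; with the real library you would pass `from aimet_torch import onnx_utils` instead. The helper name is hypothetical, not an AIMET API.

```python
from types import SimpleNamespace

def disable_onnx_simplification(onnx_utils):
    """Turn off AIMET's ONNX post-processing before export.

    `onnx_utils` is expected to be the aimet_torch.onnx_utils module;
    a SimpleNamespace stands in for it here so the sketch runs anywhere.
    """
    onnx_utils.simplify_onnx_model = False          # skip the simplifier pass
    onnx_utils.update_all_onnx_nodes_name = False   # keep original node names
    return onnx_utils

# Stand-in for `from aimet_torch import onnx_utils`:
onnx_utils = SimpleNamespace(simplify_onnx_model=True,
                             update_all_onnx_nodes_name=True)
disable_onnx_simplification(onnx_utils)
```

Because these are module-level globals, setting them once anywhere in the process affects every subsequent export.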

MisterTab commented 5 months ago

May I ask how this problem was ultimately resolved?

MisterTab commented 5 months ago

@czy2014hust

1826133674 commented 3 months ago

I met this error in aimet_torch 1.31.0. Any suggestions for solving this problem?

Name: aimet-torch
Version: torch-gpu-1.31.0
Summary: AIMET torch Package
Home-page: https://github.com/quic/aimet/releases/download/torch-gpu_1.31.0
Author: Qualcomm Innovation Center, Inc.
Author-email: aimet.os@quicinc.com
License: NOTICE.txt

1826133674 commented 3 months ago

(replying to the suggestion "you need to change the onnx simplifier to onnx_utils.simplify_onnx_model = False")

Thank you for your reply! I solved it following your tip! By the way, the code is in the file xx/aimet_quantsim.py:

def export_quantsim_model(qsim, output_path, dummy_input, filename_prefix,
                          verbose=False, opset_version=11,
                          use_external_data_format=False,
                          input_names=None, output_names=None):

onnx_utils.update_all_onnx_nodes_name = False
onnx_utils.simplify_onnx_model = False
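As a quick sanity check before handing the exported model to qnn-onnx-converter, you can scan the graph for node inputs that nothing produces. The sketch below works on plain dicts so it runs without dependencies; with the onnx package installed you would walk model.graph.node, graph.initializer, and graph.input the same way. This is a hypothetical helper, not part of AIMET or QNN.

```python
def find_dangling_inputs(nodes, initializers, graph_inputs):
    """Return node inputs that no initializer, graph input, or node
    output provides -- the condition behind the
    "Graph has no buffer ..." KeyError."""
    produced = set(initializers) | set(graph_inputs)
    for node in nodes:
        produced.update(node.get("outputs", []))
    dangling = []
    for node in nodes:
        for name in node.get("inputs", []):
            if name not in produced:
                dangling.append(name)
    return dangling

# Example: the Pow node's constant exponent was folded away, so the
# graph has no initializer for it.
nodes = [{"op": "Pow",
          "inputs": ["x", "layer_norm1/Constant_output_0"],
          "outputs": ["pow_out"]}]
bad = find_dangling_inputs(nodes, initializers=[], graph_inputs=["x"])
```

If the check reports a missing `.../Constant_output_0` buffer, re-export with the simplifier flags disabled as described above.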