Open zhangw864680355 opened 3 weeks ago
If it runs on SM8550 but fails on SM8350, it's unlikely to be an onnxruntime EP issue. The error typically happens when QNN op validation fails: because the op has already been converted to channels-last (NHWC), there are no other EPs that can consume it.
Is there any logic in the QNN EP that is hardware dependent?
Otherwise it should be deterministic: the EP said it could take the Conv node, we converted it to NHWC (which is a generic operation), and when asked again the EP is now saying it can't take the node. If there's no hardware-specific logic involved in the two calls to GetCapability, it should either succeed or fail consistently on all hardware.
QNN EP is just calling QNN APIs for op validation, so there's no hardware-specific logic in the EP. We should look into why the op validation failed to better understand this case. @adrianlizarraga, is there a better way to surface that this error is due to op validation and report the actual cause?
Yes, @zhangw864680355 can you please enable verbose logging and save the logs to a file? The logs will probably contain the string "QnnDsp <E> QnnBackend_validateOpConfig failed". The lines around that search string will contain the actual error.
If using the C++ API headers, you can use this Env constructor to set the verbose severity level: https://github.com/microsoft/onnxruntime/blob/main/include/onnxruntime/core/session/onnxruntime_cxx_api.h#L701
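Once the verbose log has been saved to a file, the lines around the search string can be pulled out with a few lines of Python. This is only a minimal sketch: the function name and the default of three context lines are illustrative choices, not part of onnxruntime or the QNN SDK.

```python
from typing import List

# Search string quoted in the comment above.
NEEDLE = "QnnBackend_validateOpConfig failed"


def find_validation_failures(log_lines: List[str], context: int = 3) -> List[List[str]]:
    """For each line containing NEEDLE, return that line plus up to
    `context` lines before and after it."""
    hits = []
    for i, line in enumerate(log_lines):
        if NEEDLE in line:
            start = max(0, i - context)
            hits.append(log_lines[start:i + context + 1])
    return hits
```

To use it, read the saved log with something like `open(path, errors="replace").read().splitlines()` and print each returned block. From Python, verbose logging for a session can be enabled by setting `log_severity_level = 0` on `SessionOptions`.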
would be nice if we can save the validator errors and somehow bubble this up rather than require user to rerun/enable verbose logging. maybe it's not straightforward.
I get the exact same issue when creating an InferenceSession with a resnet50 with onnxruntime==1.18.1 and qnn-sdk==2.19.0 on Qualcomm's qcs6490 device.
I've also been able to replicate this issue using onnxruntime==1.19.0 and qnn-sdk==2.25.0.240728.
can you enable verbose logging as @adrianlizarraga mentions above so we can have a better understanding of the actual failure?
I have been able to minimally reproduce the problem. Attached below are the stdout|stderr outputs of a script that
1) creates a vanilla resnet50 and exports it to onnx opset version 21,
2) tries to quantize it to Uint8/16 activations and Uint8/16 weights using the onnxruntime.quantization Python libraries,
3) runs a basic inference session with it.
It was tested on a qcs6490 arm64 Qualcomm device with onnxruntime-qnn==1.19.0 and qnn-sdk==2.25.0.240728. Attached is the .whl file for this version of onnxruntime compiled with this version of qnn: onnxruntime_qnn-1.19.0-cp38-cp38-linux_aarch64.whl.zip. You can install it by doing `pip3 install
You can run it like
python3 test_qnn_quant --w_quant_type QUInt16 --a_quant_type QUInt16
import argparse
import onnx
import torch
from torchvision.models.resnet import resnet50, ResNet50_Weights
import torch.nn as nn
import numpy as np
from onnxruntime import InferenceSession, SessionOptions
from onnxruntime.quantization import QuantType, quantize, CalibrationDataReader
from onnxruntime.quantization.shape_inference import quant_pre_process
from onnxruntime.quantization.execution_providers.qnn import get_qnn_qdq_config, qnn_preprocess_model


class RandomDataReader(CalibrationDataReader):
    """Feeds N random float32 tensors shaped like the model's first input."""

    def __init__(self, model_path, N=10):
        session = InferenceSession(model_path, providers=['CPUExecutionProvider'])
        inputs = session.get_inputs()
        self.data_list = [{inputs[0].name: np.random.random(inputs[0].shape).astype(np.float32)} for i in range(N)]
        self.data_list_iter = iter(self.data_list)

    def get_next(self) -> dict:
        return next(self.data_list_iter, None)


def get_resnet_model_onnx_opset21():
    shape = (1, 3, 320, 320)
    input_data = torch.randn(shape)
    model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
    model.avgpool = nn.Identity()
    model.fc = nn.Identity()
    model.eval()
    model = torch.jit.trace(model, input_data)

    exported_path = 'resnet50_jit_trace.onnx'
    torch.onnx.export(model, input_data, exported_path)
    onnx_model = onnx.load(exported_path)
    onnx_model = onnx.version_converter.convert_version(onnx_model, 21)  # need opset 21 to have 16-bit quantization
    onnx.checker.check_model(onnx_model)
    export_path_opset21 = exported_path[:-5] + "_opset21.onnx"
    onnx.save(onnx_model, export_path_opset21)
    print(f"Exported opset21 model to {export_path_opset21}")
    return export_path_opset21


def quantize_onnx_model(model_path, w_quant_type: QuantType, a_quant_type: QuantType):
    # Use the function argument here; the original read the global onnx_model_path.
    my_data_reader = RandomDataReader(model_path)

    quant_preproc_model_path = model_path[:-5] + "_quant_preproc.onnx"
    quant_pre_process(model_path, quant_preproc_model_path, verbose=1)

    qnn_preproc_model_path = model_path[:-5] + "_qnn_preproc.onnx"
    model_changed = qnn_preprocess_model(quant_preproc_model_path, qnn_preproc_model_path)
    model_to_quantize = qnn_preproc_model_path if model_changed else quant_preproc_model_path

    qnn_config = get_qnn_qdq_config(model_to_quantize, my_data_reader, activation_type=a_quant_type, weight_type=w_quant_type)
    output_model_path = model_path[:-5] + f"_w{w_quant_type.name}_a{a_quant_type.name}.onnx"
    quantize(model_to_quantize, output_model_path, qnn_config)
    print(f"Successfully exported quantized model to {output_model_path}")
    return output_model_path


def create_onnx_inference_sesssion(quantized_model_path):
    options = SessionOptions()
    # options.add_session_config_entry("session.disable_cpu_ep_fallback", "1")
    options.log_severity_level = 0  # verbose logging
    sess = InferenceSession(quantized_model_path, sess_options=options, providers=["QNNExecutionProvider"], provider_options=[{"backend_path": "/opt/qnn/lib/libQnnHtp.so"}])
    print("Created inference session!")
    return sess


def benchmark_inference(sess: InferenceSession, N=5):
    from time import time
    inputs = sess.get_inputs()
    for i in range(N):
        input_data = {}
        for inp in inputs:
            print(f"[{i}]: Input: {inp.name}, {inp.shape}")
            name, shape = str(inp.name), list(inp.shape)
            input_data[name] = np.random.randn(*shape).astype(np.float32)
        st = time()
        result = sess.run(None, input_data)
        e = time()
        print(f"[{i}]: time: {round(e - st, 4)}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    qtype_str_to_enum = {"QInt8": QuantType.QInt8, "QInt16": QuantType.QInt16, "QUInt8": QuantType.QUInt8, "QUInt16": QuantType.QUInt16}
    # Both options are required; without them the enum lookup below would fail on None.
    parser.add_argument('-wqt', '--w_quant_type', choices=list(qtype_str_to_enum.keys()), required=True)
    parser.add_argument('-aqt', '--a_quant_type', choices=list(qtype_str_to_enum.keys()), required=True)
    args = parser.parse_args()
    w_quant_type = qtype_str_to_enum[args.w_quant_type]
    a_quant_type = qtype_str_to_enum[args.a_quant_type]

    onnx_model_path = get_resnet_model_onnx_opset21()
    quantized_model_path = quantize_onnx_model(onnx_model_path, w_quant_type, a_quant_type)
    sess = create_onnx_inference_sesssion(quantized_model_path)
    benchmark_inference(sess)
Quantizing a model with this script always produces the desired file, but creating an ort.InferenceSession with the Qualcomm DSP backend fails with the error mentioned above whenever the weights are quantized to 16 bits. The ort.InferenceSession with the DSP backend runs successfully if I quantize the model with --w_quant_type QUInt8. However, 8-bit weight quantization is not feasible for me since it excessively degrades the precision of my model outputs.
Here is an output log for a failed run where I was quantizing the weights using QUInt16: wQUInt16_aQUInt16_test_qnn_quant_out.log
And here is an output log for a successful run where I was quantizing the weights using QUInt8: wQUInt8_aQUInt16_test_qnn_quant_out.log
When I inspect my onnx model, I notice that none of the nodes are in the com.ms.internal.nhwc domain; it seems these nodes are added by onnxruntime when it compiles the model for QNN. I don't fully understand why this error occurs only when weights are quantized to UInt16 and not UInt8. Any help would be much appreciated.
There are multiple issues filed here with the same error message and similar causes. Both are due to constraints/limitations of the underlying HW you're testing on.
@zhangw864680355, I believe your issue is that sm8350 doesn't support fp16. fp16 requires Hexagon v73 and above (that is why it works on sm8550).
@jeffreywolberg, if you reference https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/HtpOpDefSupplement.html, search for Conv2d and look at the Constraints Int16 section: 16-bit activation and 16-bit weight require minimum arch V73. I think your qcs6490 arm device is V68.
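The Conv2d constraint quoted above can be turned into a quick pre-flight check before quantizing for a given device. This is a rough sketch that encodes only the single rule cited in this thread (16-bit activations with 16-bit weights need at least V73); the function name is hypothetical, and a True result only means this one rule does not rule the configuration out. The Qualcomm op definition supplement remains the authority.

```python
# Minimum HTP arch for Conv2d with 16-bit activations AND 16-bit weights,
# as quoted from the Qualcomm doc in the comment above.
MIN_ARCH_A16_W16 = 73


def conv_a16_w16_rule_ok(htp_arch: int, activation_bits: int, weight_bits: int) -> bool:
    """Return False if the config violates the A16/W16 minimum-arch rule.

    Assumption: no other constraints are modelled here.
    """
    if activation_bits == 16 and weight_bits == 16:
        return htp_arch >= MIN_ARCH_A16_W16
    return True


# V68 is the arch suggested in the thread for qcs6490.
print(conv_a16_w16_rule_ok(68, 16, 16))  # a16/w16 on V68
print(conv_a16_w16_rule_ok(68, 16, 8))   # a16/w8 on V68
```

Under this rule, the a16/w8 combination on a V68 device is not rejected, which matches the successful QUInt8-weight run reported earlier in the thread.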
Thanks, I missed this point in the reference doc. This explains it.
no problem. we'll reach out to qualcomm and give them feedback that perhaps they could make the op validation failure messages more user friendly.
Describe the issue
When running a yolov8 fp32 onnx model with QNN, it runs successfully on Snapdragon 8 Gen 2 (SM8550, phone: Redmi K70), but it fails on Snapdragon 888 (SM8350, phone: realme GT). The error is as follows:
catch exception:Node 'Conv' OpType:Conv with domain:com.ms.internal.nhwc was inserted using the NHWC format as requested by QNNExecutionProvider, but was not selected by that EP. This means the graph is now invalid as there will not be an EP able to run the node. This could be a bug in layout transformer, or in the GetCapability implementation of the EP.
I also ran other models; the error is the same.
The QNN parameters are as follows; some parameters (soc_model, htp_arch) seem to have no effect after being set.
struct QnnConfig {
    std::string backend_path = "libQnnHtp.so";
    // std::string enable_htp_fp16_precision = "1"; //0
    std::string profiling_level = "off"; //off
    std::string htp_performance_mode = "default";
    std::string high_power_saver = "default";
    std::string qnn_context_priority = "normal";
    std::string htp_graph_finalization_optimization_mode = "0";
    std::string soc_model = "0"; //0 30 43
    std::string htp_arch = "0"; //0 68 73
    std::string device_id = "0";
};
onnxruntime: 18.0, qnn: 2.22.6.240515
onnxruntime: 19.0, qnn: 2.25.0.240728
To reproduce
Sorry, the code is part of a large engineering project and is not easy to share.
Urgency
No response
Platform
Android
OS Version
13/14
ONNX Runtime Installation
Built from Source
Compiler Version (if 'Built from Source')
tag:19.0
Package Name (if 'Released Package')
None
ONNX Runtime Version or Commit ID
19.0
ONNX Runtime API
C++/C
Architecture
ARM64
Execution Provider
Other / Unknown
Execution Provider Library Version
QNN