microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Mobile] Inference error using QNN on Android phone. #21848

Open zhangw864680355 opened 3 weeks ago

zhangw864680355 commented 3 weeks ago

Describe the issue

When using a YOLOv8 fp32 ONNX model with QNN, it runs successfully on a Snapdragon 8 Gen 2 (SM8550 phone: Redmi K70), but it fails on a Snapdragon 888 (SM8350 phone: Realme GT). The error is as follows:

catch exception:Node 'Conv' OpType:Conv with domain:com.ms.internal.nhwc was inserted using the NHWC format as requested by QNNExecutionProvider, but was not selected by that EP. This means the graph is now invalid as there will not be an EP able to run the node. This could be a bug in layout transformer, or in the GetCapability implementation of the EP.

I also ran other algorithms; the error is the same.

The QNN parameters are as follows; some parameters (soc_model, htp_arch) do not take effect after being set:

struct QnnConfig {
    std::string backend_path = "libQnnHtp.so";
    // std::string enable_htp_fp16_precision = "1"; // 0
    std::string profiling_level = "off"; // off
    std::string htp_performance_mode = "default";
    std::string high_power_saver = "default";
    std::string qnn_context_priority = "normal";
    std::string htp_graph_finalization_optimization_mode = "0";
    std::string soc_model = "0"; // 0 30 43
    std::string htp_arch = "0";  // 0 68 73
    std::string device_id = "0";
};
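For anyone reproducing from the Python API, a rough sketch of passing an equivalent set of QNN EP options via provider_options; the model path is a placeholder and the option values simply mirror the defaults in the struct above:

```python
import onnxruntime as ort

# QNN EP provider options mirroring the C++ config above.
qnn_options = {
    "backend_path": "libQnnHtp.so",
    "profiling_level": "off",
    "htp_performance_mode": "default",
    "qnn_context_priority": "normal",
    "htp_graph_finalization_optimization_mode": "0",
    "soc_model": "0",
    "htp_arch": "0",
    "device_id": "0",
}

sess = ort.InferenceSession(
    "model_fp32.onnx",  # placeholder model path
    providers=["QNNExecutionProvider"],
    provider_options=[qnn_options],
)
```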

Tested with onnxruntime 1.18.0 + QNN 2.22.6.240515 and onnxruntime 1.19.0 + QNN 2.25.0.240728.

To reproduce

Sorry, this is part of a large project, so it is not easy to provide a standalone reproduction.

Urgency

No response

Platform

Android

OS Version

13/14

ONNX Runtime Installation

Built from Source

Compiler Version (if 'Built from Source')

tag:19.0

Package Name (if 'Released Package')

None

ONNX Runtime Version or Commit ID

19.0

ONNX Runtime API

C++/C

Architecture

ARM64

Execution Provider

Other / Unknown

Execution Provider Library Version

QNN

jywu-msft commented 2 weeks ago

If it runs on SM8550 but fails on SM8350, it's unlikely to be an onnxruntime EP issue. The error typically happens when QNN op validation fails; because the op has already been converted to channels-last, there are no other EPs that can consume it.

skottmckay commented 2 weeks ago

Is there any logic in the QNN EP that is hardware dependent?

Otherwise it should be deterministic: the EP said it could take the Conv node, we converted it to NHWC (which is a generic operation), and when asked again the EP is now saying it can't take the node. If there's no hardware-specific logic involved in the two calls to GetCapability, it should either succeed or fail consistently on all hardware.

jywu-msft commented 2 weeks ago

> Is there any logic in the QNN EP that is hardware dependent?
>
> Otherwise it should be deterministic: the EP said it could take the Conv node, we converted it to NHWC (which is a generic operation), and when asked again the EP is now saying it can't take the node. If there's no hardware-specific logic involved in the two calls to GetCapability, it should either succeed or fail consistently on all hardware.

QNN EP is just calling QNN APIs for op validation, so there's no hardware-specific logic in the EP. We should look into why the op validation failed to better understand this case. @adrianlizarraga, is there a better way to surface that this error is due to op validation and report the actual cause?

adrianlizarraga commented 2 weeks ago

> QNN EP is just calling QNN APIs for op validation, so there's no hardware-specific logic in the EP. We should look into why the op validation failed to better understand this case. @adrianlizarraga, is there a better way to surface that this error is due to op validation and report the actual cause?

Yes. @zhangw864680355, can you please enable verbose logging and save the logs to a file? The logs will probably contain the string `QnnDsp <E> QnnBackend_validateOpConfig failed`. The lines around that search string will contain the actual error.

If you're using the C++ API headers, you can use this Env constructor to set the verbose severity level: https://github.com/microsoft/onnxruntime/blob/main/include/onnxruntime/core/session/onnxruntime_cxx_api.h#L701
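For reference, a minimal sketch of the Python-API equivalent (the model path and backend path are placeholders). The verbose output, including the QnnDsp messages, goes to stderr, so redirect it to a file (e.g. `2> qnn_verbose.log`) and then search the file for `QnnBackend_validateOpConfig`:

```python
import onnxruntime as ort

# 0 == VERBOSE for both the default logger and the session logger.
ort.set_default_logger_severity(0)

so = ort.SessionOptions()
so.log_severity_level = 0

sess = ort.InferenceSession(
    "model.onnx",  # placeholder model path
    sess_options=so,
    providers=["QNNExecutionProvider"],
    provider_options=[{"backend_path": "libQnnHtp.so"}],  # adjust to your QNN libs
)
```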

jywu-msft commented 2 weeks ago

> QNN EP is just calling QNN APIs for op validation, so there's no hardware-specific logic in the EP. We should look into why the op validation failed to better understand this case. @adrianlizarraga, is there a better way to surface that this error is due to op validation and report the actual cause?
>
> Yes. @zhangw864680355, can you please enable verbose logging and save the logs to a file? The logs will probably contain the string `QnnDsp <E> QnnBackend_validateOpConfig failed`. The lines around that search string will contain the actual error.

It would be nice if we could save the validator errors and somehow bubble them up, rather than requiring the user to rerun with verbose logging enabled. Maybe it's not straightforward.

jeffreywolberg commented 2 weeks ago

I get the exact same issue when creating an InferenceSession with a resnet50 using onnxruntime==1.18.1 and qnn-sdk==2.19.0 on Qualcomm's qcs6490 device.

jeffreywolberg commented 2 weeks ago

I've also been able to replicate this issue using onnxruntime==1.19.0 and qnn-sdk==2.25.0.240728

jywu-msft commented 2 weeks ago

> I've also been able to replicate this issue using onnxruntime==1.19.0 and qnn-sdk==2.25.0.240728

can you enable verbose logging as @adrianlizarraga mentions above so we can have a better understanding of the actual failure?

jeffreywolberg commented 2 weeks ago

I have been able to minimally reproduce the problem. Attached below are the stdout/stderr outputs of a script that:

1. creates a vanilla resnet50 and exports it to ONNX opset version 21,
2. quantizes it to UInt8/16 activations and UInt8/16 weights using the onnxruntime.quantization Python libraries,
3. runs a basic inference session with it.

It was tested on a qcs6490 arm64 Qualcomm device with onnxruntime-qnn==1.19.0 and qnn-sdk==2.25.0.240728. Attached is the .whl file for this version of onnxruntime compiled with this version of QNN: onnxruntime_qnn-1.19.0-cp38-cp38-linux_aarch64.whl.zip. You can unzip it and install it with `pip3 install onnxruntime_qnn-1.19.0-cp38-cp38-linux_aarch64.whl`. Also attached is the requirements.txt for the other dependencies I had installed when I ran the script.

You can run it like `python3 test_qnn_quant.py --w_quant_type QUInt16 --a_quant_type QUInt16`.

import argparse
import onnx
import torch
from torchvision.models.resnet import resnet50, ResNet50_Weights
import torch.nn as nn
import numpy as np

from onnxruntime import InferenceSession, SessionOptions
from onnxruntime.quantization import QuantType, quantize, CalibrationDataReader
from onnxruntime.quantization.shape_inference import quant_pre_process
from onnxruntime.quantization.execution_providers.qnn import get_qnn_qdq_config, qnn_preprocess_model

class RandomDataReader(CalibrationDataReader):
    def __init__(self, model_path, N=10):
        session = InferenceSession(model_path, providers=['CPUExecutionProvider'])
        inputs = session.get_inputs()
        self.data_list = [{inputs[0].name: np.random.random(inputs[0].shape).astype(np.float32)} for _ in range(N)]
        self.data_list_iter = iter(self.data_list)

    def get_next(self) -> dict:
        return next(self.data_list_iter, None)

def get_resnet_model_onnx_opset21():
    shape = (1, 3, 320, 320)
    input_data = torch.randn(shape)
    model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
    model.avgpool = nn.Identity()
    model.fc = nn.Identity()
    model.eval()

    model = torch.jit.trace(model, input_data)
    exported_path = f'resnet50_jit_trace.onnx'
    torch.onnx.export(model, input_data, exported_path)

    onnx_model = onnx.load(exported_path)
    onnx_model = onnx.version_converter.convert_version(onnx_model, 21) # need opset 21 to have 16-bit quantization
    onnx.checker.check_model(onnx_model)
    export_path_opset21 = exported_path[:-5] + "_opset21.onnx"
    onnx.save(onnx_model, export_path_opset21)
    print(f"Exported opset21 model to {export_path_opset21}")
    return export_path_opset21

def quantize_onnx_model(model_path, w_quant_type : QuantType, a_quant_type : QuantType):
    my_data_reader = RandomDataReader(model_path)  # calibration data for the model being quantized
    quant_preproc_model_path = model_path[:-5] + "_quant_preproc.onnx"
    quant_pre_process(model_path, quant_preproc_model_path, verbose=1)
    qnn_preproc_model_path = model_path[:-5] + "_qnn_preproc.onnx"
    model_changed = qnn_preprocess_model(quant_preproc_model_path, qnn_preproc_model_path)
    model_to_quantize = qnn_preproc_model_path if model_changed else quant_preproc_model_path

    qnn_config = get_qnn_qdq_config(model_to_quantize, my_data_reader, activation_type=a_quant_type, weight_type=w_quant_type)
    output_model_path = model_path[:-5] + f"_w{w_quant_type.name}_a{a_quant_type.name}.onnx"
    quantize(model_to_quantize, output_model_path, qnn_config)
    print(f"Successfully exported quantized model to {output_model_path}")
    return output_model_path

def create_onnx_inference_session(quantized_model_path):
    options = SessionOptions()
    # options.add_session_config_entry("session.disable_cpu_ep_fallback", "1")
    options.log_severity_level = 0
    sess = InferenceSession(quantized_model_path, sess_options=options, providers=["QNNExecutionProvider"], provider_options=[{"backend_path": "/opt/qnn/lib/libQnnHtp.so"}])
    print(f"Created inference session!")
    return sess

def benchmark_inference(sess : InferenceSession, N=5):
    from time import time
    inputs = sess.get_inputs()
    for i in range(N):
        input_data = {}
        for inp in inputs:
            print(f"[{i}]: Input: {inp.name}, {inp.shape}")
            name, shape = str(inp.name), list(inp.shape)
            input_data[name] = np.random.randn(*shape).astype(np.float32)
        st = time()
        result = sess.run(None, input_data)
        e = time()
        print(f"[{i}]: time: {round(e - st, 4)}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    qtype_str_to_enum = {"QInt8": QuantType.QInt8, "QInt16": QuantType.QInt16, "QUInt8": QuantType.QUInt8, "QUInt16": QuantType.QUInt16}
    parser.add_argument('-wqt', '--w_quant_type', choices=list(qtype_str_to_enum.keys()), required=True)
    parser.add_argument('-aqt', '--a_quant_type', choices=list(qtype_str_to_enum.keys()), required=True)
    args = parser.parse_args()
    w_quant_type = qtype_str_to_enum[args.w_quant_type]
    a_quant_type = qtype_str_to_enum[args.a_quant_type]
    onnx_model_path = get_resnet_model_onnx_opset21()
    quantized_model_path = quantize_onnx_model(onnx_model_path, w_quant_type, a_quant_type)
    sess = create_onnx_inference_session(quantized_model_path)
    benchmark_inference(sess)

Whenever I quantize a model with this script it always produces the output file, but running that model in an ort.InferenceSession with the Qualcomm DSP backend fails with the error mentioned above whenever I use 16-bit quantization for the weights. If I quantize with --w_quant_type QUInt8 instead, the ort.InferenceSession with the DSP backend runs successfully. However, using 8-bit quantization for the weights is not feasible for me since it excessively degrades the precision of my model outputs.

Here is an output log for a failed run where I was quantizing the weights using QUInt16: wQUInt16_aQUInt16_test_qnn_quant_out.log

And here is an output log for a successful run where I was quantizing the weights using QUInt8: wQUInt8_aQUInt16_test_qnn_quant_out.log

jeffreywolberg commented 2 weeks ago

When I inspect my ONNX model, I notice that none of the nodes are in the com.ms.internal.nhwc domain; it seems these nodes are added while onnxruntime compiles the model for QNN. I don't fully understand why this error occurs only when weights are quantized to UInt16 and not UInt8. Any help would be very appreciated.

jywu-msft commented 2 weeks ago

There are multiple issues filed here with the same error message and similar causes; both come down to constraints/limitations of the underlying hardware you're testing on.

@zhangw864680355, I believe your issue is that the SM8350 doesn't support fp16; fp16 requires Hexagon V73 and above, which is why it works on the SM8550.

@jeffreywolberg, if you reference https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/HtpOpDefSupplement.html, search for Conv2d and look at the Constraints Int16 section: 16-bit activations and 16-bit weights require a minimum arch of V73, and I think your qcs6490 arm device is V68.
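For reference, a minimal sketch (reusing `get_qnn_qdq_config`/`quantize` the same way as the script above) of the combination that stays within that constraint on V68 hardware: 8-bit weights with 16-bit activations, which matches the successful wQUInt8_aQUInt16 run earlier in the thread. File names are placeholders:

```python
import numpy as np
from onnxruntime import InferenceSession
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize
from onnxruntime.quantization.execution_providers.qnn import get_qnn_qdq_config

MODEL = "resnet50_qnn_preproc.onnx"  # placeholder: QNN-preprocessed float model

class RandomReader(CalibrationDataReader):
    """Feeds a few random samples for calibration (same idea as RandomDataReader above)."""
    def __init__(self, model_path, n=10):
        inp = InferenceSession(model_path, providers=["CPUExecutionProvider"]).get_inputs()[0]
        self._it = iter([{inp.name: np.random.random(inp.shape).astype(np.float32)} for _ in range(n)])

    def get_next(self):
        return next(self._it, None)

# 16-bit activations with 8-bit weights: the combination that succeeded on the qcs6490 above;
# 16-bit weights additionally require HTP arch V73 or newer.
qnn_config = get_qnn_qdq_config(
    MODEL,
    RandomReader(MODEL),
    activation_type=QuantType.QUInt16,
    weight_type=QuantType.QUInt8,
)
quantize(MODEL, "resnet50_wQUInt8_aQUInt16.onnx", qnn_config)
```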

jeffreywolberg commented 2 weeks ago

Thanks, I missed this point in the reference doc. This explains it.

jywu-msft commented 2 weeks ago

> Thanks, I missed this point in the reference doc. This explains it.

No problem. We'll reach out to Qualcomm and give them feedback that perhaps they could make the op validation failure messages more user friendly.