microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Exception during initialization: safeint.h:17 static void SafeIntExceptionHandler<onnxruntime::OnnxRuntimeException>::SafeIntOnOverflow() Integer overflow - caused by int64 index of -1? #22694

Open john-dance opened 1 month ago

john-dance commented 1 month ago

Describe the issue

When using the QNN EP, there is an integer overflow on model load.

The model loads and runs on the CPU, or with the TfLite delegate. It may be relevant that with TfLite, the QNN delegate fails to prepare, so execution falls back to the GPU and CPU.

Perhaps the following TfLite messages help narrow down the problem with the QNN EP:

[tflite] graph_prepare.cc:210:ERROR:could not create op: q::GatherNd.constIdx.tcm
[tflite] "node_id_512_op_type_GatherNd_op_count_0" generated: could not create op

To reproduce

Use ORT + QNN EP to run the model found in this AI Hub job: https://app.aihub.qualcomm.com/jobs/jp4lvkx15 (Note: Only Microsoft QNN engineers will have access.)

Urgency

No response

Platform

Android

OS Version

14

ONNX Runtime Installation

Built from Source

Compiler Version (if 'Built from Source')

No response

Package Name (if 'Released Package')

None

ONNX Runtime Version or Commit ID

1.19.2

ONNX Runtime API

C++/C

Architecture

ARM64

Execution Provider

Other / Unknown

Execution Provider Library Version

QNN

ashumish-QCOM commented 2 weeks ago

Hi @john-dance

From the error messages and your description, it seems the overflow is detected when the model is loaded and is reported by the SafeIntExceptionHandler::SafeIntOnOverflow() function. This handler is called when an integer value exceeds its allowable range, so the failure surfaces as an exception instead of a silently wrapped value.

john-dance commented 1 week ago

Agreed, but we need to figure out what is causing that overflow.

I did a little more digging. There is a Gather with an int64 index of -1. The QNN EP should be able to dispatch this Gather to QNN. It's probably hitting this overflow when trying to do the required int64 -> int32 conversion.
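To make the hypothesis concrete, here is a minimal Python sketch of a range-checked integer conversion applied to a Gather index of -1. It is not the QNN EP's code: `checked_cast`, `normalize_gather_index`, and `try_cast` are hypothetical helpers, and the axis size of 8 is made up. One thing the sketch shows is that -1 fits in a signed int32, so in this model the overflow only appears if the conversion targets an unsigned type before the negative index is normalized. Whether that is what actually happens in the QNN EP is unconfirmed.

```python
# Hypothetical model of a SafeInt-style checked conversion; not ORT's implementation.
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
UINT32_MAX = 2**32 - 1


def checked_cast(value: int, lo: int, hi: int) -> int:
    """Return value if it fits in [lo, hi]; raise OverflowError otherwise."""
    if not lo <= value <= hi:
        raise OverflowError(f"{value} does not fit in [{lo}, {hi}]")
    return value


def normalize_gather_index(index: int, axis_dim: int) -> int:
    """Map an ONNX Gather index in [-axis_dim, axis_dim - 1] to [0, axis_dim - 1]."""
    if not -axis_dim <= index < axis_dim:
        raise IndexError(f"index {index} out of range for axis of size {axis_dim}")
    return index + axis_dim if index < 0 else index


def try_cast(value: int, lo: int, hi: int):
    """Run checked_cast and report an overflow as a string instead of raising."""
    try:
        return checked_cast(value, lo, hi)
    except OverflowError:
        return "overflow"


# -1 narrowed to a signed 32-bit range: fits.
signed_result = try_cast(-1, INT32_MIN, INT32_MAX)

# -1 converted to an unsigned 32-bit range: the checked conversion fails.
unsigned_result = try_cast(-1, 0, UINT32_MAX)

# Normalizing against an assumed axis size of 8 first makes the unsigned conversion safe.
normalized_result = try_cast(normalize_gather_index(-1, 8), 0, UINT32_MAX)

print(signed_result, unsigned_result, normalized_result)
```

If the real failure follows this pattern, normalizing negative indices against the axis dimension before any unsigned conversion would avoid it. Confirming that would require checking where the QNN EP converts the Gather indices.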

(I'll modify the title of the issue.)