microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Mobile] Broadcasting Error in Sub Node with ONNX Runtime Version 1.17.3: Incompatibility with Dimension Broadcasting Rules #20672

Open josefeliuf opened 4 months ago

josefeliuf commented 4 months ago

Describe the issue

I encountered a broadcasting error in a Sub node when running ONNX Runtime 1.17.3 on Android. The error message reports a dimension-broadcasting failure: "Attempting to broadcast an axis by a dimension other than 1. 2 by 3". The error surfaces as a fatal exception. Downgrading to ONNX Runtime 1.16.3 resolved it.
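For anyone who needs the same workaround, the downgrade amounts to pinning the dependency version. This is a minimal sketch assuming a Gradle Kotlin DSL build file (build.gradle.kts) and the onnxruntime-android package published on Maven Central; the build setup of the app in this report is not shown, so adapt it to your project.

```kotlin
// build.gradle.kts (module level) -- assumed build setup, not taken from the report
dependencies {
    // Pin to the last version where the model ran without the Sub node error.
    implementation("com.microsoft.onnxruntime:onnxruntime-android:1.16.3")
}
```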

Expected Behavior:

The model should process images without errors, handling dimension broadcasting as per the model's specifications. The same model worked without errors in a previous version of ONNX Runtime.

Additional Context:

The model is based on YOLOv8 and performs its pre- and post-processing internally. The error occurs sporadically, which suggests it depends on specific input dimensions or on internal state changes.
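To make the "2 by 3" part of the message concrete, here is a small sketch of the ONNX/NumPy-style broadcasting rule that element-wise operators such as Sub follow: shapes are aligned from the trailing dimension, and two dimensions are compatible only if they are equal or one of them is 1. The function name `broadcastShape` and the example shapes are hypothetical; the actual tensor shapes reaching `post_process_58` are not known from this report.

```kotlin
/**
 * Returns the broadcast shape of [a] and [b] under multidirectional
 * (NumPy-style) broadcasting, or null when the shapes are incompatible.
 * Hypothetical helper for illustration; not part of ONNX Runtime.
 */
fun broadcastShape(a: LongArray, b: LongArray): LongArray? {
    val rank = maxOf(a.size, b.size)
    val out = LongArray(rank)
    for (i in 0 until rank) {
        // Align from the trailing dimension; missing leading dimensions count as 1.
        val da = if (i < rank - a.size) 1L else a[i - (rank - a.size)]
        val db = if (i < rank - b.size) 1L else b[i - (rank - b.size)]
        out[i] = when {
            da == db -> da
            da == 1L -> db
            db == 1L -> da
            else -> return null // e.g. 2 vs 3: the sizes differ and neither is 1
        }
    }
    return out
}
```

With these rules, `broadcastShape(longArrayOf(4, 1), longArrayOf(3))` yields the shape [4, 3], while `broadcastShape(longArrayOf(5, 2), longArrayOf(5, 3))` returns null, which is the kind of mismatch the error message describes.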

Here is the code:

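import ai.onnxruntime.OnnxJavaType
import ai.onnxruntime.OnnxTensor
import java.nio.ByteBuffer
import java.util.Collections

val rawImageBytes = inputStream.readBytes()

// The model does its own pre-processing, so the input is the raw image
// bytes passed as a 1-D uint8 tensor.
val shape = longArrayOf(rawImageBytes.size.toLong())

val inputTensor = OnnxTensor.createTensor(
    ortEnvironment,
    ByteBuffer.wrap(rawImageBytes),
    shape,
    OnnxJavaType.UINT8
)
inputTensor.use {
    // The exception in the stacktrace is thrown from this call.
    val output = ortSession.run(
        Collections.singletonMap("image", inputTensor),
        setOf("image_out", "scaled_box_out_next")
    )
    // (handling of the output is omitted from this report)
}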

Here is the stacktrace:

2024-05-13 12:32:25.380 24950-25106 AndroidRuntime com.sosmartlabs.momotablet E FATAL EXCEPTION: DefaultDispatcher-worker-2
Process: com.sosmartlabs.momotablet, PID: 24950
ai.onnxruntime.OrtException: Error code - ORT_RUNTIME_EXCEPTION - message: Non-zero status code returned while running Sub node. Name:'post_process_58' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/math/element_wise_ops.h:540 void onnxruntime::BroadcastIterator::Init(ptrdiff_t, ptrdiff_t) axis == 1 || axis == largest was false. Attempting to broadcast an axis by a dimension other than 1. 2 by 3
    at ai.onnxruntime.OrtSession.run(Native Method)
    at ai.onnxruntime.OrtSession.run(OrtSession.java:395)
    at ai.onnxruntime.OrtSession.run(OrtSession.java:242)
    at com.sosmartlabs.dug.vision.detectors.YoloV8ObjectDetector.detect(YoloV8ObjectDetector.kt:141)
    at com.sosmartlabs.momotablet.services.MomoAIService$analyzeNSFW$3.invokeSuspend(MomoAIService.kt:219)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:570)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:750)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:677)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664)
    Suppressed: kotlinx.coroutines.DiagnosticCoroutineContextException: [StandaloneCoroutine{Cancelling}@66a203e, Dispatchers.Default]

To reproduce

Steps to reproduce the issue:

  1. Load the model in ONNX Runtime version 1.17.3 (I can share the ONNX model we are using).
  2. Process an image through the model.
  3. Observe the broadcasting error at the Sub node when certain images are processed.
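Because only certain images trigger the error, it may help to capture the exact bytes that fail so they can be replayed against both runtime versions. The following is a rough sketch under stated assumptions: `runAndCaptureFailure`, the dump directory, and the file naming are all hypothetical and not part of ONNX Runtime or of the app in this report. The inference call is passed in as a lambda.

```kotlin
import java.io.File

/**
 * Runs [infer] on [imageBytes]. If it throws, the bytes are written under
 * [dumpDir] so the failing input can be replayed later, and the exception
 * is rethrown unchanged. Hypothetical helper for illustration only.
 */
fun <T> runAndCaptureFailure(
    imageBytes: ByteArray,
    dumpDir: File,
    infer: (ByteArray) -> T
): T {
    try {
        return infer(imageBytes)
    } catch (e: Exception) {
        // Persist the offending input before propagating the error.
        dumpDir.mkdirs()
        val dump = File(dumpDir, "failing-input-${System.currentTimeMillis()}.bin")
        dump.writeBytes(imageBytes)
        System.err.println("Inference failed (${e.message}); input saved to ${dump.path}")
        throw e
    }
}
```

In the app, the lambda would wrap the tensor creation and the `ortSession.run(...)` call shown in the code above, and `dumpDir` could point to a directory under the app's cache or files directory.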

Urgency

Not urgent, since downgrading to 1.16.3 works around it, but the issue matters to us because it affects the stability of our application on newer versions of ONNX Runtime.

Platform

Android

OS Version

Android 10

ONNX Runtime Installation

Released Package

Compiler Version (if 'Built from Source')

No response

Package Name (if 'Released Package')

onnxruntime-android

ONNX Runtime Version or Commit ID

1.17.3

ONNX Runtime API

Java/Kotlin

Architecture

ARM64

Execution Provider

Other / Unknown

Execution Provider Library Version

No response

github-actions[bot] commented 3 months ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.