microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Jvm] Native crash during createSession: std::bad_cast #21147

Open gtf35 opened 5 months ago

gtf35 commented 5 months ago

Describe the issue

uname -a
Linux 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux
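This thread compares runs across operating systems and JDKs (Corretto 15 and Corretto 11 here, JDK 17.0.4 in the maintainer's test). As a hedged sketch, the JVM-side counterpart of `uname -a` can be collected from standard system properties so that reports are easier to compare. The class name `EnvReport` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Collects the OS and JVM details that differ between the runs in this report. */
final class EnvReport {
    private EnvReport() {}

    static Map<String, String> collect() {
        // LinkedHashMap keeps the printing order stable.
        Map<String, String> info = new LinkedHashMap<>();
        info.put("os.name", System.getProperty("os.name"));
        info.put("os.version", System.getProperty("os.version"));
        info.put("os.arch", System.getProperty("os.arch"));
        info.put("java.version", System.getProperty("java.version"));
        info.put("java.vendor", System.getProperty("java.vendor"));
        info.put("java.vm.name", System.getProperty("java.vm.name"));
        return info;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```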

Code: https://github.com/snakers4/silero-vad/tree/master/examples/java-example

The code runs correctly on Windows (runtime versions 1.15.0 through 1.18.0), but it crashes on Linux.

With versions 1.17.0 through 1.18.0, it crashes here:

OrtEnvironment env = OrtEnvironment.getEnvironment();
// Create an ONNX Runtime session options object
OrtSession.SessionOptions opts = new OrtSession.SessionOptions();
// Set the inter-op thread count to 1; inter-op threads parallelize execution across graph nodes
opts.setInterOpNumThreads(1);
// Set the intra-op thread count to 1; intra-op threads parallelize work within a single operator
opts.setIntraOpNumThreads(1);
// Add the CPU execution provider; passing true enables the arena memory allocator
opts.addCPU(true);
// Create an ONNX Runtime session from the environment, model path, and options
session = env.createSession(modelPath, opts); // <= crashes here

Crash log:

terminate called after throwing an instance of 'std::bad_cast'
  what():  std::bad_cast

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':run'.
> Process 'command '/home/cat/.jdks/corretto-15.0.2/bin/java'' finished with non-zero exit value 134
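Exit value 134 follows the usual Unix convention of 128 + signal number: 134 - 128 = 6, which is SIGABRT. That is consistent with the log above, since the default C++ terminate handler prints "terminate called after throwing an instance of ..." and then calls abort(). A minimal decoding sketch, assuming that convention; the class name `ExitStatus` is hypothetical and only SIGABRT and SIGSEGV are named.

```java
/** Decodes a process exit value that may encode a fatal signal as 128 + N. */
final class ExitStatus {
    private ExitStatus() {}

    /** Returns the signal number for values in 129..192 (Linux signals 1..64), or -1 otherwise. */
    static int signalOf(int exitValue) {
        return (exitValue > 128 && exitValue <= 128 + 64) ? exitValue - 128 : -1;
    }

    static String describe(int exitValue) {
        int sig = signalOf(exitValue);
        if (sig < 0) {
            return "exit code " + exitValue;
        }
        String name;
        switch (sig) {
            case 6:  name = "SIGABRT"; break; // raised by abort(), e.g. via std::terminate
            case 11: name = "SIGSEGV"; break; // invalid memory access
            default: name = "signal " + sig;
        }
        return "killed by " + name + " (" + sig + ")";
    }
}
```

For example, `ExitStatus.describe(134)` yields "killed by SIGABRT (6)", and a SIGSEGV such as the one in the later hs_err log would show up as 139.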

With versions below 1.17.0 (e.g. 1.16.3, 1.16.0, 1.15.0), the code hangs at env.createSession(modelPath, opts).
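Because the older versions hang rather than crash, it can help to bound the call so the run fails with a clear error and leaves time to capture a thread dump (for example with `jstack <pid>`). A minimal sketch using only java.util.concurrent; the `Deadline.callWithTimeout` helper is hypothetical and not part of ONNX Runtime. It cannot cancel a thread stuck in native code; it only stops the caller from waiting forever.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Runs a blocking call on a daemon worker thread and stops waiting after a deadline. */
final class Deadline {
    private Deadline() {}

    static <T> T callWithTimeout(Callable<T> task, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "deadline-worker");
            t.setDaemon(true); // a stuck native call must not keep the JVM alive
            return t;
        });
        try {
            Future<T> future = pool.submit(task);
            return future.get(timeout, unit); // throws TimeoutException if the deadline passes
        } finally {
            // Interrupts the worker; a thread blocked in native code will usually ignore this.
            pool.shutdownNow();
        }
    }
}
```

Usage, with an arbitrary 60-second limit: `OrtSession session = Deadline.callWithTimeout(() -> env.createSession(modelPath, opts), 60, TimeUnit.SECONDS);`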

To reproduce

Clone the code and run it.

Urgency

No response

Platform

Linux

OS Version

Debian GNU/Linux 12

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

Crashes on 1.17.0 through 1.18.0; hangs (ANR) on 1.16.3 and below

ONNX Runtime API

Java

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

gtf35 commented 5 months ago

The bot added the label platform:windows, which seems wrong.

gtf35 commented 5 months ago

I tested more versions. 1.16.3, 1.16.2, 1.16.1, 1.16.0, 1.16.0-rc1, 1.15.1, 1.15.0: hang.

1.14.0, 1.13.1, 1.12.1, 1.12.0, 1.11.0:

Caused by: ai.onnxruntime.OrtException: Error code - ORT_RUNTIME_EXCEPTION - message: Exception during initialization: /onnxruntime_src/onnxruntime/core/framework/session_state.cc: subgraphs_kernel_create_info_maps.find(local_subgraph_kernel_create_info_map_key) == subgraphs_kernel_create_info_maps.end() was false. 

1.10.0 and lower:

Caused by: ai.onnxruntime.OrtException: Error code - ORT_FAIL - message: Load model from ../models/silero_vad.onnx failed:/onnxruntime_src/onnxruntime/core/graph/model_load_utils.h: ONNX Runtime only *guarantees* support for models stamped with official released onnx opset versions. Opset 
gtf35 commented 5 months ago

Tested on Fedora 14 with com.microsoft.onnxruntime:onnxruntime:1.18.0; the JVM crashed:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000000000000002, pid=5270, tid=5300
#
# JRE version: OpenJDK Runtime Environment Corretto-11.0.23.9.1 (11.0.23+9) (build 11.0.23+9-LTS)
# Java VM: OpenJDK 64-Bit Server VM Corretto-11.0.23.9.1 (11.0.23+9-LTS, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C  0x0000000000000002
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h" (or dumping to /home/user/bag/ui/user-interface-next/core.5270)
#
# An error report file with more information is saved as:
# /home/user/bag/ui/user-interface-next/hs_err_pid5270.log
#
# If you would like to submit a bug report, please visit:
#   https://github.com/corretto/corretto-11/issues/
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

> Task :run FAILED

Here is the hs_err_pid5270.log

Craigacp commented 5 months ago

I loaded the silero_vad.onnx model from https://github.com/snakers4/silero-vad/blob/master/files/silero_vad.onnx successfully on Linux x64 using JDK 17.0.4 with ORT 1.18.0 & 1.16.0, and on macOS arm64 14.5 with ORT 1.18.0. The path you specified to the model was different, is it a different model to the one in the repo? Also can you check if the model loads correctly in ORT in Python on that machine?
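One way to answer whether the local file is the same model as the one in the repository is to compare checksums. A minimal sketch, assuming the default path `../models/silero_vad.onnx` taken from the error message earlier in this thread; the class name `ModelChecksum` is hypothetical, and no reference checksum is given here, so the value must be compared against a freshly downloaded copy of the repository file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Computes a SHA-256 checksum of a model file for comparison with a known-good copy. */
final class ModelChecksum {
    private ModelChecksum() {}

    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                digest.update(buf, 0, n); // stream the file instead of loading it whole
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        Path model = Path.of(args.length > 0 ? args[0] : "../models/silero_vad.onnx");
        System.out.println(Files.size(model) + " bytes, sha256=" + sha256Hex(model));
    }
}
```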

github-actions[bot] commented 4 months ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.