microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Mobile] IOS library crashes in Release configuration #21960

Open Zaratusa opened 2 months ago

Zaratusa commented 2 months ago

Describe the issue

Hi,

I'm currently adding the Android and iOS libraries to the Unreal Engine 5.4 NNERuntimeORT plugin. Android works fine in Development and Release; however, iOS only works in Debug, not in Release builds. I've used the following code to include the framework in the plugin:

if (Target.Platform == UnrealTargetPlatform.IOS)
{
    PublicAdditionalFrameworks.Add(new Framework(
        "ONNXRuntime", 
        Path.Combine(PluginDirectory, "Binaries", "ThirdParty", "Onnxruntime", Target.Platform.ToString(), "onnxruntime.xcframework"), 
        null, 
        true
    ));
}

The framework is always correctly included in the Frameworks folder in the app, but as soon as the OrtApi is used, the app crashes. I also tried defining ORT_API_MANUAL_INIT and initializing the API manually, which didn't change anything. I've also tried binaries from NuGet as well as from download.onnxruntime.ai/pod-archive-onnxruntime-c, and versions 1.14.0, 1.14.1, and 1.19.1; all of them show exactly the same behaviour.

To reproduce

Urgency

I'm near the deadline for the next release, which is why I wanted to create a shipping build for the final testing.

Platform

iOS

OS Version

17.6.1

ONNX Runtime Installation

Released Package

Package Name (if 'Released Package')

onnxruntime-c

ONNX Runtime Version or Commit ID

1.14.1

ONNX Runtime API

C++/C

Architecture

ARM64

Execution Provider

Default CPU

skottmckay commented 2 months ago

I don't think any of us are familiar with Unreal Engine or that plugin, and the link returns a 404 for me. I'm not aware of anyone else having an issue running on iOS, though, so the plugin could potentially be the issue. However, if it's checked into the Unreal Engine source I would assume it works for others.

Can you provide a stack trace from the crash?

The code using OrtApi that crashes would be helpful too. Are you creating the OrtEnv instance before making any other calls and keeping it valid until the end?

Zaratusa commented 2 months ago

In order to see the Unreal Engine Github Repository, you have to join the Epic Games Organisation.

The code causing the crash is rather simple: OrtEnvironment = Ort::Env(). OrtEnvironment is a class variable defined as Ort::Env OrtEnvironment{nullptr}. Nothing else happens before that, as I didn't define ORT_API_MANUAL_INIT. In my test case where I did define it, I called Ort::InitApi() first.

Here is the stack trace using 1.14.1 (attached as crash.txt). Line 17 corresponds to the code described above.

Zaratusa commented 2 months ago

Relevant files in the Unreal Engine are:

skottmckay commented 2 months ago

Based on the stack it's crashing when registering the internal ORT operator schemas.

https://github.com/microsoft/onnxruntime/blob/20d94648bbb106c74c43ef4023e142dee0342155/onnxruntime/core/session/environment.cc#L243

That code is run for every usage of ORT on all platforms. If there were something wrong with it, it should break everywhere.

Your issue may be due to creating the Ort::Env with a nullptr. That hits a ctor which doesn't actually create a valid environment.

struct Env : detail::Base<OrtEnv> {
  explicit Env(std::nullptr_t) {}  ///< Create an empty Env object, must be assigned a valid one to be used

  /// \brief Wraps OrtApi::CreateEnv
  Env(OrtLoggingLevel logging_level = ORT_LOGGING_LEVEL_WARNING, _In_ const char* logid = "");
  // ... (remaining members omitted)
};

You could try creating it with no arguments as that would hit the second constructor.
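
The difference between the two constructors can be sketched without ONNX Runtime at all. The snippet below is only an illustration: FakeEnv, Runtime, and demo_env_lifecycle are hypothetical stand-ins that mirror the two constructor shapes quoted above, not the real Ort::Env. It shows that a member declared with {nullptr} starts out empty, and that a later assignment from a default-constructed temporary runs the second constructor.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for Ort::Env. It is NOT the real class; it only
// mirrors the two constructor signatures quoted above.
struct FakeEnv {
  // Placeholder constructor: creates nothing.
  explicit FakeEnv(std::nullptr_t) {}

  // "Real" constructor: all parameters are defaulted, so FakeEnv() selects it.
  // The int stands in for OrtLoggingLevel; 2 is an arbitrary illustrative value.
  FakeEnv(int logging_level = 2, const char* logid = "") : created_(true) {
    (void)logging_level;
    (void)logid;
  }

  bool valid() const { return created_; }

 private:
  bool created_ = false;
};

// Hypothetical owner class: declares the member empty, fills it in later.
struct Runtime {
  FakeEnv env{nullptr};             // declaration only, no environment yet
  void Init() { env = FakeEnv(); }  // this is where the second constructor runs
};

bool demo_env_lifecycle() {
  Runtime rt;
  const bool empty_before = !rt.env.valid();
  rt.Init();
  assert(rt.env.valid());
  return empty_before && rt.env.valid();
}
```

Under these assumptions, a crash inside the second constructor is consistent with a {nullptr} declaration followed by an assignment; the placeholder constructor itself does no work.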

Zaratusa commented 2 months ago

I've changed the env variable definition to Ort::Env OrtEnvironment, but the crash is still the same.

In the meantime I've also removed the embedding of the framework, as it's a static library on iOS, by changing the Unreal plugin code to:

if (Target.Platform == UnrealTargetPlatform.IOS)
{
    PublicAdditionalFrameworks.Add(new Framework(
        "ONNXRuntime", 
        Path.Combine(PluginDirectory, "Binaries", "ThirdParty", "Onnxruntime", Target.Platform.ToString(), "onnxruntime.xcframework")
    ));
}

This also didn't solve the issue. However, I didn't manage to add the compiler flags that are set in the CocoaPod (-fvisibility=hidden -fvisibility-inlines-hidden) for the Unreal plugin only. Could these also cause this issue?

skottmckay commented 2 months ago

> However I didn't manage to add the compiler flags only for the Unreal plugin, which are set in the Cocoapod -fvisibility=hidden -fvisibility-inlines-hidden. Could these also cause this issue?

I wouldn't have thought so. They're about restricting symbol visibility, but your issue is not an unresolved symbol.

Based on the crash info, it appears something is off with how memory is being allocated or freed. AFAIK ORT isn't doing anything special here, and as it's happening at startup we haven't done much at all at that point. Given that we haven't seen this issue previously and that this code always runs on all platforms, I would suspect the cause is something NNERuntimeORT + Unreal Engine is doing, and you should ask the developers of those for assistance. For example, maybe something limits the memory or changes how memory allocation/free works by overriding malloc/free, and the optimization only happens in a release build.
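
As a rough illustration of the kind of mechanism hinted at here: in C++, a program can replace the global operator new and operator delete, and every piece of code linked into the same binary, including a statically linked library, then allocates through the replacement. Whether Unreal Engine or the plugin does this in a way that affects ORT is not established in this thread. The counter and the library_allocate / library_release functions below are hypothetical names used only for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter, only here to make the replacement observable.
static std::size_t g_new_calls = 0;

// Program-wide replacement of the global allocation function.
void* operator new(std::size_t size) {
  ++g_new_calls;
  if (void* p = std::malloc(size == 0 ? 1 : size)) return p;
  throw std::bad_alloc();
}

// Matching replacements of the deallocation functions.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Stand-ins for code living inside a statically linked third-party library.
// The "library" never opted in, yet it ends up in the replaced allocator.
void* library_allocate(std::size_t bytes) { return ::operator new(bytes); }
void library_release(void* p) { ::operator delete(p); }

bool demo_replaced_new() {
  const std::size_t before = g_new_calls;
  void* block = library_allocate(64);
  const bool routed = (g_new_calls == before + 1);
  library_release(block);
  assert(routed);
  return routed;
}
```

If a host application installs such a replacement, a bug or configuration difference in it (for example one that only shows up in an optimized build) would surface inside the library's first allocations, which is one way a crash could appear during startup registration code.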

skottmckay commented 2 months ago

Something is off in the callstack as well. You said you were using Ort::Env OrtEnvironment{nullptr}, but that maps to a dummy ctor in the ORT C++ API.

The callstack you provided has an Ort::Env with something additional in the namespace (Ort011401) that we don't add, and is a ctor taking the log level and log id as params.

Zaratusa commented 2 months ago

> Something is off in the callstack as well. You said you were using Ort::Env OrtEnvironment{nullptr}, but that maps to a dummy ctor in the ORT C++ API.

That is only the definition of the OrtEnvironment variable in the header of the class file. The crash occurs when the plugin calls OrtEnvironment = Ort::Env().

> The callstack you provided has an Ort::Env with something additional in the namespace (Ort011401) that we don't add, and is a ctor taking the log level and log id as params.

Ort011401 is an inline namespace added by Unreal Plugin developers for easier versioning of the SDK, but even when the inline namespace is disabled, it still crashes when the library calls RegisterOpSetSchema<contrib::OpSet_Microsoft_ver1>()
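
For readers unfamiliar with the technique, here is a minimal sketch of how an inline namespace can put a version tag into symbol names while source code keeps writing Ort::Env. How the plugin actually wraps the ORT headers is an assumption; only the tag Ort011401 comes from the call stack, and the Env struct and demo_inline_namespace below are stand-ins.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <typeinfo>

namespace Ort {
inline namespace Ort011401 {  // version tag as seen in the call stack
// Stand-in type, not the real Ort::Env.
struct Env {};
}  // namespace Ort011401
}  // namespace Ort

// Both spellings name the same type; callers do not need to mention the tag.
static_assert(std::is_same<Ort::Env, Ort::Ort011401::Env>::value,
              "inline namespace members are visible in the enclosing namespace");

bool demo_inline_namespace() {
  // typeid(...).name() is implementation-defined, but on common toolchains it
  // is derived from the fully qualified name, so the version tag shows up.
  const std::string name = typeid(Ort::Env).name();
  const bool tagged = name.find("Ort011401") != std::string::npos;
  assert(tagged);
  return tagged;
}
```

This is why a stack trace can show Ort::Ort011401::Env even though neither the calling code nor the upstream headers spell the tag out.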

skottmckay commented 2 months ago

Given that the plugin you're using doesn't officially support usage on mobile, and that the ORT environment initialization works fine everywhere else (including the iOS example applications we have), I would expect the issue is with the plugin.

https://forums.unrealengine.com/t/course-neural-network-engine-nne/1162628/158

Zaratusa commented 2 months ago

To avoid any relation between the issue and the NNERuntimeORT plugin, I've created a minimal example project which has exactly the same behaviour. The example can be run with Unreal Engine 5.4 installed from the Epic Games Launcher or from a source build.

Zaratusa commented 1 month ago

Anyone else got any idea here? I really need to fix this issue 😓