microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Performance] Mapfile support for certain external data files is not working #21195

Open ivberg opened 3 weeks ago

ivberg commented 3 weeks ago

Describe the issue

We are trying to get mapfile (memory-mapped file) support working with external data files. The model loads and runs correctly, but while debugging we noticed that the memory mapping is failing and erroring out inside ORT code.

https://github.com/microsoft/onnxruntime/pull/19089
https://github.com/onnx/onnx/blob/main/docs/ExternalData.md

Callstack where the mapfile fails due to alignment issues:

00 ps_onnxruntime!onnxruntime::WindowsEnv::MapFileIntoMemory+0xa90 [D:\a_work\1\s\onnxruntime\onnxruntime\core\platform\windows\env.cc @ 449] // Failure here
01 ps_onnxruntime!onnxruntime::utils::GetFileContent+0x12c [D:\a_work\1\s\onnxruntime\onnxruntime\core\framework\tensorprotoutils.cc @ 899]
02 ps_onnxruntime!onnxruntime::utils::GetExtDataFromTensorProto+0x484 [D:\a_work\1\s\onnxruntime\onnxruntime\core\framework\tensorprotoutils.cc @ 1015] // The buffer size, length is coming from here
03 ps_onnxruntime!onnxruntime::session_state_utils::ExtDataTensorProtoToTensor+0x8c [D:\a_work\1\s\onnxruntime\onnxruntime\core\framework\session_state_utils.cc @ 73]
04 ps_onnxruntime!onnxruntime::session_state_utils::DeserializeTensorProto+0x37c [D:\a_work\1\s\onnxruntime\onnxruntime\core\framework\session_state_utils.cc @ 126]
05 ps_onnxruntime!onnxruntime::session_state_utils::SaveInitializedTensors+0x1208 [D:\a_work\1\s\onnxruntime\onnxruntime\core\framework\session_state_utils.cc @ 310]
06 ps_onnxruntime!onnxruntime::SessionState::FinalizeSessionStateImpl+0x76c [D:\a_work\1\s\onnxruntime\onnxruntime\core\framework\session_state.cc @ 1476]
07 ps_onnxruntime!onnxruntime::SessionState::FinalizeSessionState+0x1b4 [D:\a_work\1\s\onnxruntime\onnxruntime\core\framework\session_state.cc @ 1189]
08 ps_onnxruntime!onnxruntime::InferenceSession::Initialize+0x2178 [D:\a_work\1\s\onnxruntime\onnxruntime\core\session\inference_session.cc @ 2015]
09 ps_onnxruntime!`anonymous namespace'::InitializeSession+0x250 [D:\a_work\1\s\onnxruntime\onnxruntime\core\session\onnxruntime_c_api.cc @ 763]
0a ps_onnxruntime!OrtApis::CreateSession+0xa0 [D:\a_work\1\s\onnxruntime\onnxruntime\core\session\onnxruntime_c_api.cc @ 779]

Instead of mapping the file, ORT hits the error "mapped offset must be a multiple of the allocation granularity" and swallows it. I say "swallows" because, as another call stack shows, ORT then takes the error path and reads the whole file into a buffer as a fallback.
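To illustrate the alignment requirement behind that error message, here is a minimal sketch of how a requested file range can be rounded down to the allocation granularity before mapping. The names `AlignedMapping`, `AlignForMapping`, and `IsAligned` are hypothetical helpers for illustration and are not part of ONNX Runtime; the value 65536 used in the usage note is an assumption (on Windows the real value can be queried through `GetSystemInfo` and its `dwAllocationGranularity` field).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for illustration only (not an ONNX Runtime type).
struct AlignedMapping {
  std::uint64_t mapped_offset;  // offset to hand to the mapping call; a multiple of the granularity
  std::uint64_t mapped_length;  // number of bytes to map, including the leading padding
  std::uint64_t delta;          // bytes to skip from the start of the view to reach the tensor data
};

// Returns true when `offset` can be mapped directly without adjustment.
bool IsAligned(std::uint64_t offset, std::uint64_t granularity) {
  assert(granularity > 0);
  return offset % granularity == 0;
}

// Rounds the requested range [offset, offset + length) down to the nearest
// granularity boundary so the mapping call receives an aligned offset.
AlignedMapping AlignForMapping(std::uint64_t offset, std::uint64_t length,
                               std::uint64_t granularity) {
  assert(granularity > 0);
  const std::uint64_t delta = offset % granularity;
  return AlignedMapping{offset - delta, length + delta, delta};
}
```

For example, `AlignForMapping(70000, 1000, 65536)` would map from offset 65536 and the caller would read the tensor bytes starting `delta` bytes into the returned view. Whether ORT should adjust offsets this way, or instead require aligned offsets in the external data file, is a design question this sketch does not settle.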

To reproduce

Get a model with an external data file, e.g. model.onnx and model.onnx.data. Not all files will reproduce the issue, because it depends on how the data offsets align with the target OS's allocation granularity.

const ORTCHAR_T* filemodelpath = ORT_TSTR("model.onnx");

Load with:

Ort::Session(env, filemodelpath, session_options);

// The model seems to load fine and works with external data file

Urgency

Fairly urgent

For now, we are trying a workaround with AddExternalInitializersFromFilesInMemory.

Platform

Windows

OS Version

23H2

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

55f7f9d7a9b88c4e7f0eb7cf4d7f31004761f5cb

ONNX Runtime API

C++

Architecture

ARM64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

Yes

pranavsharma commented 3 weeks ago

Can you attach a sample model? Does this happen on ARM64 only?

ivberg commented 3 weeks ago

We are looking into whether we can share the model directly. It seems the alignment issue could happen on multiple platforms; I just happen to be testing on ARM64.