Open RyanRio opened 9 months ago
We have a test for this here. I replaced the implementation of MockedOrtAllocator with your allocator and all tests passed. Can you provide a full working example (that I can compile on my machine) along with the model file?
Hi Pranav, yeah, the test passes on my machine too, so fair enough that you need the full example + model. I'll have to reproduce this with a shareable example. I'm going to be away for ~1.5 weeks, sorry for the bad timing. If you want to close this, I'll reopen once I'm back; either way is fine with me. Thanks 😃
I think I know what the issue is. I debugged a similar issue today with an internal team. The problem is that our math lib assumes a certain alignment, the size of which comes from the `MlasGetPreferredBufferAlignment()` function. The default value is 64. See this. If you simply changed your malloc like this and use alignment = 64, it'll be just fine. Please try this and let me know. This is just a workaround for now. We'll work on a fix. Stay tuned.
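For anyone following along, here is a minimal sketch of what a 64-byte-aligned malloc/free pair could look like. The names `kAlignment`, `IsAligned`, `AlignedAlloc`, and `AlignedFree` are hypothetical and not part of ONNX Runtime, and the value 64 is taken from the default mentioned above; this is an illustration under those assumptions, not the code from the linked example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#if defined(_WIN32)
#include <malloc.h>  // _aligned_malloc, _aligned_free
#endif

// Assumption: 64 is the default alignment mentioned in this thread.
constexpr std::size_t kAlignment = 64;
static_assert((kAlignment & (kAlignment - 1)) == 0,
              "alignment must be a power of two");

// Hypothetical helper: true if p sits on a kAlignment-byte boundary.
bool IsAligned(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % kAlignment == 0;
}

// Hypothetical helper: allocate `size` bytes aligned to kAlignment.
// Returns nullptr for size == 0 or when the allocation fails.
void* AlignedAlloc(std::size_t size) {
  if (size == 0) return nullptr;
#if defined(_WIN32)
  void* p = _aligned_malloc(size, kAlignment);
#else
  void* p = nullptr;
  if (posix_memalign(&p, kAlignment, size) != 0) p = nullptr;
#endif
  assert(p == nullptr || IsAligned(p));
  return p;
}

// Hypothetical helper: release memory obtained from AlignedAlloc.
// Must be paired with AlignedAlloc; on Windows, plain free() is not valid here.
void AlignedFree(void* p) {
#if defined(_WIN32)
  _aligned_free(p);
#else
  std::free(p);
#endif
}
```

The idea would be to call helpers like these from the `Alloc` and `Free` callbacks of the custom `OrtAllocator` instead of plain `malloc` and `free`, so that every buffer handed to the math library starts on a 64-byte boundary.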
Will do, I've temporarily shifted gears to the linked issue methodology but I'll want to try both code paths in any case for performance analysis, thanks for the temporary workaround.
Hi @pranavsharma, this does fix it, but a quick question: when using a custom allocator like this, does `m_ort->EnableCpuMemArena(m_session_options)` still have any effect? I.e., does the arena just use the custom free and malloc I provide? Ideally I would still like to use the ONNX arena and provide a custom OrtArenaCfg for optimal memory usage, but just have it delegate allocations.
If you supply a custom allocator, the Enable... setting has no effect.
When I call EnableCpuMemArena I see this -
2024-04-25 21:53:14.0928601 [I:onnxruntime:test, bfc_arena.cc:29 onnxruntime::BFCArena::BFCArena] Creating BFCArena for Cpu with following configs: initial_chunk_size_bytes: 1048576 max_dead_bytes_per_chunk: 134217728 initial_growth_chunk_size_bytes: 2097152 max_power_of_two_extend_bytes: 1073741824 memory limit: 18446744073709551615 arena_extend_strategy: 0
2024-04-25 21:53:14.1031188 [V:onnxruntime:test, bfc_arena.cc:66 onnxruntime::BFCArena::BFCArena] Creating 21 bins of max chunk size 256 to 268435456
2024-04-25 21:53:14.1068891 [I:onnxruntime:, inference_session.cc:1476 onnxruntime::InferenceSession::Initialize] This session will use the allocator registered with the environment.
and when I disable it with DisableCpuMemArena, those first 2 lines aren't there. It seems like, at the very least, it should be disabled so it doesn't wastefully create anything? (In both cases I later see the same allocations go to my custom allocator.)
Describe the issue
Code in onnxruntime\onnxruntime\core\mlas\lib\sgemm.cpp throws an access violation reading location 0xFFFF... when shared use of a custom allocator is enabled through kOrtSessionOptionsConfigUseEnvAllocators. This happens even when only a single session has been created.
The error itself seems to be happening in MlasGemmFloatKernelFma3, but I don't have the symbols loaded for that (any help there would be appreciated; I've built from source and supposedly enabled all debug functionality).
I am following https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/shared_lib/test_inference.cc, and I believe I'm following it exactly. One thing I may be getting wrong is that a different MockedAllocator instance is being used for initializing the tensors in the example, I'm not sure why this is important. I tried this and same results, though.
I have confirmed that my custom onnx build passes the test.
To reproduce
Here is a minimal example -
Custom allocator:
Env creation:
Later... session creation and usage
Urgency
No response
Platform
Windows
OS Version
10
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
rel-1.16.3
ONNX Runtime API
C
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response