Describe the issue
Hi,
I've noticed that a significant chunk of time is spent on locks inside onnxruntime, specifically inside BFCArena::AllocateRawInternal:
https://github.com/microsoft/onnxruntime/blob/01673389b8c51dbea918900c2954966908c7fcaf/onnxruntime/core/framework/bfc_arena.cc#L328
The conditions are as follows:
A single Session object in the whole application
Many threads calling Session.Run at the same time
intra_threads and inter_threads set to 1, execution_mode set to SEQUENTIAL, arena allocator enabled, memory pattern optimization enabled
See flamegraph screenshots below:
[Flamegraph screenshots]
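To make the suspected pattern concrete, here is a toy model of the conditions listed (one Session, many threads calling Session.Run, arena allocator enabled): every thread allocates through one shared pool whose bookkeeping is guarded by a single mutex. This is not BFCArena's code and does not model its bins or chunks; ToySharedArena and SharedArenaTotal are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for one arena shared by every concurrent Run() call.
// A single mutex guards all bookkeeping, so Allocate/Free from every thread
// serialize on the same lock (on Linux, contended std::mutex waits go through futex).
class ToySharedArena {
 public:
  void* Allocate(std::size_t bytes) {
    std::lock_guard<std::mutex> lock(mu_);  // every thread contends here
    ++allocations_;
    void* p = std::malloc(bytes);
    assert(p != nullptr);
    return p;
  }
  void Free(void* p) {
    std::lock_guard<std::mutex> lock(mu_);  // and here
    std::free(p);
  }
  std::size_t allocations() {
    std::lock_guard<std::mutex> lock(mu_);
    return allocations_;
  }

 private:
  std::mutex mu_;
  std::size_t allocations_ = 0;
};

// Simulates `threads` concurrent Run() calls, each making `allocs_per_thread`
// short-lived allocations through the single shared arena.
std::size_t SharedArenaTotal(int threads, int allocs_per_thread) {
  ToySharedArena arena;
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&arena, allocs_per_thread] {
      for (int i = 0; i < allocs_per_thread; ++i) {
        arena.Free(arena.Allocate(256));
      }
    });
  }
  for (auto& w : workers) w.join();
  return arena.allocations();
}
```

For example, SharedArenaTotal(8, 10000) returns 80000 after eight threads have funneled all of their allocate/free pairs through mu_. Adding threads cannot make the locked section run in parallel, which is why the waiting shows up as lock time rather than as useful work.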
strace shows that 92% of the application time is spent in futex calls:
[strace screenshot]
Is this an expected BFCArena limitation, or is something misconfigured on my side?
I'd expect that having one Session object per worker thread would eliminate the contention. However, I've seen developers here discourage people from setups like this. Why? What are the drawbacks? I assume increased memory consumption (which is fine for me); is there anything else?
And if this is indeed an expected limitation, I'd say it needs some improvement. For example, a caller could pass their own BFCArena instance to Session.Run(), or BFCArena could track each thread_id and keep a separate arena for each thread.
To reproduce
Initialize a single Session with the following settings:
CPU execution provider
intra_threads set to 1
inter_threads set to 1
execution_mode set to SEQUENTIAL
arena allocator enabled
memory pattern optimization enabled
Then, call Session.Run from many threads concurrently.
Urgency
No response
Platform
Linux
OS Version
NixOS, Gentoo
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.19.0
ONNX Runtime API
C++
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
No
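On the suggestion that BFCArena could track each thread_id and keep one arena per thread, a minimal sketch of that idea follows, assuming a thread_local pool is an acceptable stand-in for "an array of arenas keyed by thread". ToyPerThreadArena, CurrentThreadArena and PerThreadArenaTotal are hypothetical names, not ONNX Runtime APIs, and the pool only caches fixed 256-byte blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Hypothetical per-thread arena: caches freed 256-byte blocks for reuse.
// Each thread gets its own instance via thread_local, so no mutex is needed.
class ToyPerThreadArena {
 public:
  static constexpr std::size_t kBlockSize = 256;

  ~ToyPerThreadArena() {
    for (void* p : free_list_) std::free(p);  // release cached blocks at thread exit
  }
  void* Allocate() {
    ++allocations_;
    if (!free_list_.empty()) {  // reuse a cached block without any locking
      void* p = free_list_.back();
      free_list_.pop_back();
      return p;
    }
    void* p = std::malloc(kBlockSize);
    assert(p != nullptr);
    return p;
  }
  // Keeps the block for later reuse by this thread only.
  void Free(void* p) { free_list_.push_back(p); }
  std::size_t allocations() const { return allocations_; }

 private:
  std::vector<void*> free_list_;
  std::size_t allocations_ = 0;
};

// One arena per thread, created lazily on first use in that thread.
ToyPerThreadArena& CurrentThreadArena() {
  thread_local ToyPerThreadArena arena;
  return arena;
}

// Simulates concurrent Run() calls; each thread allocates only from its own arena.
// Returns the total number of allocations made across all threads.
std::size_t PerThreadArenaTotal(int threads, int allocs_per_thread) {
  std::vector<std::size_t> counts(static_cast<std::size_t>(threads), 0);
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&counts, t, allocs_per_thread] {
      ToyPerThreadArena& arena = CurrentThreadArena();
      std::size_t before = arena.allocations();
      for (int i = 0; i < allocs_per_thread; ++i) {
        arena.Free(arena.Allocate());
      }
      counts[t] = arena.allocations() - before;  // each thread writes only its own slot
    });
  }
  for (auto& w : workers) w.join();
  std::size_t total = 0;
  for (std::size_t c : counts) total += c;
  return total;
}
```

The trade-off is visible in the sketch: a block freed on one thread stays in that thread's free list, so cached memory grows with the number of threads instead of being shared. A real allocator would also have to handle memory freed on a different thread than the one that allocated it, which this toy does not attempt.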