Open quic-suppugun opened 6 months ago
I have compiled two images based on this PR for easy use. They are:
The first image only replaces the ONNX backend while keeping everything else unchanged. The second image provides a smaller CPU version.
Hey! I was going to work on resolving the same issue with session sharing and noticed that this PR already exists, so thanks. Is there a reluctance to do this that I'm missing?
By default, a new ORT session is created for every instance in a model instance group. This change adds support for sharing a single session among the instances of an instance group. To enable it, set 'share_session_between_instances' to "true" under "parameters" in the Triton model config. Example:
parameters [
  .....
  {
    key: "share_session_between_instances"
    value: {string_value: "true"}
  }
]
This is a global parameter and cannot be defined per instance group. The user should determine if the parameter makes sense for their setup.
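As a rough illustration of how such a flag could be read, here is a minimal Python sketch. It assumes the model config has already been loaded into a dictionary that mirrors the "parameters" layout shown above; the helper name `session_sharing_enabled` and the case-insensitive comparison are illustrative choices, not the backend's actual parsing code.

```python
def session_sharing_enabled(model_config: dict) -> bool:
    """Return True when the config asks for a shared session (hypothetical helper)."""
    params = model_config.get("parameters", {})
    entry = params.get("share_session_between_instances", {})
    # The value is stored as a string, so compare text instead of relying on
    # truthiness (the non-empty string "false" would otherwise count as True).
    return entry.get("string_value", "false").strip().lower() == "true"


# Assumed dictionary form of the "parameters" section from the example above.
config = {
    "parameters": {
        "share_session_between_instances": {"string_value": "true"}
    }
}
print(session_sharing_enabled(config))
```

Because the parameter lives at the model level, a check like this would apply to every instance group of the model, which matches the note that it cannot be set per group.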
When the log-info option of tritonserver is set to "1", the logs show that a session is created and mapped to the instance group when the group's first instance is initialized, and that the remaining instances reuse it. Example:
TRITONBACKEND_ModelInstanceInitialize: _0_1 (CPU device 0)
TRITONBACKEND_ModelInstanceInitialize: _0_0 (CPU device 0)
Could not find a session corresponding to instance group: _0
Created session for instance: _0_1
Mapped session for instance group: _0
Reusing session for instance: _0_0
Change-Id: I6dc509b9c2451e3dd14d45f6f150b37f50b5db89
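The lookup-then-create flow implied by the log sequence above can be sketched in Python as follows. This is not the backend's C++ implementation: the `SessionRegistry` class, the `create_session` callback, and the rule that the group name is the instance name minus its final `_N` suffix are all assumptions made for illustration. Only the log message texts are taken from the description.

```python
import threading


class SessionRegistry:
    """Hypothetical map from instance group name to one shared session."""

    def __init__(self, create_session):
        self._create_session = create_session  # assumed factory callback
        self._sessions = {}
        self._lock = threading.Lock()  # instances may initialize concurrently
        self.log = []

    @staticmethod
    def group_of(instance_name):
        # Assumption: dropping the last "_N" gives the group, e.g. "_0_1" -> "_0".
        return instance_name.rsplit("_", 1)[0]

    def get(self, instance_name):
        group = self.group_of(instance_name)
        with self._lock:
            if group not in self._sessions:
                self.log.append(
                    f"Could not find a session corresponding to instance group: {group}"
                )
                self._sessions[group] = self._create_session(instance_name)
                self.log.append(f"Created session for instance: {instance_name}")
                self.log.append(f"Mapped session for instance group: {group}")
            else:
                self.log.append(f"Reusing session for instance: {instance_name}")
            return self._sessions[group]


# A plain object stands in for a real ORT session in this sketch.
registry = SessionRegistry(lambda name: object())
first = registry.get("_0_1")
second = registry.get("_0_0")
print(first is second)
print("\n".join(registry.log))
```

Holding the lock across both the lookup and the creation keeps two instances of the same group from each creating a session if they initialize at the same time.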