triton-inference-server / onnxruntime_backend

The Triton backend for the ONNX Runtime.
BSD 3-Clause "New" or "Revised" License

Add support for sharing an ORT session #248

Open quic-suppugun opened 6 months ago

quic-suppugun commented 6 months ago

Currently, a new ORT session is created for every instance in a model instance group. This change adds support for sharing a single session per instance group. To enable it, set 'share_session_between_instances' to "true" in the "parameters" section of the Triton model config. Example:

parameters [
  .....
  {
    key: "share_session_between_instances"
    value: { string_value: "true" }
  }
]

This is a global parameter and cannot be defined per instance group. The user should determine if the parameter makes sense for their setup.

When the log-info option of tritonserver is set to "1", the logs show that a session is created and mapped to the instance group when the first instance is initialized, and that the same session is reused for the other instances. Example:

TRITONBACKEND_ModelInstanceInitialize: _0_1 (CPU device 0)
TRITONBACKEND_ModelInstanceInitialize: _0_0 (CPU device 0)
Could not find a session corresponding to instance group: _0
Created session for instance: _0_1
Mapped session for instance group: _0
Reusing session for instance: _0_0
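The lookup-then-reuse flow suggested by these log lines can be sketched as a small cache keyed by instance group. This is only an illustration in Python, not the backend's actual C++ code: `SessionRegistry`, `instance_group_of`, and the `create_session` callback are hypothetical names, and deriving the group name by stripping the trailing instance index (`_0_1` becomes `_0`) is an assumption based on the log output above.

```python
import threading
from typing import Callable, Dict, List


def instance_group_of(instance_name: str) -> str:
    """Strip the trailing instance index, e.g. '_0_1' -> '_0' (assumed naming scheme)."""
    group, _, _ = instance_name.rpartition("_")
    return group


class SessionRegistry:
    """Hypothetical cache that holds one shared session per instance group."""

    def __init__(self, create_session: Callable[[], object]) -> None:
        self._create_session = create_session
        self._sessions: Dict[str, object] = {}
        # Instances may initialize concurrently, so guard the map with a lock.
        self._lock = threading.Lock()
        self.log: List[str] = []

    def get_session(self, instance_name: str) -> object:
        group = instance_group_of(instance_name)
        with self._lock:
            session = self._sessions.get(group)
            if session is not None:
                self.log.append(f"Reusing session for instance: {instance_name}")
                return session
            self.log.append(
                f"Could not find a session corresponding to instance group: {group}"
            )
            session = self._create_session()
            self.log.append(f"Created session for instance: {instance_name}")
            self._sessions[group] = session
            self.log.append(f"Mapped session for instance group: {group}")
            return session


# Stand-in for session creation; a real backend would build an ORT session here.
registry = SessionRegistry(create_session=object)
first = registry.get_session("_0_1")
second = registry.get_session("_0_0")
assert first is second  # both instances in group "_0" share one session
```

Holding the lock across creation keeps two instances of the same group from each building a session, at the cost of serializing initialization. Whether the PR makes the same trade-off is not stated in the description.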

Change-Id: I6dc509b9c2451e3dd14d45f6f150b37f50b5db89

Jackiexiao commented 6 months ago

I have compiled two images based on this PR for easy use. They are:

The first image only replaces the ONNX backend while keeping everything else unchanged. The second image provides a smaller CPU version.

adityagoel4512 commented 1 month ago

Hey! I was going to work on resolving the same issue with session sharing and noticed that this PR already exists, so thanks. Is there a reluctance to do this that I'm missing?