triton-inference-server / pytriton

PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
https://triton-inference-server.github.io/pytriton/
Apache License 2.0

What is the proxy backend in pytriton? #53

Closed · HJH0924 closed this 5 months ago

HJH0924 commented 6 months ago

In Triton, a backend refers to one of the several types documented in this link, while the PyTriton architecture mentions a proxy backend. What is this?

piotrm-nvidia commented 6 months ago

The Triton Inference Server's architecture and the concept of a "proxy backend" in PyTriton can be understood by examining their respective documentation and usage.

While a Triton Inference Server backend refers to the actual implementation that executes a model, the "Proxy Backend" in PyTriton is the mechanism PyTriton uses to transfer data between your Python callbacks and the Triton server.

  1. Triton Inference Server Backends: According to the Triton Backend readme, a backend in Triton is the implementation that executes a model. This can be a wrapper around a deep learning framework like PyTorch, TensorFlow, TensorRT, or ONNX, or it can be custom C/C++ logic performing operations like image pre-processing. Each backend must implement the Triton Backend API, allowing Triton to send requests to the backend for execution and the backend to communicate with Triton. PyTriton uses the Python backend, which runs a server for the PyTriton proxy, written in Python. It is generic and can be used with any framework, because it only handles input/output tensors and passes them on to the Python callbacks of your Inference Callable.

  2. PyTriton Proxy Backend: The "Proxy Backend" in the PyTriton context, as described in the PyTriton high-level design, is the mechanism for passing input/output tensors between processes using shared memory. PyTriton allocates a certain amount of shared memory for this Proxy Backend to facilitate the data transfer. Your Inference Callable lives in your own process, while Triton spawns a separate process for the Python backend, where our generic Proxy Backend code waits for inference requests. You define a Python function, and Triton calls it through this proxy once the input data is ready, as shown in the sketch below.
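To make the split concrete, here is a minimal sketch following the quick-start pattern from the PyTriton documentation. The model name `Doubler`, the tensor names `data`/`result`, and the doubling logic are only illustrative; the point is that `infer_fn` runs in your own Python process, while Triton's Python backend and the Proxy Backend shuttle the tensors to and from it.

```python
import numpy as np

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton


@batch
def infer_fn(data: np.ndarray):
    # Runs in your Python process; the Proxy Backend delivers the batched
    # input tensors here via shared memory and sends the returned tensors
    # back to Triton.
    return {"result": data * 2.0}


with Triton() as triton:
    # Triton loads its generic Python backend for this model; that backend
    # forwards every request to infer_fn through the PyTriton proxy.
    triton.bind(
        model_name="Doubler",
        infer_func=infer_fn,
        inputs=[Tensor(name="data", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="result", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=128),
    )
    triton.serve()  # blocks and serves HTTP/gRPC requests
```

Once `triton.serve()` is running, the Triton server accepts HTTP/gRPC requests for the bound model, and each request reaches `infer_fn` through the shared-memory proxy described above.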

Is this explanation sufficient? If not, please let me know and I will try to clarify further.

HJH0924 commented 6 months ago

You explained it very clearly and I understood, thank you very much!

github-actions[bot] commented 5 months ago

This issue is stale because it has been open 21 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 5 months ago

This issue was closed because it has been stalled for 7 days with no activity.