triton-inference-server / pytriton

PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
https://triton-inference-server.github.io/pytriton/
Apache License 2.0

[Question] About the subprocess for multi-instance #74

Closed. leafjungle closed this issue 1 month ago.

leafjungle commented 2 months ago

I printed the process ID and thread ID in the infer_func. The process tree looks like this:

python myserver.py (PID x1)
  └── tritonserver (PID x2)
        ├── triton_python_backend_stub (PID x3)
        └── triton_python_backend_stub (PID x4)

But I found that infer_func's PID = x1. Why? Is the message transferred from PID x2 to PID x3, and then to PID x1? What do the subprocesses PID x3 and PID x4 do?
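
For reference, a minimal sketch of the setup described above (the model and tensor names are illustrative, not from the issue): an inference callable that prints its PID and thread ID, bound and served with PyTriton.

```python
import os
import threading

import numpy as np
from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton


@batch
def infer_func(data):
    # This runs in the script process (PID x1), not in a stub process.
    print(f"infer_func PID={os.getpid()} TID={threading.get_ident()}")
    return {"out": data}


with Triton() as triton:
    triton.bind(
        model_name="Identity",
        infer_func=infer_func,
        inputs=[Tensor(name="data", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="out", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=8),
    )
    triton.serve()
```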

leafjungle commented 2 months ago

PID x1 will be affected by the GIL. It seems that PID x3 and PID x4 just forward requests to PID x1.

nv-blazejkubiak commented 2 months ago

Your findings are correct. PyTriton utilizes the Python backend for Triton, which, in turn, uses additional stub processes. There is a separate stub for each model instance to enable the initialization of an independent Python environment for each instance.
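
As a sketch of how this maps to code (assuming the documented behavior that passing a list of callables to bind creates one model instance, and hence one stub process, per callable; model and tensor names are illustrative):

```python
import numpy as np
from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton


@batch
def infer_func(data):
    return {"out": data}


with Triton() as triton:
    triton.bind(
        model_name="Identity",
        # Two callables -> two model instances -> two stub processes
        # (PID x3 and PID x4 in the tree above); the callables themselves
        # still execute in the script process (PID x1).
        infer_func=[infer_func, infer_func],
        inputs=[Tensor(name="data", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="out", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=8),
    )
    triton.serve()
```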

When we use PyTriton in a Python script, we connect to virtual model instances from the script process. Therefore, if your inference function is CPU-bound, you may indeed encounter issues with the Global Interpreter Lock (GIL). However, in practice, the inference function typically delegates execution to an inference or deep learning framework (such as TensorRT or PyTorch), so this usually does not pose a significant problem.
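
A sketch of the delegation pattern described above (the model and tensor names are illustrative): the heavy work is handed off to PyTorch, whose C++ kernels release the GIL while they run, so the script process is not serialized on Python bytecode.

```python
import numpy as np
import torch  # any framework that releases the GIL works similarly
from pytriton.decorators import batch

# Toy model standing in for a real network.
model = torch.nn.Linear(16, 4).eval()


@batch
def infer_func(data):
    # The forward pass executes in PyTorch's C++ code, which releases the
    # GIL, so concurrent requests are not blocked on Python-level work.
    with torch.no_grad():
        out = model(torch.from_numpy(data))
    return {"out": out.numpy()}
```

This callable would be bound exactly like the one in the earlier sketch, with the input shape adjusted to (16,).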

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 21 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 1 month ago

This issue was closed because it has been stalled for 7 days with no activity.