triton-inference-server / pytriton

PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
https://triton-inference-server.github.io/pytriton/
Apache License 2.0

Model instances question #69

Closed tinsss closed 1 month ago

tinsss commented 2 months ago

Hi!

Thanks for the amazing work on PyTriton! I want to ask a few questions regarding how model instancing works under the hood.

  1. From the source code it seems like each inference function is wrapped in an `InferenceHandler`, which is a subclass of `th.Thread`. How do we get different processes running the inference from this?
  2. In the multi-instance inference example, each instance is a new Python object. Why do we need different objects in this case? (See the sketch after this list.)
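For context, here is a minimal sketch of the multi-instance binding pattern the example uses, where `infer_func` is passed as a list of callables and each entry is served by its own `InferenceHandler`. The doubling model, tensor names, and instance count are made up for illustration:

```python
import numpy as np

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton


def make_infer_fn():
    # Each call returns a fresh callable, i.e. a distinct Python object,
    # so every model instance gets its own function (which could hold its
    # own state, e.g. a separate model copy).
    @batch
    def _infer_fn(input):
        return {"output": input * 2.0}

    return _infer_fn


with Triton() as triton:
    triton.bind(
        model_name="Doubler",
        # A list of callables requests multiple model instances;
        # each entry gets its own InferenceHandler.
        infer_func=[make_infer_fn(), make_infer_fn()],
        inputs=[Tensor(name="input", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="output", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=16),
    )
    triton.serve()
```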

Thanks in advance!

leafjungle commented 2 months ago

I am also puzzled by this. "Multi-instance" means the server processes multiple requests at the same time (something like multi-threading or multi-processing), and those requests share the same `infer_func`. Does that have anything to do with passing `infer_func` as a list (for CPU-only functions)?

And when I set `infer_func` to a list of length K, will the server start K processes to handle the requests?
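As a point of reference for the question, the issue text above notes that `InferenceHandler` subclasses `th.Thread`, which suggests a thread-per-entry pattern rather than K separate processes. A minimal illustration of that pattern follows (generic Python threading only, not pytriton's actual internals; the shared queue and shutdown sentinel are assumptions):

```python
import queue
import threading

requests = queue.Queue()


def handler(instance_id, infer_fn):
    # Each "instance" is one thread running the same loop; with K entries
    # you get K concurrent handlers, not K processes.
    while True:
        req = requests.get()
        if req is None:  # shutdown sentinel
            break
        print(f"instance {instance_id} -> {infer_fn(req)}")


infer_fns = [lambda x: x * 2, lambda x: x * 2]  # K = 2 callables
threads = [
    threading.Thread(target=handler, args=(i, fn))
    for i, fn in enumerate(infer_fns)
]
for t in threads:
    t.start()
for item in range(4):
    requests.put(item)
for _ in threads:
    requests.put(None)
for t in threads:
    t.join()
```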

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 21 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 1 month ago

This issue was closed because it has been stalled for 7 days with no activity.