Closed tinsss closed 1 month ago
I am also puzzled by this. "Multi-instance" means the server processes multiple requests at the same time (something like multi-threading or multi-processing), and these requests share the same `infer_func` — so it has nothing to do with `infer_func` being a list (for CPU-only functions)?
And when I set `infer_func` to a list of length K, will the server start K processes to handle the requests?
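As a rough mental model of what "a list of K callables → K workers" could mean — this is not PyTriton's actual implementation, and all names here (`serve`, `_worker`) are hypothetical — each entry in the list could be owned by its own process, all draining one shared request queue:

```python
# Hypothetical sketch only: K callables in a list, one process per callable,
# all pulling from a shared queue. NOT PyTriton's real implementation.
import multiprocessing as mp

def _worker(infer_func, requests, results):
    # Each worker process owns one callable from the list and drains the queue.
    for item in iter(requests.get, None):   # None is the shutdown sentinel
        idx, payload = item
        results.put((idx, infer_func(payload)))

def serve(infer_funcs, payloads):
    ctx = mp.get_context("fork")            # fork: children inherit the callables
    requests, results = ctx.Queue(), ctx.Queue()
    workers = [ctx.Process(target=_worker, args=(f, requests, results))
               for f in infer_funcs]        # one process per list entry
    for w in workers:
        w.start()
    for i, p in enumerate(payloads):
        requests.put((i, p))
    for _ in workers:                       # one sentinel per worker
        requests.put(None)
    out = dict(results.get() for _ in payloads)
    for w in workers:
        w.join()
    return [out[i] for i in range(len(payloads))]
```

Under this mental model, a list of length K would indeed mean K concurrent workers, and a single callable would mean one — but whether PyTriton actually forks processes this way is exactly what the question above is asking.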
This issue is stale because it has been open 21 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for 7 days with no activity.
Hi!
Thanks for the amazing work on PyTriton! I want to ask a few questions regarding how model instancing works under the hood.
I see that requests are handled by `InferenceHandler`, which is a subclass of `th.Thread`. How do we get different processes running the inference from this? Thanks in advance!
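One way a `th.Thread` subclass can still yield per-process execution — offered only as a plausible pattern, not as PyTriton's actual code; `InferenceDispatcher` and `_process_loop` are made-up names — is for the thread to act purely as a dispatcher that shuttles requests to a dedicated child process over a pipe:

```python
# Hypothetical sketch (NOT PyTriton's code): a threading.Thread subclass that
# proxies requests to a per-instance worker process, so "the handler is a
# thread" and "inference runs in its own process" can both be true.
import multiprocessing as mp
import threading as th

def _process_loop(infer_func, conn):
    # Runs in the child process: receive payloads, send back results.
    while True:
        payload = conn.recv()
        if payload is None:              # shutdown sentinel
            break
        conn.send(infer_func(payload))

class InferenceDispatcher(th.Thread):
    """Thread that forwards requests to its own worker process."""

    def __init__(self, infer_func, requests, results):
        super().__init__(daemon=True)
        self._requests, self._results = requests, results
        ctx = mp.get_context("fork")     # fork: child inherits infer_func
        self._conn, child = ctx.Pipe()
        self._proc = ctx.Process(target=_process_loop, args=(infer_func, child))
        self._proc.start()

    def run(self):
        # The thread itself does no inference; it only moves data.
        for payload in iter(self._requests.get, None):
            self._conn.send(payload)
            self._results.put(self._conn.recv())
        self._conn.send(None)            # tell the child process to exit
        self._proc.join()
```

In a design like this, the `Thread` subclass is just the communication endpoint inside the server, while the actual model execution lives in the child process — which would reconcile the two observations in the question.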