nikhil-sk opened this issue 1 year ago
@nskool If the user is not concerned about performance, then I would assume they can run models via the Python API in the Python backend.
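For reference, doing that "by hand" in the Python backend looks roughly like the sketch below. This is only an illustration: the `model.savedmodel` path construction, the `INPUT0`/`OUTPUT0` tensor names, and the call convention on the loaded SavedModel are assumptions, not anything prescribed by this issue.

```python
import numpy as np
import tensorflow as tf  # assumes TF is installed in the Python backend environment
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    """Rough sketch: serve a TF SavedModel by hand from the Python backend."""

    def initialize(self, args):
        # Assumed layout: the SavedModel lives in <model dir>/<version>/model.savedmodel.
        path = f'{args["model_repository"]}/{args["model_version"]}/model.savedmodel'
        self._model = tf.saved_model.load(path)

    def execute(self, requests):
        responses = []
        for request in requests:
            # "INPUT0"/"OUTPUT0" are placeholder names from an assumed config.pbtxt.
            input0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            # Call convention depends on the SavedModel's exported signatures;
            # adjust for your model.
            result = self._model(tf.constant(input0.as_numpy()))
            output0 = pb_utils.Tensor("OUTPUT0", np.asarray(result))
            responses.append(pb_utils.InferenceResponse(output_tensors=[output0]))
        return responses
```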
@tanmayv25 While it's true that certain users may not be concerned about performance and could make do with the Python backend, the advantage of doing this is that customers can easily switch to an out-of-process mode without having to write any Python code. This reduces friction for the user. Additionally, without perf tests we cannot be sure whether out-of-process framework backends perform the same as the Python backend, worse, or better, IMO.
We have added an experimental platform handlers feature that is similar to solution 3.2: https://github.com/triton-inference-server/python_backend/tree/main/src/resources/platform_handlers/tensorflow_savedmodel
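With this handler, a plain TF SavedModel can be served through the Python backend without writing a `model.py`. A minimal sketch of what that looks like is below; the model name and exact layout are assumptions for illustration, so defer to the linked README for the authoritative details.

```protobuf
# Assumed repository layout ("resnet_tf" is a placeholder name):
#   model_repository/
#   └── resnet_tf
#       ├── 1
#       │   └── model.savedmodel/   <- unmodified TF SavedModel directory
#       └── config.pbtxt
#
# config.pbtxt -- no model.py required:
name: "resnet_tf"
backend: "python"
platform: "tensorflow_savedmodel"
```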
Additionally, with more research and experimentation we found that using jemalloc instead of the generic malloc resolves most of the memory issues seen with the in-process TF backend. Documentation on how to use jemalloc: https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_management.md#model-control-mode-explicit
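As a concrete example, one common way to swap in jemalloc is an `LD_PRELOAD` override when launching the server. The library path below is an assumption and varies by distro/container image; treat the linked documentation as the authoritative reference.

```bash
# Preload jemalloc so tritonserver's allocations go through it instead of glibc malloc.
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 \
  tritonserver --model-repository=/models
```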
Is your feature request related to a problem? Please describe.
(This is a high-level thought and a feature request; I will update this thread if I can gather more specific data.)
Describe the solution you'd like