triton-inference-server / pytriton

PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
https://triton-inference-server.github.io/pytriton/
Apache License 2.0

while inference by running server.py and client.py why client is taking gpu memory. #47

Closed Justsubh01 closed 7 months ago

Justsubh01 commented 7 months ago

Hello, I am new to Triton and trying to understand its behaviour. I am facing a point of confusion, described below:

I am running two client requests against one server.py. Why do the two client.py processes consume GPU memory and show up as two GPU processes, when the model itself is running in the server?

(screenshot: nvidia-smi output showing the server and client GPU processes)

Here, 967 MiB is consumed by the server.py script and 105 MiB by the client.py processes.

When multiple client requests come in, does Triton create multiple instances of the same model, or does it serve all requests on a single instance?
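(For reference: in standalone Triton, the number of execution instances per model is controlled by the `instance_group` setting in the model's `config.pbtxt`. The snippet below is a generic sketch of that setting, not a configuration taken from this issue:)

```
instance_group [
  {
    count: 1        # one execution instance of the model
    kind: KIND_GPU  # place the instance on a GPU
    gpus: [ 0 ]     # pin the instance to GPU 0
  }
]
```

With `count: 1`, concurrent requests are queued and served by the single instance; raising `count` creates additional instances of the same model.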