Hello, I am new to Triton and am trying to understand its behaviour. There is one thing that confuses me:
I am running two client.py processes that send requests to a single server.py. Why do the two client.py processes consume GPU memory and show up as two separate GPU processes when the model is running in the server?
Here, the server.py script has consumed 967 MiB, and the client.py processes 105 MiB.
When multiple client requests come in, does Triton create multiple instances of the same model, or does it serve them all from a single instance?