Requested feature
I want to set the number of available GPUs independently per model.
Reason
Sometimes I want to serve some models on GPU and some on CPU (and switch GPU/CPU usage depending on circumstances).
Right now the only GPU setting is at the TorchServe instance level, through config.properties -> number_of_gpu.
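For reference, this is roughly what that instance-wide setting looks like today (a minimal config.properties sketch; the value is illustrative):

```properties
# config.properties - applies to the whole TorchServe instance, not per model
number_of_gpu=2
```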
Since TorchServe acts as a one-to-many model server (one TorchServe instance, many models), I think this should be supported.
I believe the scale-workers endpoint used to have a num_gpus option?
I would rather avoid manual GPU/CPU allocation in a custom handler. I would very much like TorchServe to do GPU allocation smartly based on number_of_gpu.
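For completeness, this is the kind of manual workaround I'd like to avoid: a minimal sketch of a custom handler that pins one model to CPU regardless of the instance-wide setting (the class name is just an example):

```python
# Sketch of the manual workaround: force a single model onto CPU from its handler,
# even though the TorchServe instance itself has GPUs available.
import torch
from ts.torch_handler.base_handler import BaseHandler

class CpuPinnedHandler(BaseHandler):
    def initialize(self, context):
        # Let BaseHandler load the model as usual (it picks a device from the
        # worker's gpu_id), then override the device and move the model to CPU.
        super().initialize(context)
        self.device = torch.device("cpu")
        self.model.to(self.device)
```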
Use case
Let's say one of my services requires 3 different models to be running:
I use one bare-metal server or one Docker container to launch them.
Two of them are slow on CPU, so I want them on GPU; one of them is fine on CPU.
OR
I want 3 versions of the same model, some on GPU and some on CPU,
etc.
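To illustrate, something along these lines would cover both cases. Note that number_gpu here is a hypothetical per-model parameter that does not exist today (the registration endpoint and the other parameters are real), and the model names are made up:

```sh
# Hypothetical: a per-model GPU count at registration time (number_gpu is not an existing flag).
curl -X POST "http://localhost:8081/models?url=slow_model.mar&initial_workers=2&number_gpu=1"
curl -X POST "http://localhost:8081/models?url=light_model.mar&initial_workers=2&number_gpu=0"
```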
Describe alternative solutions
Depending on your deployment strategy (workflow?), Docker-based scenarios may favor one-to-one deployments of TorchServe <-> model, coordinated with docker-compose to get the desired effect. But then you run 2 more TorchServe instances than needed, and it only works for Docker deployments.
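A rough sketch of that docker-compose workaround, assuming the NVIDIA container runtime is configured and using illustrative service/model names (per-service model stores and config.properties mounts are omitted):

```yaml
# One TorchServe container per model; GPUs are exposed only to the services that need them.
version: "3.8"
services:
  model-a-gpu:
    image: pytorch/torchserve:latest-gpu
    environment:
      - NVIDIA_VISIBLE_DEVICES=0   # needs the NVIDIA container runtime
  model-b-gpu:
    image: pytorch/torchserve:latest-gpu
    environment:
      - NVIDIA_VISIBLE_DEVICES=1
  model-c-cpu:
    image: pytorch/torchserve:latest
```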