Requested feature
I want to set the number of available GPUs independently per model.
Reason
Sometimes I want to serve some models on GPU and some on CPU (and switch GPU/CPU usage depending on circumstances).
Right now the only GPU setting is at the TorchServe instance level, through config.properties -> number_of_gpu.
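For reference, this is roughly what that instance-wide setting looks like today (a minimal config.properties sketch; the value is illustrative):

```properties
# config.properties - applies to the whole TorchServe instance, not per model
number_of_gpu=2
```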
Since TorchServe acts as a one-to-many model server (one TorchServe instance, many models), I think this should be supported.
I believe the scale-workers endpoint used to have a num_gpus option?
I would rather avoid manual GPU/CPU allocation in a custom handler. I would very much like TorchServe to do GPU allocation smartly based on number_of_gpu.
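For completeness, this is the kind of manual workaround I'd like to avoid: a minimal sketch of a custom handler that pins one model to CPU regardless of the instance-wide setting (the class name is just an example):

```python
# Sketch of the manual workaround: force a single model onto CPU from its handler,
# even though the TorchServe instance itself has GPUs available.
import torch
from ts.torch_handler.base_handler import BaseHandler

class CpuPinnedHandler(BaseHandler):
    def initialize(self, context):
        # Let BaseHandler load the model as usual (it picks a device from the
        # worker's gpu_id), then override the device and move the model to CPU.
        super().initialize(context)
        self.device = torch.device("cpu")
        self.model.to(self.device)
```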
Use case
Let's say one of my services requires 3 different models to be running:
I use one bare-metal server or one Docker container to launch them.
Two of them are slow on CPU, so I want them on GPU; one of them is fine on CPU.
OR
I want 3 versions of the same model, some on GPU and some on CPU,
etc.
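To illustrate, something along these lines would cover both cases. Note that number_gpu here is a hypothetical per-model parameter that does not exist today (the registration endpoint and the other parameters are real), and the model names are made up:

```sh
# Hypothetical: a per-model GPU count at registration time (number_gpu is not an existing flag).
curl -X POST "http://localhost:8081/models?url=slow_model.mar&initial_workers=2&number_gpu=1"
curl -X POST "http://localhost:8081/models?url=light_model.mar&initial_workers=2&number_gpu=0"
```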
Describe alternative solutions
Depending on your deployment strategy (workflow?), Docker-based scenarios may favor one-to-one deployments of TorchServe <-> model, coordinated with docker-compose to get the desired effect. But then you run 2 more TorchServe instances than needed, and it only works for Docker deployments.
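A rough sketch of that docker-compose workaround, assuming the NVIDIA container runtime is configured and using illustrative service/model names (per-service model stores and config.properties mounts are omitted):

```yaml
# One TorchServe container per model; GPUs are exposed only to the services that need them.
version: "3.8"
services:
  model-a-gpu:
    image: pytorch/torchserve:latest-gpu
    environment:
      - NVIDIA_VISIBLE_DEVICES=0   # needs the NVIDIA container runtime
  model-b-gpu:
    image: pytorch/torchserve:latest-gpu
    environment:
      - NVIDIA_VISIBLE_DEVICES=1
  model-c-cpu:
    image: pytorch/torchserve:latest
```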