pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

Add generic support for different GPU hardware. #740

Open harshbafna opened 3 years ago

harshbafna commented 3 years ago

Is your feature request related to a problem? Please describe.

Currently, TorchServe's sanity suite, regression suite, and the recent changes related to logging GPU info in the model description depend on the nvidia-smi command, which is NVIDIA-specific.

Also, the default handlers currently support only CUDA devices or CPU.

Describe the solution

Add generic support for different GPU hardware.
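One way the nvidia-smi dependency could be generalized is a small vendor-neutral probe that uses whichever GPU management CLI happens to be on PATH. This is only a sketch, not TorchServe code: `gpu_info` is a hypothetical helper, and it assumes rocm-smi as the AMD counterpart to nvidia-smi.

```python
import shutil
import subprocess


def gpu_info() -> str:
    """Return raw GPU status text from the first vendor CLI found.

    Hypothetical helper: tries NVIDIA's nvidia-smi, then AMD's
    rocm-smi, and falls back to a plain message when neither tool
    is installed (e.g. on CPU-only hosts).
    """
    for tool in ("nvidia-smi", "rocm-smi"):
        path = shutil.which(tool)
        if path is None:
            continue
        try:
            result = subprocess.run(
                [path], capture_output=True, text=True, timeout=10
            )
            return result.stdout
        except (subprocess.SubprocessError, OSError):
            continue  # tool present but failed; try the next vendor
    return "no supported GPU management tool found"
```

On a CPU-only machine this degrades gracefully instead of crashing the sanity suite, which is the behavior the issue is asking for.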

chauhang commented 3 years ago

@harshbafna What is the additional work remaining for this issue?

harshbafna commented 3 years ago

@chauhang: We currently support only CUDA GPU hardware, and this is hard-coded in certain sections of the code and test suites. This issue was logged to generalize that support, and it will be taken up in a future release.

marcusnchow commented 2 years ago

Does TorchServe support AMD ROCm GPUs now that PyTorch officially supports them?

msaroufim commented 2 years ago

@marcusnchow Not yet, but if you could share a bit more about your interest in ROCm, we should restart the discussion with the team.

marcusnchow commented 2 years ago

There are various AMD multi-GPU systems, like AMD Instinct (similar to NVIDIA's DGX systems), that are used for inference workloads and would benefit from TorchServe support. We have an 8-GPU AMD system that we use for benchmarking inference servers. We already have TensorFlow and TensorFlow Serving running and would like to compare inference performance with PyTorch. What modifications would be needed to support AMD GPUs?

msaroufim commented 2 years ago

Out of curiosity what's the performance difference you've observed on a few models when comparing DGX vs Instinct?

It's probably possible to support Instinct; I'm just trying to understand the immediate value. I suspect we would need to generalize all the places in the codebase that mention CUDA to also support ROCm, which, given that PyTorch now supports ROCm, shouldn't be a huge pain.
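The generalization may be smaller than it sounds on the handler side: ROCm builds of PyTorch reuse the `torch.cuda` API and additionally expose `torch.version.hip`, so a build can be classified without any vendor CLI. A minimal sketch of that idea, where `classify_build` is a hypothetical helper (it takes the torch module as an argument so the logic can be exercised without a GPU install):

```python
def classify_build(torch_module) -> str:
    """Classify a PyTorch build as 'rocm', 'cuda', or 'cpu-only'.

    ROCm wheels set torch.version.hip to a version string and leave
    torch.version.cuda as None; CUDA wheels do the reverse. Because
    ROCm builds keep the torch.cuda API working, most existing
    'cuda' device code paths run unchanged on AMD GPUs.
    """
    hip = getattr(torch_module.version, "hip", None)
    cuda = getattr(torch_module.version, "cuda", None)
    if hip is not None:
        return "rocm"
    if cuda is not None:
        return "cuda"
    return "cpu-only"
```

With a check like this, the remaining work is mostly auditing spots that shell out to nvidia-smi or assume "cuda" implies NVIDIA hardware.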

marcusnchow commented 2 years ago

Oh, we don't have access to a DGX; I guess I was just using it as an example. But we would like to use PyTorch to benchmark AMD GPUs for inference performance :)

ismu commented 1 week ago

Hi, any update on this topic? Is there a chance for AMD GPU support now that we have ROCm builds of PyTorch?