triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Copy GPU Tensor to CPU on Python backend without using pytorch #7120

Closed: ShuaiShao93 closed this issue 6 months ago

ShuaiShao93 commented 6 months ago

Is your feature request related to a problem? Please describe.
I know we can copy a GPU tensor to the CPU with torch.utils.dlpack.from_dlpack (link), but we don't want to introduce torch into our dependencies.
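For context, here is a minimal sketch of the torch-based route the request wants to avoid. The input name "INPUT0", output name "OUTPUT0", and the surrounding model.py scaffolding are assumptions for illustration, not part of the original report:

```python
# model.py (Python backend) -- torch-based device-to-host copy, for illustration
import torch
import torch.utils.dlpack
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # "INPUT0" is an assumed input name for this sketch.
            in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            if in_tensor.is_cpu():
                np_array = in_tensor.as_numpy()
            else:
                # Wrap the GPU tensor via DLPack (zero-copy), then copy it to host memory.
                torch_tensor = torch.utils.dlpack.from_dlpack(in_tensor.to_dlpack())
                np_array = torch_tensor.cpu().numpy()
            out_tensor = pb_utils.Tensor("OUTPUT0", np_array)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses
```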

Describe the solution you'd like
Add an API to Tensor that copies it to the CPU and converts it to a numpy array.
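Until such an API exists, one possible torch-free workaround is to go through CuPy, which also speaks DLPack. This is only a sketch, not an official Triton API; it assumes CuPy is an acceptable (much lighter) dependency and that the installed CuPy version accepts a DLPack capsule in cupy.from_dlpack:

```python
import cupy
import triton_python_backend_utils as pb_utils

# Inside execute(): copy a GPU pb_utils.Tensor to host memory without torch.
gpu_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")  # "INPUT0" assumed
cupy_array = cupy.from_dlpack(gpu_tensor.to_dlpack())  # zero-copy view of the GPU buffer
np_array = cupy.asnumpy(cupy_array)  # device-to-host copy into a numpy array
```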

MatthieuToulemont commented 6 months ago

That would be great, especially given the size of the torch cache.

Tabrizian commented 6 months ago

Thanks for your feature request. I've linked your issue to this ticket and will close this issue as a duplicate of https://github.com/triton-inference-server/server/issues/3547.

Let's keep all the comments related to this feature in that GitHub issue.