Tensors stored on the GPU cannot be used with NumPy directly. You can use PyTorch with DLPack to convert them to CPU.
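For reference, a minimal sketch of that approach (the tensor name "OUTPUT0" and the infer_response variable are placeholders for whatever your BLS call returns):

import triton_python_backend_utils as pb_utils
from torch.utils.dlpack import from_dlpack

# Grab the GPU output tensor from the BLS response.
output = pb_utils.get_output_tensor_by_name(infer_response, "OUTPUT0")
# Wrap it as a torch tensor without copying, then move it to host memory.
torch_tensor = from_dlpack(output.to_dlpack())
numpy_array = torch_tensor.cpu().numpy()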
Thank you for your reply. I have solved this problem using DLPack. Thank you very much. I will close this issue.
@Tabrizian One more question: why is the output on the GPU, and under what circumstances would it be on the CPU?
There are many circumstances that can lead to the BLS output being on the GPU. For example, the backend being called can decide to place its output tensors in GPU memory. The Python backend tries to provide the outputs on the same device where it received them, to avoid unnecessary data movement.
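If it helps, here is a rough sketch of how the caller can branch on where the output ended up (again, "OUTPUT0" and infer_response are placeholders):

import triton_python_backend_utils as pb_utils
from torch.utils.dlpack import from_dlpack

output = pb_utils.get_output_tensor_by_name(infer_response, "OUTPUT0")
if output.is_cpu():
    # Already in host memory, so as_numpy() works directly.
    array = output.as_numpy()
else:
    # In GPU memory: go through DLPack and copy to the host.
    array = from_dlpack(output.to_dlpack()).cpu().numpy()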
I see. Thank you very much.
Hi, how did you solve the problem using DLPack?
There is documentation about this here:
https://github.com/triton-inference-server/python_backend#interoperability-and-gpu-support
I guess from numpy 1.23 we could use:
output_tensors = infer_response.output_tensors()
x = np.from_dlpack(output_tensors[0].to_dlpack())
numpy docs: https://numpy.org/doc/1.23/reference/generated/numpy.from_dlpack.html?highlight=from_dlpack
Triton 22.05's numpy version is 1.22.4, though.
Oh, I updated numpy to 1.23, but it doesn't work. Error:
0625 06:20:51.932539 145 pb_stub.cc:605] Failed to process the request(s) for model 'stream_0_0',
message: AttributeError: type object 'PyCapsule' has no attribute '__dlpack__'
Is there any way to solve this problem? Do you have any idea whether it's a numpy bug or a Triton bug? @Tabrizian
Looks like the DLPack protocol has changed a bit since we designed this interface in the Python backend, and NumPy is using a newer version of it. I'll file a ticket for improving the DLPack support in the Python backend.
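Until that lands, one workaround sketch (assuming PyTorch is installed in the container) is to route the capsule through torch, whose from_dlpack still accepts a raw PyCapsule, and only then hand the result to numpy:

import torch.utils.dlpack

output_tensors = infer_response.output_tensors()
# np.from_dlpack() expects an object with a __dlpack__ method, but
# to_dlpack() here returns a plain PyCapsule; torch accepts the capsule.
torch_tensor = torch.utils.dlpack.from_dlpack(output_tensors[0].to_dlpack())
x = torch_tensor.cpu().numpy()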
Any update on this? I am currently blocked by this issue.
Hi, I have the same problem. Strangely, the tensor located on the GPU has the to_dlpack method even though it does not have the __dlpack__ attribute. I'm working with the 22.10 Triton server version.
Hi, any update now? Blocked by this issue too.
numpy version: 1.23.3
tritonserver version: 22.09
Sorry for the delay. We are working on this ticket and it should hopefully be available soon.
We can use:

import triton_python_backend_utils as pb_utils

infer_request = pb_utils.InferenceRequest(
    model_name=...,
    requested_output_names=[...],
    inputs=[...],
    # Ask Triton to place the output tensors in CPU memory.
    preferred_memory=pb_utils.PreferredMemory(pb_utils.TRITONSERVER_MEMORY_CPU, 0),
)
response = infer_request.exec()
tensor_cpu = pb_utils.get_output_tensor_by_name(response, ...)
tensor_cpu.as_numpy()
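With preferred_memory set to CPU, Triton is asked to place the BLS output tensors in host memory, so as_numpy() works without any DLPack round trip. Note that, as far as I know, the preferred_memory parameter only exists in newer Triton releases than the ones discussed earlier in this thread, so check your version first.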
Description: I am currently using the Python backend BLS feature and called another TensorRT model through the pb_utils.InferenceRequest interface. The call succeeded, but the result is stored on the GPU, and I can't find an interface to copy it from the GPU.
Triton Information: 22.01
Are you using the Triton container or did you build it yourself? no
Expected behavior: Can the Python backend copy InferenceRequest results directly to the CPU?
Here is my debugging information:
This is the code that sends the request: