triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

python backend error: c_python_backend_utils.TritonModelException: Tensor is stored in GPU and cannot be converted to NumPy #3944

Closed zhaohb closed 2 years ago

zhaohb commented 2 years ago

Description: I am currently using the Python backend BLS feature and called another TensorRT model through the pb_utils.InferenceRequest interface. The call succeeded, but the result is stored on the GPU, and I can't find an interface to copy it back from the GPU.

Triton Information: 22.01

Are you using the Triton container or did you build it yourself? no

Expected behavior: Can the Python backend copy InferenceRequest results directly to the CPU?

Here is my debugging information:

(Pdb)
> /fas_repo/bls_model/1/fas_pipe.py(60)face_detect()
-> inputs=[images],
(Pdb)
> /fas_repo/bls_model/1/fas_pipe.py(61)face_detect()
-> requested_output_names=self.outputs_0)
(Pdb)
> /fas_repo/bls_model/1/fas_pipe.py(58)face_detect()
-> infer_request = pb_utils.InferenceRequest(
(Pdb)
> /fas_repo/bls_model/1/fas_pipe.py(62)face_detect()
-> infer_response = infer_request.exec()
(Pdb)
> /fas_repo/bls_model/1/fas_pipe.py(65)face_detect()
-> confs = pb_utils.get_output_tensor_by_name(infer_response, 'class')
(Pdb)
> /fas_repo/bls_model/1/fas_pipe.py(66)face_detect()
-> locs = pb_utils.get_output_tensor_by_name(infer_response, 'bbox')
(Pdb) p confs
<c_python_backend_utils.Tensor object at 0x7f08f1716130>
(Pdb) dir(confs)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'as_numpy', 'from_dlpack', 'is_cpu', 'name', 'to_dlpack', 'triton_dtype']
(Pdb) p confs.is_cpu()
False
(Pdb) p confs.as_numpy()
*** c_python_backend_utils.TritonModelException: Tensor is stored in GPU and cannot be converted to NumPy.
(Pdb)

This is the code where I send the request:

        ......
        import pdb
        pdb.set_trace()
        images = pb_utils.Tensor("images", preprocessed_imgs)
        infer_request = pb_utils.InferenceRequest(
            model_name=self.model_name0,
            inputs=[images],
            requested_output_names=self.outputs_0)
        infer_response = infer_request.exec()
        #if infer_response.has_error():
        #    return False
        confs = pb_utils.get_output_tensor_by_name(infer_response, 'class')
        locs = pb_utils.get_output_tensor_by_name(infer_response, 'bbox')
        ......
Tabrizian commented 2 years ago

Tensors stored on the GPU cannot be used with NumPy directly. You can use PyTorch with DLPack to copy them to the CPU.
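A minimal sketch of that DLPack route, assuming PyTorch is installed in the Python backend environment and reusing the confs tensor from the pdb session above (the variable names are illustrative):

from torch.utils.dlpack import from_dlpack

confs = pb_utils.get_output_tensor_by_name(infer_response, 'class')
if confs.is_cpu():
    confs_np = confs.as_numpy()
else:
    # Wrap the GPU tensor as a torch tensor via DLPack (no copy),
    # move it to the host, then convert to NumPy.
    confs_np = from_dlpack(confs.to_dlpack()).cpu().numpy()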

zhaohb commented 2 years ago

Thank you for your reply. I have solved this problem using DLPack. Thank you very much. I will close this issue.

zhaohb commented 2 years ago

@Tabrizian One more question: why is the result on the GPU, and under what circumstances would it be on the CPU?

Tabrizian commented 2 years ago

There are many circumstances that can lead to a BLS output being on the GPU. For example, the backend being called can decide to put its output tensors in GPU memory. The Python backend tries to provide the outputs on the same device it received them on, in order to avoid extra data movement.

zhaohb commented 2 years ago

I see. Thank you very much.

hegc commented 2 years ago

Hi, how did you solve the problem using DLPack?

Tabrizian commented 2 years ago

There is documentation about this here:

https://github.com/triton-inference-server/python_backend#interoperability-and-gpu-support

Jackiexiao commented 2 years ago

I guess that from numpy 1.23 onwards we could use:

output_tensors = infer_response.output_tensors()
x = np.from_dlpack(output_tensors[0].to_dlpack())

numpy docs: https://numpy.org/doc/1.23/reference/generated/numpy.from_dlpack.html?highlight=from_dlpack

Triton 22.05's numpy version is 1.22.4.

Jackiexiao commented 2 years ago

Oh, I updated numpy to 1.23, but it doesn't work. Error:

0625 06:20:51.932539 145 pb_stub.cc:605] Failed to process the request(s) for model 'stream_0_0', 
message: AttributeError: type object 'PyCapsule' has no attribute '__dlpack__'

Is there any way to solve this problem? Do you have any idea whether it's a numpy bug or a Triton bug? @Tabrizian

Tabrizian commented 2 years ago

Looks like the DLPack protocol has changed a bit since we designed this interface in the Python backend, and NumPy is using a newer version. I'll file a ticket for improving DLPack support in the Python backend.
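In other words, numpy.from_dlpack expects an object that implements the newer __dlpack__ protocol, while to_dlpack() in the Python backend returns a legacy PyCapsule, which is what triggers the AttributeError above. Until the backend is updated, one workaround (assuming PyTorch is available, and reusing the output_tensors variable from the earlier snippet) is to route the capsule through torch, whose from_dlpack still accepts legacy capsules:

from torch.utils.dlpack import from_dlpack as torch_from_dlpack

capsule = output_tensors[0].to_dlpack()  # legacy PyCapsule, no __dlpack__ attribute
# np.from_dlpack(capsule) raises the AttributeError shown above;
# torch's from_dlpack still understands the capsule, and .cpu() moves it to the host.
x = torch_from_dlpack(capsule).cpu().numpy()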

harish-headroom commented 2 years ago

Any update on this? I am currently blocked by it.

gianpd commented 1 year ago

Hi, I have the same problem. It is very strange: the tensor located on the GPU has the to_dlpack method, yet it does not have the __dlpack__ attribute. I'm working with Triton server version 22.10.

xiong-qiao commented 1 year ago

Hi, any update on this? I'm blocked by this issue too.

numpy version: 1.23.3; tritonserver version: 22.09

Tabrizian commented 1 year ago

Sorry for the delay. We are working on this ticket, and it should hopefully be available soon.

Jackiexiao commented 10 months ago

We can use:

infer_request = pb_utils.InferenceRequest(
    model_name=...,
    requested_output_names=[...],
    inputs=[...],
    # Ask Triton to place the output tensors in CPU memory.
    preferred_memory=pb_utils.PreferredMemory(pb_utils.TRITONSERVER_MEMORY_CPU, 0),
)
response = infer_request.exec()
tensor_cpu = pb_utils.get_output_tensor_by_name(response, ...)
tensor_cpu.as_numpy()
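Note that the preferred_memory argument appears to require a newer Python backend release than the 22.xx containers discussed earlier in this thread; on those older versions, the DLPack route via PyTorch is still the way to move BLS outputs to the CPU.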