triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

[Question / Bug?] DLPack tensor is not contiguous, even though I use tensor.contiguous in torch #5494

Closed. MatthieuToulemont closed this issue 1 year ago.

MatthieuToulemont commented 1 year ago

Description: I am using a Python model as a BLS (Business Logic Scripting) model, in which I send requests to TensorRT (TRT) models and do some processing in PyTorch in between. Whenever I send a torch tensor to a TRT model, I use the following function:

def pb_utils_tensor_from_dlpack_torch(tensor_name: str, tensor: torch.Tensor) -> Any:
    return pb_utils.Tensor.from_dlpack(tensor_name, torch.utils.dlpack.to_dlpack(tensor.contiguous()))

However, I still occasionally get errors claiming the tensor is either not contiguous or not C-ordered. I don't know what C-ordered means and could not find a clear definition of it. What else do I need to do to make sure I don't get these errors?

 [StatusCode.INTERNAL] Failed to process the request(s) for model instance 'MODEL_NAME', message: TritonModelException: DLPack tensor is not contiguous. Only contiguous DLPack tensors that are stored in C-Order are supported.

Triton Information: Triton version 22.09, PyTorch version 1.12.0

Are you using the Triton container or did you build it yourself? I am using the 22.09 Triton container, in which I installed torch==1.12.0.

Thank you for the great work on Triton,

dyastremsky commented 1 year ago

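Thanks for your question! C order is row-major order: it describes how the tensor's elements are laid out in memory, with the elements of each row stored next to each other (the last index varies fastest). For example, if using NumPy, you can specify this here.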
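To make "row-major" concrete, here is a minimal pure-Python sketch (the helper names are hypothetical, not part of Triton or PyTorch). It computes C-order strides, counted in elements as DLPack does (NumPy's `.strides` reports bytes instead), and shows that walking the last index fastest visits consecutive memory offsets. Column-major (Fortran) order is included for contrast.

```python
def row_major_strides(shape):
    """C-order strides in elements: the last dimension varies fastest."""
    strides, step = [], 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def column_major_strides(shape):
    """Fortran-order strides in elements: the first dimension varies fastest."""
    strides, step = [], 1
    for size in shape:
        strides.append(step)
        step *= size
    return tuple(strides)


def flat_offset(index, strides):
    """Position of a multi-dimensional index inside the flat buffer."""
    return sum(i * s for i, s in zip(index, strides))


shape = (2, 3)
c_strides = row_major_strides(shape)
# Iterating rows, then columns, touches offsets 0, 1, 2, ... in C order.
offsets = [flat_offset((r, c), c_strides) for r in range(2) for c in range(3)]
print(c_strides, offsets)            # (3, 1) [0, 1, 2, 3, 4, 5]
print(column_major_strides(shape))   # (1, 2)
```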

Calling .contiguous() should make the tensor contiguous, so contiguity itself should not be the issue.

CC: @Tabrizian, who may know where else something could be going wrong. It's harder to tell without seeing how the tensor was generated, so any context there is helpful, though I understand this is a more complicated case with the tensor being manipulated in the middle of a BLS pipeline.

Tabrizian commented 1 year ago

As David mentioned, C order is row-major order. We do have a test for non-contiguous tensors, and calling .contiguous() should make the tensor C-order contiguous. Could you please share a minimal repro? From the bug description, it sounds like you only see this issue occasionally?
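Why .contiguous() is needed at all: operations such as a transpose only rewrite shape and stride metadata, so the result no longer has row-major strides. The following is a minimal pure-Python model of this, assuming a tensor is just a flat list plus shape and strides (the helpers are hypothetical stand-ins, not Triton's or PyTorch's code). It checks C-contiguity by comparing strides against row-major strides and materializes a copy the way .contiguous() does for a strided view.

```python
def row_major_strides(shape):
    """C-order strides in elements: the last dimension varies fastest."""
    strides, step = [], 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def is_c_contiguous(shape, strides):
    # Simplified check: strides must match the row-major layout exactly.
    return tuple(strides) == row_major_strides(shape)


def transpose2d(shape, strides):
    # A transpose is a view: it swaps shape/stride metadata, the buffer is untouched.
    return (shape[1], shape[0]), (strides[1], strides[0])


def make_contiguous(data, shape, strides):
    """Copy elements into a fresh row-major buffer, like .contiguous() on a strided view."""
    out = []

    def walk(dim, offset):
        if dim == len(shape):
            out.append(data[offset])
            return
        for i in range(shape[dim]):
            walk(dim + 1, offset + i * strides[dim])

    walk(0, 0)
    return out, tuple(shape), row_major_strides(shape)


data = [0, 1, 2, 3, 4, 5]  # a 2x3 matrix stored in C order
shape, strides = (2, 3), row_major_strides((2, 3))
t_shape, t_strides = transpose2d(shape, strides)

print(is_c_contiguous(shape, strides))      # True
print(is_c_contiguous(t_shape, t_strides))  # False: strides (1, 3), expected (2, 1)
c_data, c_shape, c_strides = make_contiguous(data, t_shape, t_strides)
print(c_data, is_c_contiguous(c_shape, c_strides))  # [0, 3, 1, 4, 2, 5] True
```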

dyastremsky commented 1 year ago

Closing this issue due to lack of activity. Please reopen it if you would like to follow up.

MatthieuToulemont commented 1 year ago

Sorry, I was on holiday for the past ten days. At the moment it's hard to produce a minimal repro, as the error happens randomly.

dyastremsky commented 1 year ago

Hope you enjoyed your vacation! Reopening. We'll need a minimal repro to be able to investigate. It's okay if it doesn't happen 100% of the time, but do please provide a repro and let us know how often it happens with the repro if you'd like us to investigate.

MatthieuToulemont commented 1 year ago

Hello, it seems we have found the issue on our side: a miscellaneous cropping operation was creating a dimension of size 0 in the tensor, which made the C-order / contiguous check fail.
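The thread does not say exactly why a size-0 dimension trips the check. One plausible mechanism is sketched below under two assumptions: (1) the exporter reports strides the way PyTorch computes them for contiguous tensors, treating size-0 dimensions as size 1, and (2) the consumer's check requires each stride to equal the exact product of the trailing sizes. All function names here, including the `has_empty_dim` guard, are hypothetical; this is not Triton's actual implementation.

```python
def pytorch_style_strides(shape):
    # ASSUMPTION: mirrors PyTorch's contiguous-stride rule, where size-0 dims count as 1.
    strides, step = [], 1
    for size in reversed(shape):
        strides.append(step)
        step *= max(size, 1)
    return tuple(reversed(strides))


def strict_c_order_check(shape, strides):
    # ASSUMPTION: a strict check where each stride must equal the product of trailing sizes.
    expected = 1
    for size, stride in zip(reversed(shape), reversed(strides)):
        if stride != expected:
            return False
        expected *= size  # becomes 0 once a size-0 dim is passed
    return True


def has_empty_dim(shape):
    # Hypothetical guard: detect empty tensors before handing them to DLPack.
    return any(size == 0 for size in shape)


cropped = (3, 0, 4)  # e.g. a crop that produced an empty dimension
strides = pytorch_style_strides(cropped)
print(strides)                                # (4, 4, 1)
print(strict_c_order_check(cropped, strides))  # False
print(strict_c_order_check((3, 2, 4), pytorch_style_strides((3, 2, 4))))  # True
print(has_empty_dim(cropped))                  # True
```

If this is the cause, checking for empty tensors (for example `tensor.numel() == 0` in PyTorch) before calling `pb_utils.Tensor.from_dlpack`, and handling that case separately, should avoid the error.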

Thank you all for the time spent on this, it turns out it was on our side :D