triton-inference-server / dali_backend

The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's Python API.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
MIT License

DALI pipeline in Triton - formatting InferInput batch of images for UINT8 #224

Closed. mvpel closed this issue 5 months ago.

mvpel commented 5 months ago

Hi folks,

Following the example from the DALI Inception ensemble, I put together a DALI model to convert a 0-255 RGB matrix to a 0.0-1.0 matrix of floating-point numbers, implemented with DALI's Python API.
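For reference, the gist of the pipeline (a minimal sketch, not my exact model; the input name DALI_INPUT_0, batch size, and output path are placeholders):

import nvidia.dali.fn as fn
import nvidia.dali.types as types
from nvidia.dali import pipeline_def

@pipeline_def(batch_size=32, num_threads=4, device_id=0)
def preprocess_pipe():
    # Encoded JPEG bytes arrive from Triton as a 1D uint8 tensor per sample.
    raw = fn.external_source(device="cpu", name="DALI_INPUT_0")
    # "mixed" parses on the CPU and decodes on the GPU; output is HWC uint8 RGB.
    images = fn.decoders.image(raw, device="mixed", output_type=types.RGB)
    # Scale 0-255 to 0.0-1.0; the arithmetic promotes uint8 to float32.
    return images / 255.0

# Serialize for the Triton DALI backend's model repository.
preprocess_pipe().serialize(filename="model.dali")

(On older DALI releases the decoder is fn.image_decoder rather than fn.decoders.image.)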

Originally, using the old v1 API and the image_preprocess.cc code, in conjunction with the 20.11 image_client.py, I had set the image input as "TYPE_STRING" with a size of 1.

However, when sending to the DALI model, the image decoder wasn't able to identify the incoming JPG, so I converted the input to TYPE_UINT8 as shown in the DALI examples, updating my client code accordingly, and that works for a single image. The bytes of the JPG are converted to a uint8 NumPy array containing however many bytes are in the JPG image, and the InferInput is accepted and processed appropriately.
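Concretely, the single-image flow that works looks roughly like this (a sketch; the model name and input tensor name are placeholders for my own):

import numpy as np
import tritonclient.http as httpclient

# Read the raw JPEG bytes into a 1D uint8 array (length = file size).
img = np.fromfile("image.jpg", dtype=np.uint8)

# Add the batch dimension: shape becomes [1, num_bytes].
batch = np.expand_dims(img, axis=0)

infer_input = httpclient.InferInput("INPUT", list(batch.shape), "UINT8")
infer_input.set_data_from_numpy(batch)

triton_client = httpclient.InferenceServerClient(url="localhost:8000")
result = triton_client.infer(model_name="dali_preprocess", inputs=[infer_input])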

However, when I put together a batch of images, it hits a snag: the code uses an np.stack() operation to build a matrix of the images to pass to the httpclient.InferInput() call. This worked when the data was a size-[1] bytestring, since every value had the same [1] dimensions regardless of the number of bytes in the image. As uint8, however, the dimensions of the array vary from image to image, so the arrays can't be stacked; np.stack() requires all arrays in the stack to have the same dimensions.

What I haven't been able to sort out is what kind of collection of variable-length uint8 ndarrays would be acceptable and recognized as a batch of requests. Would a simple Python list of uint8 ndarrays work, or would it need to be a two-dimensional matrix of the kind the stack() call produces? I haven't been able to pin down any InferInput documentation that makes this clear.

Thanks for any suggestions you can offer! I'd also welcome ideas on how to get it to properly digest string-type image data and abandon the TYPE_UINT8 approach.

JanuszL commented 5 months ago

Hi @mvpel,

Thank you for reaching out. Please check this section of the readme, and this part of the DALI + TRITON tutorial.

mvpel commented 5 months ago

> Hi @mvpel,
>
> Thank you for reaching out. Please check this section of the readme, and this part of the DALI + TRITON tutorial.

Thanks, I appreciate the references! I wasn't sure whether zero-padding the end of the JPG image data would trip something up; evidently it doesn't. This bit of code from the ensemble client is helpful; it's a very elegant way to zero-pad a list of arrays. Kudos to the author:

import numpy as np

def array_from_list(arrays):
    """
    Convert list of ndarrays to single ndarray with ndims+=1
    """
    # Zero-pad each 1D array at the end to the length of the longest one.
    max_len = max(arr.shape[0] for arr in arrays)
    arrays = [np.pad(arr, (0, max_len - arr.shape[0])) for arr in arrays]
    for arr in arrays:
        assert arr.shape == arrays[0].shape, "Arrays must have the same shape"
    return np.stack(arrays)
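For anyone following along, plugging that helper into the client goes roughly like this (a sketch; the model name, input name, and file paths are placeholders):

import numpy as np
import tritonclient.http as httpclient

# Each image becomes a variable-length 1D uint8 array of raw JPEG bytes.
jpegs = [np.fromfile(p, dtype=np.uint8) for p in ["a.jpg", "b.jpg", "c.jpg"]]

# Zero-pad and stack into a single [batch_size, max_len] uint8 matrix.
batch = array_from_list(jpegs)

infer_input = httpclient.InferInput("INPUT", list(batch.shape), "UINT8")
infer_input.set_data_from_numpy(batch)

triton_client = httpclient.InferenceServerClient(url="localhost:8000")
result = triton_client.infer(model_name="dali_preprocess", inputs=[infer_input])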

The dynamic batching capability looks interesting, but it wouldn't give me any benefit in my current setup: a single stream of synchronous requests going to the server from a single client process, i.e., "result = triton_client.infer(...)". So I'll stick with the client-side batching.
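As I understand it, dynamic batching only pays off when the server has several requests in flight at once, which would mean switching to the async client API; a rough sketch of what that would look like (not my current setup; names and paths are placeholders):

import numpy as np
import tritonclient.http as httpclient

# A connection pool (concurrency > 1) lets multiple requests be in flight.
triton_client = httpclient.InferenceServerClient(url="localhost:8000", concurrency=8)

# Fire off requests without waiting; with several outstanding at once,
# the server can coalesce them into larger batches via dynamic batching.
pending = []
for path in ["a.jpg", "b.jpg", "c.jpg"]:
    img = np.expand_dims(np.fromfile(path, dtype=np.uint8), axis=0)
    inp = httpclient.InferInput("INPUT", list(img.shape), "UINT8")
    inp.set_data_from_numpy(img)
    pending.append(triton_client.async_infer(model_name="dali_preprocess", inputs=[inp]))

results = [req.get_result() for req in pending]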

Thanks again for your help!

JanuszL commented 5 months ago

Happy to help.