Closed Skier23 closed 5 months ago
@Skier23, could you attach a configuration for the simple_ensemble model?
Yea. It's basically just a straight passthrough to the DALI model:
name: "simple_ensemble"
platform: "ensemble"
max_batch_size: 64
input [
  {
    name: "x"
    data_type: TYPE_UINT8  # Encoded images are sent as strings
    dims: [ -1 ]
  }
]
output [
  {
    name: "classifier"
    data_type: TYPE_FP16
    dims: [ 36 ]
  }
]
ensemble_scheduling {
  step [
    {
      model_name: "dali_preprocessing"
      model_version: -1
      input_map {
        key: "DALI_INPUT_0"
        value: "x"
      }
      output_map {
        key: "DALI_OUTPUT_0"
        value: "preprocessed_image"
      }
    },
    {
      model_name: "maxvit_rmlp_base"
      model_version: -1
      input_map {
        key: "x"
        value: "preprocessed_image"
      }
      output_map {
        key: "classifier"
        value: "classifier"
      }
    }
  ]
}
max_batch_size: 64
input [
{
name: "x"
data_type: TYPE_UINT8 # Encoded images are sent as strings
dims: [ -1 ]
}
]
I believe this part of the config is the problem. The dims: [-1] specifies the shape of a single input sample. Additionally, setting max_batch_size enables batching, so Triton expects the input to have 2 dimensions: [batch, sample]. The error message points out that you're passing a one-dimensional input. Would you mind double-checking this?
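To illustrate the shape requirement, here is a small NumPy sketch (the dummy byte arrays below stand in for encoded image files, an assumption for illustration only): with batching enabled, a request must be a dense [batch, sample] tensor, which means samples of unequal length must be padded to a common length before stacking.

```python
import numpy as np

# Two "encoded images" of different byte lengths (dummy data).
img_a = np.arange(10, dtype=np.uint8)
img_b = np.arange(6, dtype=np.uint8)

# Wrong: flattening the batch yields a single dimension [16],
# so Triton sees a one-dimensional input and rejects it.
flat = np.concatenate([img_a, img_b])
assert flat.shape == (16,)

# Right: zero-pad each sample to the longest one and stack,
# producing the expected 2 dimensions: [batch, sample].
max_len = max(len(img_a), len(img_b))
batch = np.stack([np.pad(i, (0, max_len - len(i))) for i in (img_a, img_b)])
assert batch.shape == (2, 10)
```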
Yea. Exactly that is the problem. But that's the exact same setup that the linked example/tutorial uses. Personally, I would've thought a one-dimensional input from the client, where the dimension is the batch size and each element is the encoded string for one image in the batch, would have made more sense. However, it looks like the tutorial/example also expects a two-dimensional input. So, if that is correct for a DALI pipeline like the one in the example tutorial, how should the client code be formatted to give Triton the data in the right format?
Also, when trying with max_batch_size: 0 so that the input can be one-dimensional, I get another error:
[8bc16ad44c00:1 :0:145] Caught signal 8 (Floating point exception: integer divide by zero)
==== backtrace (tid: 145) ====
0 0x0000000000042520 __sigaction() ???:0
1 0x0000000000043d8d triton::backend::dali::DaliPipeline::SetInput() :0
2 0x00000000000440d5 triton::backend::dali::DaliPipeline::SetInput() :0
3 0x0000000000040512 triton::backend::dali::DaliExecutor::SetupInputs() :0
4 0x0000000000040ebe triton::backend::dali::DaliExecutor::Run() :0
5 0x000000000002de64 triton::backend::dali::DaliModelInstance::ProcessRequest() :0
6 0x000000000002e47a triton::backend::dali::DaliModelInstance::ExecuteUnbatched() :0
7 0x000000000001a960 TRITONBACKEND_ModelInstanceExecute() ???:0
8 0x00000000001a8d74 triton::core::TritonModelInstance::Execute() :0
9 0x00000000001a90db triton::core::TritonModelInstance::Schedule() :0
10 0x00000000002bd9bd triton::core::Payload::Execute() :0
11 0x00000000001acd64 triton::core::TritonModelInstance::TritonBackendThread::BackendThread() :0
12 0x00000000000dc253 std::error_code::default_error_condition() ???:0
13 0x0000000000094ac3 pthread_condattr_setpshared() ???:0
14 0x0000000000125a04 clone() ???:0
This leads me to believe that this likely isn't the correct input format.
@szalpal any ideas on the format the data needs to be in on the client side when sent to Triton in the tutorial example?
@banasraf Any ideas on what format the data should be in on the client side when sent to Triton for a DALI pipeline that expects encoded images?
@Skier23
The client should send a 2-dimensional tensor where the first dimension is a batch dimension. We do not support the BYTES type as an input right now. Binary data should be sent as UINT8 tensors. This means that all the files that you send in a batch need to be of equal length (as you send them as rows in a tensor). You can pad the data with 0s at the end of each file to make them of equal sizes.
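A minimal client-side sketch of that padding scheme might look like the following. The model name ("simple_ensemble") and input/output names ("x", "classifier") follow the config above; the use of the tritonclient HTTP API is an assumption about your client stack, and the `infer` helper is hypothetical.

```python
import numpy as np

def make_batch(encoded_files):
    """Pad each encoded file with trailing zeros to the length of the
    longest one in the batch, then stack into a [batch, max_len] UINT8 tensor."""
    arrays = [np.frombuffer(f, dtype=np.uint8) for f in encoded_files]
    max_len = max(a.size for a in arrays)
    return np.stack([np.pad(a, (0, max_len - a.size)) for a in arrays])

def infer(encoded_files, url="localhost:8000"):
    # Deferred import so make_batch stays usable without the client library.
    import tritonclient.http as httpclient  # pip install tritonclient[http]

    batch = make_batch(encoded_files)
    client = httpclient.InferenceServerClient(url=url)
    inp = httpclient.InferInput("x", list(batch.shape), "UINT8")
    inp.set_data_from_numpy(batch)
    out = httpclient.InferRequestedOutput("classifier")
    result = client.infer("simple_ensemble", inputs=[inp], outputs=[out])
    return result.as_numpy("classifier")
```

Only the padding requirement applies within a single request, so `make_batch` never needs to know about sizes from other requests.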
Thanks for the reply! In a productionized environment (as I'd classify DALI with Triton), you wouldn't usually have the same input sizes for all the images. If I were to pad the inputs to some fixed size, I'm not sure how I would know what that size should be. The other alternative would be to resize the images on the client side, but that takes away some of the few steps I want the DALI pipeline to do: resizing, normalizing, and setting the data type. If we're already doing the resizing on the client side, it seems like we lose some of the performance gain. Another alternative could be running the whole DALI pipeline on the client side and sending the images to the server (or encoding on the client and then just decoding on the server side). But even in that pattern, the inputs to the client-side DALI pipeline would be images of inconsistent sizes, so I'm struggling to see how an optimized flow might work?
In a productionalized environment (as I'd classify DALI with Triton), you wouldn't usually have the same input sizes for all the images.
That's true, but when you compose a batch to be sent to Triton as a single request, you have access to all the images that you want to send, so you can pad them to the size of the biggest one. The requirement of uniform size applies only to images sent in a single batch, so there's no need to track any size between requests.
Gotcha. That does make sense. Regarding the above, which pattern would you generally think would be optimal:
1. Resize the images on the client side and send the decoded, resized data to the server.
2. Decode and resize on the client side, re-encode, send, and decode again on the server.
3. Send the original encoded images and do all the preprocessing on the server.
I'd imagine in 1 the payloads to Triton would be medium-sized from the resizing, but still decently large because they aren't encoded. With 2, the payload would be the smallest, but we have a bit of excessive decoding, re-encoding, and decoding again. And with 3 we have a bit of extra payload, since the images are potentially a good bit larger than the resized size, but they are encoded.
You could try the different options and compare them, because the cost of a specific setup might depend on your environment.
Option 3 would be the one that we usually propose. Generally, encoding images (e.g. to JPEG) reduces their size dramatically, so sending decoded data usually adds a lot of communication overhead. But if you resize them down to dimensions that are much smaller than the original image, they might not impose that much overhead.
I would rather avoid decoding, encoding, and then decoding again on the server. Decoding is a costly operation that tends to dominate the cost of the whole preprocessing pipeline, so multiplying this work might be too expensive.
That makes sense. I'll start with trying option 3 and go from there. Thanks for the help!
That did end up working. I should have enough here to try out some different approaches. Thanks!
I'm trying to make a DALI preprocessing pipeline extremely similar to the Inception example, but I can't seem to figure out how to get the correct format on the client side. Here is the code I have now on the client side:
However, this code doesn't work with the DALI pipeline config file from the example: https://developer.nvidia.com/blog/accelerating-inference-with-triton-inference-server-and-dali/
because Triton is expecting input with shape [-1, -1] but it's getting data with shape [-1]:
unexpected shape for input 'x' for model 'simple_ensemble'. Expected [-1,-1], got [4].
What would be the correct way to format the data on the client side?