qihang720 opened this issue 12 months ago
Hi @qihang720,
Can you provide more details about the environment you are using to run your tests? Can you reproduce a similar number using DALI as a standalone library? Just recently we introduced a couple of optimizations to crop mirror normalize so updating the TRITON version to the latest can be a good first step to confirm if the use case you have has improved.
I used nvcr.io/nvidia/tritonserver:23.05-py3 as my working environment. DALI version: nvidia-dali-cuda110 1.29.0.
I'm not sure how to profile DALI alone, because every input image's encoded length is different. Triton can batch these dynamic shapes for me when I enable the ragged_batches option.
I will test with the new version later.
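As a generic way to time a processing step outside Triton, a minimal harness like the one below can help. Note this is only a sketch: `run_pipeline` is a hypothetical placeholder for the actual DALI pipeline run, and the input sizes are made up to mimic JPEGs of different lengths.

```python
import time
import numpy as np

def run_pipeline(encoded_images):
    # Hypothetical stand-in for a DALI pipeline run:
    # decode + crop/mirror/normalize would happen here.
    return [np.frombuffer(buf, dtype=np.uint8).astype(np.float32)
            for buf in encoded_images]

# Variable-length byte inputs, mimicking encoded images of different sizes.
rng = np.random.default_rng(0)
batch = [rng.integers(0, 256, size=n, dtype=np.uint8).tobytes()
         for n in (1000, 2000, 1500, 3000)]

# Warm up once, then time repeated runs and report the average.
run_pipeline(batch)
iters = 100
start = time.perf_counter()
for _ in range(iters):
    run_pipeline(batch)
elapsed = time.perf_counter() - start
print(f"avg latency: {elapsed / iters * 1e3:.3f} ms")
```

Replacing the placeholder with a real pipeline object gives a baseline number that is independent of Triton's scheduling and batching.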
Hi @qihang720,
nvcr.io/nvidia/tritonserver:23.05-py3 ships DALI 1.25. In DALI 1.30 we made a couple of optimizations to the crop_mirror_normalize operator. Please stay tuned for TRITON 23.10, which should include this DALI version.
Also, the biggest gain from GPU processing shows up when you process a batch of data. Do you see similar results for bigger batches?
Hi @JanuszL,
Thanks for your advice, I will continue to follow TRITON 23.10.
As for batching: since every input is different, each base64 string has a different length, so how can I batch them together?
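For illustration, ragged batching boils down to concatenating the variable-length inputs into one flat buffer while keeping a per-element length tensor, so nothing needs to be padded to a common shape. The sketch below shows the idea in plain numpy; the function names are illustrative, not Triton's actual API:

```python
import numpy as np

def ragged_batch(inputs):
    """Concatenate variable-length byte inputs into one flat buffer,
    keeping per-input lengths so each element can be recovered."""
    lengths = np.array([len(b) for b in inputs], dtype=np.int64)
    flat = np.frombuffer(b"".join(inputs), dtype=np.uint8)
    return flat, lengths

def unbatch(flat, lengths):
    # Split the flat buffer back into the original elements.
    offsets = np.cumsum(lengths)[:-1]
    return np.split(flat, offsets)

inputs = [b"short", b"a bit longer input", b"mid-size"]
flat, lengths = ragged_batch(inputs)
pieces = unbatch(flat, lengths)
```

With ragged_batches enabled in the Triton model config, the server performs an equivalent concatenation for you and the backend receives the lengths alongside the data.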
Output layout CHW, profiled with perf_analyzer.
Output layout HWC.
Most of the time the model input layout is "NCHW"; is there any way we can improve performance?
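If the layout change has to happen somewhere, it is usually cheaper to do it inside the preprocessing pipeline than in the model; in numpy terms it is just an axis permutation, as the small sketch below shows (the array here is a dummy image, not real data). DALI's crop_mirror_normalize also accepts an output_layout argument, so the operator can emit CHW directly.

```python
import numpy as np

# A dummy HWC image: height=2, width=3, channels=4.
hwc = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4)

# Reorder axes to CHW, the per-image layout of an NCHW batch.
chw = np.transpose(hwc, (2, 0, 1))
print(chw.shape)  # → (4, 2, 3)
```

Whether the transpose is faster on the pipeline side or fused into the model depends on the hardware and framework, so it is worth measuring both with perf_analyzer.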