triton-inference-server / dali_backend

The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's Python API.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
MIT License

layout parameter to external_source causes assert error #156

Closed: damonmaria closed this issue 2 years ago

damonmaria commented 2 years ago

I pass numpy arrays from my client code into Triton through a DALI pipeline. The following DALI pipeline worked in older versions of Triton/DALI but now throws an assertion error:

@pipeline_def(batch_size=256, num_threads=4, device_id=0)
def pipeline():
    images = fn.external_source(device="cpu", name="DALI_INPUT_0", layout="HWC")
    ...

When Triton starts I get the following error:

E1015 07:50:36.366937 1 dali_model_instance.cc:43] [/opt/dali/dali/pipeline/operator/builtin/external_source.h:567] Assert on "layout_ == batch.GetLayout()" failed: Expected data with layout: "HWC" and got: "".
Stacktrace (21 entries):
[frame 0]: /opt/tritonserver/backends/dali/dali/libdali.so(+0x869b2) [0x7efe0f3e89b2]
[frame 1]: /opt/tritonserver/backends/dali/dali/libdali.so(+0x1e8f98) [0x7efe0f54af98]
[frame 2]: /opt/tritonserver/backends/dali/dali/libdali.so(void dali::Pipeline::SetDataSourceHelper<dali::TensorList<dali::CPUBackend>, dali::CPUBackend>(std::string const&, dali::TensorList<dali::CPUBackend> const&, dali::OperatorBase*, dali::AccessOrder, dali::ExtSrcSettingMode)+0x94) [0x7efe0f5554a4]
[frame 3]: /opt/tritonserver/backends/dali/dali/libdali.so(void dali::Pipeline::SetExternalInputHelper<dali::TensorList<dali::CPUBackend> >(std::string const&, dali::TensorList<dali::CPUBackend> const&, dali::AccessOrder, dali::ExtSrcSettingMode)+0x107) [0x7efe0f55ab97]
[frame 4]: /opt/tritonserver/backends/dali/dali/libdali.so(daliSetExternalInputAsync+0xe8f) [0x7efe0f54705f]
[frame 5]: /opt/tritonserver/backends/dali/dali/libdali.so(daliSetExternalInput+0x1d) [0x7efe0f54764d]
[frame 6]: /opt/tritonserver/backends/dali/libtriton_dali.so(+0x3df29) [0x7efe684c5f29]
[frame 7]: /opt/tritonserver/backends/dali/libtriton_dali.so(+0x3df8c) [0x7efe684c5f8c]
[frame 8]: /opt/tritonserver/backends/dali/libtriton_dali.so(+0x3e202) [0x7efe684c6202]
[frame 9]: /opt/tritonserver/backends/dali/libtriton_dali.so(+0x3bbe0) [0x7efe684c3be0]
[frame 10]: /opt/tritonserver/backends/dali/libtriton_dali.so(+0x3becc) [0x7efe684c3ecc]
[frame 11]: /opt/tritonserver/backends/dali/libtriton_dali.so(+0x2ad05) [0x7efe684b2d05]
[frame 12]: /opt/tritonserver/backends/dali/libtriton_dali.so(+0x2b4a6) [0x7efe684b34a6]
[frame 13]: /opt/tritonserver/backends/dali/libtriton_dali.so(TRITONBACKEND_ModelInstanceExecute+0x190) [0x7efe684a0e00]
[frame 14]: /opt/tritonserver/bin/../lib/libtritonserver.so(+0x11290a) [0x7efe7585990a]
[frame 15]: /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1130b7) [0x7efe7585a0b7]
[frame 16]: /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1e5541) [0x7efe7592c541]
[frame 17]: /opt/tritonserver/bin/../lib/libtritonserver.so(+0x10d287) [0x7efe75854287]
[frame 18]: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4) [0x7efe75394de4]
[frame 19]: /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7efe7670d609]
[frame 20]: /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7efe7507f133]

Currently I have to work around this by setting the layout afterwards with a reinterpret:

@pipeline_def(batch_size=256, num_threads=4, device_id=0)
def pipeline():
    images = fn.external_source(device="cpu", name="DALI_INPUT_0")
    images = fn.reinterpret(images, layout="HWC")
    ...
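For completeness, the workaround pipeline above can be sketched end to end, including the serialization step needed to deploy it to the Triton DALI backend (a minimal sketch, assuming the nvidia-dali package and a GPU are available; the model.dali filename and the returned output are illustrative, not taken from the original report):

    import nvidia.dali.fn as fn
    from nvidia.dali import pipeline_def

    @pipeline_def(batch_size=256, num_threads=4, device_id=0)
    def pipeline():
        # Leave layout unset here; passing layout="HWC" directly to
        # external_source is what triggers the assertion in this issue.
        images = fn.external_source(device="cpu", name="DALI_INPUT_0")
        # Attach the layout after the fact; fn.reinterpret only changes
        # metadata, so no data is copied.
        images = fn.reinterpret(images, layout="HWC")
        return images

    # Serialize for the Triton model repository, e.g.
    # model_repository/<model_name>/1/model.dali
    pipe = pipeline()
    pipe.serialize(filename="model.dali")

With this shape, downstream operators that require a layout (flips, crops, normalization) see the input as "HWC" even though external_source itself received no layout.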

The DALI pipeline was serialized using the NGC PyTorch container 22.09, and I'm trying to load it in the Triton Server container 22.09. I'm not sure which version I was using when this last worked.

banasraf commented 2 years ago

Hi @damonmaria. It looks like it might be a bug. I will investigate further.

banasraf commented 2 years ago

@damonmaria Thanks for reporting the issue! I confirm it's a bug. I posted a fix: #158. The fix will be part of the 22.11 version of the NGC container.