triton-inference-server / dali_backend

The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's python API.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
MIT License

Using custom DALI plugins with Triton #138

Closed leanrdchen00918 closed 2 years ago

leanrdchen00918 commented 2 years ago

Hi! I'm trying to use custom DALI plugins with Triton and I've followed the guide in dali_backend/docs/examples/dali_plugin/README.md, but somehow it doesn't work. Below are the two errors encountered that really confuse me.

The first one looks like this: [screenshot: 2022-07-04 10-35-14] I've noticed a mismatch between the version of DALI I'm using (1.15) and the version of the 'dali' TRITONBACKEND API (1.10). Is this really a problem? And if so, how can I fix it?

The second one looks like this: [screenshot: 2022-07-04 10-41-21] I think it is the same problem as the one mentioned in the guide. However, I've double-checked my backend configuration and I think it's fine. The plugin library is located at /model_repo/libcustom_operation.so, and the command I use to start the Triton server is: sudo docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /home/yychen/Work/triton/model_repo/:/models nvcr.io/nvidia/tritonserver:22.06-py3 tritonserver --model-repository=/models --backend-config dali,plugin_libs=/models/libcustom_operation.so

Thanks! I'd appreciate your help.

leanrdchen00918 commented 2 years ago

Problem solved! It turns out the version of libdali.so in tritonserver:22.06-py3 is 1.14. I had to switch my DALI version to 1.14 (as dali_backend/DALI_VERSION points out) to serve my pipeline successfully. Is there a cleaner, ideally automatic, way to manage these versions?

szalpal commented 2 years ago

@leanrdchen00918,

Do I understand correctly that your problem was that you were building the custom operator with DALI 1.15 instead of 1.14?

The only convenient way of maintaining several DALI versions that comes to my mind is using virtualenv or conda environments.
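To illustrate the virtualenv suggestion, here is a minimal sketch that creates one environment per DALI version using Python's standard `venv` module. The helper names (`env_dir_for`, `pip_install_command`, `create_env`) and the directory layout are hypothetical, the `cuda110` wheel suffix is an assumption that has to match your CUDA setup, and the `bin/python` path assumes a Linux host.

```python
import venv
from pathlib import Path
from typing import List

# NVIDIA's pip index for DALI wheels.
NVIDIA_INDEX = "https://developer.download.nvidia.com/compute/redist"


def env_dir_for(base: Path, dali_version: str) -> Path:
    """Return the (hypothetical) directory used for a given DALI version."""
    return base / f"dali-{dali_version}"


def pip_install_command(env_dir: Path, dali_version: str,
                        cuda_suffix: str = "cuda110") -> List[str]:
    """Build the pip command that pins DALI inside the environment.

    Assumes a Linux layout (<env>/bin/python); cuda_suffix is an assumption.
    """
    python = env_dir / "bin" / "python"
    return [
        str(python), "-m", "pip", "install",
        "--extra-index-url", NVIDIA_INDEX,
        f"nvidia-dali-{cuda_suffix}=={dali_version}",
    ]


def create_env(base: Path, dali_version: str) -> Path:
    """Create a virtual environment dedicated to one DALI version."""
    env_dir = env_dir_for(base, dali_version)
    venv.create(env_dir, with_pip=True)
    return env_dir
```

After `create_env(Path.home() / "envs", "1.14.0")`, passing the result of `pip_install_command(...)` to `subprocess.run(..., check=True)` would install the pinned DALI; the plugin can then be built from within that environment.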

Thank you for pointing out the potential problem with the plugin. We usually bump the DALI version with every dali_backend release (every month or so), so this compatibility problem can arise. I'll investigate possible improvements in this area.
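Until such an improvement exists, a small pre-build check can catch the mismatch described in this thread. The sketch below compares the DALI version installed in the build environment with the one recorded in dali_backend/DALI_VERSION. It assumes that file holds a plain version string such as 1.14.0 and that matching major.minor is sufficient; both are assumptions, and all function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '1.14.0'."""
    match = re.match(r"\s*v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def versions_compatible(installed: str, expected: str) -> bool:
    """Assumption: plugin and backend agree when major.minor match."""
    return major_minor(installed) == major_minor(expected)


def expected_version(version_file: Path) -> str:
    """Read the pinned version; assumes the file is a bare version string."""
    return version_file.read_text().strip()


def installed_dali_version() -> Optional[str]:
    """Return the installed DALI version, or None if DALI is missing."""
    try:
        import nvidia.dali as dali
    except ImportError:
        return None
    return dali.__version__
```

In a build script, one might call `versions_compatible(installed_dali_version(), expected_version(Path("dali_backend/DALI_VERSION")))` after checking that DALI is installed, and abort the plugin build when it returns False.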

leanrdchen00918 commented 2 years ago

@szalpal You are right about my mistake. Thanks a lot for your advice!