triton-inference-server / dali_backend

The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's python API.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
MIT License

Connecting InputOperator with no explicit inputs to Triton #209

Closed fversaci closed 9 months ago

fversaci commented 10 months ago

Hi,

I'm attempting to integrate our custom C++ Cassandra-DALI plugin with Triton. The plugin is built on the InputOperator class and does not have any explicit input streams. Instead, it reads its inputs, which consist of pairs of uint64 representing UUIDs, through the pipe.feed_input function. Is it feasible to also provide inputs to it through Triton?
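
For context, this is roughly how we drive the operator outside of Triton (a simplified sketch; the operator path fn.plugin.cassandra_reader and the input name "Reader" below are illustrative placeholders, not our exact API):

import numpy as np
import nvidia.dali.fn as fn
from nvidia.dali import pipeline_def

@pipeline_def(batch_size=4, num_threads=2, device_id=0)
def read_pipe():
    # placeholder for our actual plugin operator (an InputOperator with no
    # upstream inputs); the real module path differs
    images, labels = fn.plugin.cassandra_reader(name="Reader")
    return images, labels

pipe = read_pipe()
pipe.build()

# each sample is a UUID encoded as a pair of uint64 values
uuids = np.zeros((4, 2), dtype=np.uint64)
pipe.feed_input("Reader", uuids)
images, labels = pipe.run()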

Our current code for connecting the plugin to Triton is available at this link: https://github.com/fversaci/cassandra-dali-plugin/tree/triton/examples/triton

and can be tested using the following commands:

git clone https://github.com/fversaci/cassandra-dali-plugin.git -b triton
cd cassandra-dali-plugin
docker build -t cassandra-dali-plugin -f Dockerfile.triton .   # this might take some time
docker run --cap-add=sys_admin --rm -it --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --name cass-dali cassandra-dali-plugin
# within the container
./runme.sh

Thanks!

banasraf commented 10 months ago

Hi @fversaci We haven't tested our backend with any input operator other than external source and video input (only those are available in DALI), but, skimming over the code, I don't see why it shouldn't work. When you run the example with the configuration file you currently have (which seems to be correct), with the input config uncommented, does any problem appear?

fversaci commented 10 months ago

Hi @banasraf,

Thanks for your response!

In our current code, the Cassandra-DALI plugin is correctly recognized and listed by the Triton server. However, if I uncomment the input config, I encounter the following error: failed to load 'dali' version 1: Invalid argument: Configuration file contains config for READER[0] but such input is not present in the pipeline.

Could this error be due to the plugin not having an explicit input, but instead reading data through the pipe.feed_input function?

I'd like to know if Triton supports InputOperators with no explicit inputs and, if it does, I'd appreciate guidance on how to configure and provide input for such operators. Is there an example available demonstrating their use?

fversaci commented 10 months ago

Hi @banasraf,

I have an update: I realized that I was using the wrong name for the operator, READER[0] instead of Reader... Now the program recognizes the input stream correctly!
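
In case it's useful, this is roughly how I intend to feed the uuids from the client side (a sketch only; the output names DALI_OUTPUT_0/DALI_OUTPUT_1 are placeholders I haven't verified, while the model name 'dali' and the input name "Reader" are the ones mentioned above):

import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# a batch of 4 UUIDs, each encoded as a pair of uint64 values
uuids = np.zeros((4, 2), dtype=np.uint64)

inp = grpcclient.InferInput("Reader", list(uuids.shape), "UINT64")
inp.set_data_from_numpy(uuids)

outputs = [grpcclient.InferRequestedOutput("DALI_OUTPUT_0"),
           grpcclient.InferRequestedOutput("DALI_OUTPUT_1")]

result = client.infer(model_name="dali", inputs=[inp], outputs=outputs)
images = result.as_numpy("DALI_OUTPUT_0")
labels = result.as_numpy("DALI_OUTPUT_1")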

I will now try to pass some inputs to it and I will get back to you with the results soon. Thank you!

banasraf commented 10 months ago

@fversaci

Ok, so I investigated it a bit and I see that, with our current approach in the DALI backend, this input operator won't work. The problematic part is that the operator has two outputs, a scenario we support for neither the external source operator nor the video input.

I will schedule a task to solve this issue. For now, the only feasible workaround I see would be to transform the operator so that it is not an input op. Correct me if I'm wrong, but it seems that the operator could simply take the uuids as a regular input (which would be provided by a standard external source).
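
Something along these lines is what I have in mind (just a sketch; cassandra_decode is a made-up name for a non-input variant of your operator):

import nvidia.dali.fn as fn
import nvidia.dali.types as types
from nvidia.dali import pipeline_def

@pipeline_def(batch_size=4, num_threads=2, device_id=0)
def read_pipe():
    # the uuids come in through a regular external source, which the DALI
    # backend already knows how to expose as a Triton input
    uuids = fn.external_source(name="Reader", dtype=types.UINT64, ndim=1)
    # hypothetical non-input variant of the plugin that consumes the uuids
    images, labels = fn.plugin.cassandra_decode(uuids)
    return images, labels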

fversaci commented 10 months ago

Hi @banasraf

Thanks a lot for adding a task to address this problem.

The reason we don't read the uuids as a standard input is briefly discussed in this issue.

In summary, we want to maintain two important features of our data loader:

  1. We want to be able to freely feed uuids during execution (which, as you suggested, could also be achieved using an external source).
  2. We need to be able to internally prefetch images from the database for performance reasons during ML training.

During inference, we could discard the internal prefetching (which may not even work in this scenario) and use an external source when interfacing with Triton. However, it would be better to maintain a single version of the data loader instead of having two separate versions for training and inference.

Regarding this, would it be possible to somehow connect/short-circuit a standard external source operator to our InputOperator? This might allow us to keep the current format of the operator while ensuring compatibility with the current Triton interface.

banasraf commented 9 months ago

@fversaci

Sorry for the delayed response. In the meantime, I worked on the issue with multi-output input operators: #5066. With this change, your input operator should work in the DALI Triton backend as expected.

Unfortunately, it might take quite some time before this change is available in the upstream Triton container, but I'll let you know when it's available in the DALI nightly so you can test it.

fversaci commented 9 months ago

Hi @banasraf

Thank you, that's fantastic news!

We have recently made some changes to the internal prefetching of our plugin, so that it should now function seamlessly even when performing inference through Triton.

I'm really excited to see if it works smoothly and what kind of performance we can achieve. I will wait for the nightly version to give it a try. Thank you once again!

fversaci commented 9 months ago

Hi @banasraf

I was able to test your fix using DALI nightly 1.32.0.dev20231011. However, I encountered the following error:

tritonclient.utils.InferenceServerException: [400] Exception: DALI internal error: "depleted" trace not found for input "Reader". It must be defined by all input operators.

I'm not sure if this error is caused by DALI or if it is due to some misconfiguration on my part. Do you have any idea?

Our current code is still available at this link: https://github.com/fversaci/cassandra-dali-plugin/tree/triton/examples/triton

Here are the steps to reproduce the error:

git clone https://github.com/fversaci/cassandra-dali-plugin.git -b triton
cd cassandra-dali-plugin
docker build -t cassandra-dali-plugin -f Dockerfile.triton .   # this might take some time
docker run --cap-add=sys_admin --rm -it --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --name cass-dali cassandra-dali-plugin
# within the container
./start-and-fill-db.sh
./start-triton.sh   # don't close the container
# new shell within the host
docker exec -ti cass-dali fish
# within the container
python3 client-triton.py

Thanks!

banasraf commented 9 months ago

Hey @fversaci, thanks for testing the nightly. I'll reproduce the error and come back to you when I have some info.

banasraf commented 9 months ago

@fversaci I've found the issue. Input operators should use the traces mechanism to signal to the Triton backend when their data source is depleted. This is important for operators that can generate multiple iterations from a single data feed (e.g. video input). In the usual case (feed once -> output once), like yours, this probably shouldn't be required and I might lift that check, but for now you can fix the operator by adding:

void Cassandra::RunImpl(dali::Workspace &ws) {
  // ... existing RunImpl code ...
  // Report the "depleted" trace so the backend knows whether the operator
  // still has data queued for further iterations.
  SetDepletedOperatorTrace(ws, !HasDataInQueue());  // add this line
}

fversaci commented 9 months ago

Thanks @banasraf ! I'll try your fix and get back to you soon.

fversaci commented 9 months ago

Hi @banasraf

I have added the missing SetDepletedOperatorTrace line, and now everything is working smoothly. I will now focus on enabling dynamic batching...

Thanks for all your support!