triton-inference-server / dali_backend

The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's Python API.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
MIT License

How to get a list of image paths into a DALI pipeline? #238

Open Skier23 opened 3 months ago

Skier23 commented 3 months ago

I'm looking to do something like this:

from nvidia.dali import fn, pipeline_def, types

@pipeline_def(batch_size=1, num_threads=4, device_id=0)
def custom_pipeline():
    # Triton will provide the input through "DALI_INPUT_0"
    # Here, we expect image paths
    image_paths = fn.external_source(device="cpu", name="DALI_INPUT_0")
    # Load and decode the images
    images = fn.readers.file(file_root="", files=image_paths, device="cpu")
    images = fn.decoders.image(images, device="mixed", output_type=types.RGB)
    # Resize to 384x384 using bicubic interpolation
    images = fn.resize(images, resize_x=384, resize_y=384, interp_type=types.INTERP_CUBIC)
    # Normalize
    images = fn.crop_mirror_normalize(
        images,
        dtype=types.FLOAT,
        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
        std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
        output_layout="CHW")
    return images

However, this code has an error: The argument files for operator File should not be a DataNode but a str or list of str

This seems to be because fn.readers.file doesn't accept a DataNode, which is what external_source returns. In that case, how can I get the underlying list of strings held by external_source into fn.readers.file so it can read all of those images?

banasraf commented 3 months ago

Hey @Skier23, thanks for the question. Indeed, it's not possible to provide a file path to the file reader at runtime; it has to be provided at build time.

Could you tell us more about your use case? Why not just send the files' data instead of the file paths?
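
For illustration, a minimal sketch of what sending the encoded bytes could look like, assuming the client sends the raw file contents (e.g. the encoded JPEG) as a 1-D uint8 tensor; the input name DALI_INPUT_0 is just reused from the snippet above:

from nvidia.dali import fn, pipeline_def, types

@pipeline_def(batch_size=1, num_threads=4, device_id=0)
def bytes_pipeline():
    # Encoded image bytes sent by the client as a 1-D uint8 tensor.
    encoded = fn.external_source(device="cpu", name="DALI_INPUT_0", dtype=types.UINT8)
    # "mixed" parses on the CPU and decodes on the GPU.
    images = fn.decoders.image(encoded, device="mixed", output_type=types.RGB)
    images = fn.resize(images, resize_x=384, resize_y=384, interp_type=types.INTERP_CUBIC)
    images = fn.crop_mirror_normalize(
        images,
        dtype=types.FLOAT,
        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
        std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
        output_layout="CHW")
    return images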

Skier23 commented 3 months ago

My use case is this: I want to process videos, extract frames at a fixed interval (for example, every 0.5 s), and then feed all of those frames to my image classification model. However, sending the video directly to Triton over gRPC would probably be a huge bottleneck, so I was thinking that sending just the file path and letting DALI read and process the video file itself would be the most efficient setup.

banasraf commented 3 months ago

@Skier23

Unfortunately, that's not possible in DALI itself right now.

The only solution that comes to my mind is using another model to read the video files from disk. You can use a Python backend to run a script that reads the file (without decoding it) and returns its contents as the output. That output would then be passed through an ensemble to the DALI model, which can use the video input or the video decoder to decode the video file from memory.
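
A rough sketch of what such a Python-backend model.py could look like, assuming hypothetical tensor names VIDEO_PATH (a string input) and ENCODED_VIDEO (a uint8 output); the actual wiring between the models is defined by the ensemble configuration:

import numpy as np
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # The client sends the path as a string (BYTES) tensor; assumes a single element.
            path_tensor = pb_utils.get_input_tensor_by_name(request, "VIDEO_PATH")
            path = path_tensor.as_numpy().flatten()[0].decode("utf-8")
            # Read the encoded file as-is; decoding happens later in DALI.
            with open(path, "rb") as f:
                data = np.frombuffer(f.read(), dtype=np.uint8)
            out = pb_utils.Tensor("ENCODED_VIDEO", data)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses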

Here we have an example of using the video decoder in DALI backend: https://github.com/triton-inference-server/dali_backend/tree/main/docs/examples/video_decoding

And the video input: https://github.com/triton-inference-server/dali_backend/tree/main/docs/examples/video_decode_remap

The video input can be used to process the video file part by part (generating multiple responses for a single video file).
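
For context, a rough sketch of what the DALI side of such an ensemble could look like, assuming the encoded file arrives as a uint8 buffer and that a recent DALI release provides fn.experimental.decoders.video; the linked examples show the exact operators and arguments the backend expects:

from nvidia.dali import fn, pipeline_def, types

@pipeline_def(batch_size=1, num_threads=4, device_id=0)
def video_pipeline():
    # Encoded video bytes produced by the Python-backend reader model.
    encoded = fn.external_source(device="cpu", name="DALI_INPUT_0", dtype=types.UINT8)
    # Decode the in-memory video on the GPU into a sequence of frames (FHWC layout).
    frames = fn.experimental.decoders.video(encoded, device="mixed")
    # Per-frame resize; further processing can mirror the image pipeline above.
    frames = fn.resize(frames, resize_x=384, resize_y=384, interp_type=types.INTERP_CUBIC)
    return frames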

Skier23 commented 3 months ago

That's an approach I was wondering about. With that approach, would reading in the whole video (keeping it encoded) and passing it to the DALI pipeline add much overhead? In other words, if I read a video directly in DALI and then select frames at various timestamps, would DALI load the entire video into memory and then decode it, or only load data at the selected timestamps?