triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

[feature request] ffmpeg backend for simplifying decoding of audio/video inputs #7629

Open · vadimkantorov opened this issue 1 week ago

vadimkantorov commented 1 week ago

https://github.com/triton-inference-server/dali_backend/ is awesome for reading and preprocessing images.

It would be nice to have a more fully developed built-in solution for decoding audio and video.

Currently, of course, one can do audio/video decoding in a Python backend by invoking the ffmpeg libraries under the hood, but for very long audio or video inputs it would be nice to have proper streaming capability, i.e. to start executing the models without waiting for the whole input file to be decoded. A rough sketch of what this looks like today follows.
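For illustration only, a minimal sketch of such a streaming decoder using the Python backend's decoupled mode. Everything here (the tensor names `FILE_PATH`/`PCM_CHUNK`, the chunk size, the ffmpeg invocation) is an assumption, and the model config would additionally need `model_transaction_policy { decoupled: true }`:

```python
# model.py -- hypothetical decoupled Python backend model that shells out to
# ffmpeg and streams decoded PCM chunks back before the whole file is decoded.
import subprocess

import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        for request in requests:
            sender = request.get_response_sender()
            # FILE_PATH is a hypothetical 1-element BYTES input tensor.
            path = pb_utils.get_input_tensor_by_name(
                request, "FILE_PATH").as_numpy()[0].decode()
            # Decode to raw 16 kHz mono s16 PCM on stdout.
            proc = subprocess.Popen(
                ["ffmpeg", "-nostdin", "-i", path,
                 "-f", "s16le", "-ac", "1", "-ar", "16000", "pipe:1"],
                stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
            # Forward each 64 KiB chunk downstream as soon as it is decoded.
            while chunk := proc.stdout.read(1 << 16):
                pcm = np.frombuffer(chunk, dtype=np.int16)
                sender.send(pb_utils.InferenceResponse(
                    output_tensors=[pb_utils.Tensor("PCM_CHUNK", pcm)]))
            proc.wait()
            sender.send(flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
        return None
```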

Another question is failure / SEGFAULT handling. Audio/video decoders like ffmpeg can have nasty bugs and crashes, so it would be nice to have some answers about reliability and automatic crash recovery (and also about process/memory/cgroups isolation, in case there are RCE bugs in the decoders).
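One pattern that at least contains such crashes today is to keep the decoder in a child process, so a SEGFAULT kills only that child and can be reported as an ordinary inference error. A sketch under that assumption (the ffmpeg arguments and timeout are placeholders):

```python
# Hypothetical crash guard: run ffmpeg in a child process so a decoder
# SEGFAULT cannot take down the serving process itself.
import signal
import subprocess


def decode_isolated(path: str, timeout_s: float = 60.0) -> bytes:
    proc = subprocess.run(
        ["ffmpeg", "-nostdin", "-i", path,
         "-f", "s16le", "-ac", "1", "-ar", "16000", "pipe:1"],
        capture_output=True, timeout=timeout_s)
    # A negative return code means the child was killed by that signal.
    if proc.returncode == -signal.SIGSEGV:
        raise RuntimeError(f"decoder crashed with SIGSEGV on {path!r}")
    if proc.returncode != 0:
        raise RuntimeError(f"decoder failed with rc={proc.returncode}")
    return proc.stdout
```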

Another useful feature would be limiting the maximum resources (compute time, memory, etc.) used by the decoder, to protect oneself from "zipbombs" or, again, from RCE/code-execution bugs in parsers/decoders.
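On Linux this can already be approximated by applying rlimits to the decoder child process. A sketch, where the 30 s CPU cap and 1 GiB address-space cap are arbitrary assumptions:

```python
# Hypothetical resource caps applied in the child before exec, so a
# "zipbomb"-style input cannot consume unbounded CPU time or memory.
import resource
import subprocess


def _apply_limits():
    resource.setrlimit(resource.RLIMIT_CPU, (30, 30))           # 30 s CPU time
    resource.setrlimit(resource.RLIMIT_AS, (1 << 30, 1 << 30))  # 1 GiB memory


def decode_capped(path: str) -> bytes:
    return subprocess.run(
        ["ffmpeg", "-nostdin", "-i", path, "-f", "s16le", "pipe:1"],
        capture_output=True, preexec_fn=_apply_limits, timeout=120,
        check=True).stdout
```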

So having a single, well-tested solution as a core backend could be beneficial to many users.

oandreeva-nv commented 3 days ago

I believe DALI should also be helpful for audio and video data. @szalpal, could you please recommend something?
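For reference, a minimal sketch of a DALI pipeline that decodes audio and could be served through dali_backend; the external-source input name, batch size, and target sample rate are assumptions:

```python
# Hypothetical DALI pipeline: decodes encoded audio buffers fed in through
# an external source, downmixed to mono and resampled to 16 kHz.
from nvidia.dali import fn, pipeline_def, types


@pipeline_def(batch_size=8, num_threads=4, device_id=0)
def audio_decode_pipe():
    encoded = fn.external_source(name="INPUT", dtype=types.UINT8)
    audio, sample_rate = fn.decoders.audio(
        encoded, dtype=types.FLOAT, downmix=True, sample_rate=16000)
    return audio, sample_rate
```

The serialized pipeline (via `Pipeline.serialize()`) would then be what dali_backend loads as the model.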