triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

[feature request] ffmpeg backend for simplifying decoding of audio/video inputs #7629

Open · vadimkantorov opened this issue 1 week ago

vadimkantorov commented 1 week ago

https://github.com/triton-inference-server/dali_backend/ is awesome for reading and preprocessing images.

It would be nice to have a more fully developed built-in solution for decoding audio and video.

Currently, of course, one can do audio/video decoding in a Python backend by invoking the ffmpeg libraries under the hood, but for very long audio or video inputs it would be nice to have proper streaming capability, i.e. to start executing the models without waiting for the whole input file to be decoded. A rough sketch of what this looks like today follows.
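For illustration only, a minimal sketch of such a streaming decoder using the Python backend's decoupled mode. Everything here (the tensor names `FILE_PATH`/`PCM_CHUNK`, the chunk size, the ffmpeg invocation) is an assumption, and the model config would additionally need `model_transaction_policy { decoupled: true }`:

```python
# model.py -- hypothetical decoupled Python backend model that shells out to
# ffmpeg and streams decoded PCM chunks back before the whole file is decoded.
import subprocess

import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        for request in requests:
            sender = request.get_response_sender()
            # FILE_PATH is a hypothetical 1-element BYTES input tensor.
            path = pb_utils.get_input_tensor_by_name(
                request, "FILE_PATH").as_numpy()[0].decode()
            # Decode to raw 16 kHz mono s16 PCM on stdout.
            proc = subprocess.Popen(
                ["ffmpeg", "-nostdin", "-i", path,
                 "-f", "s16le", "-ac", "1", "-ar", "16000", "pipe:1"],
                stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
            # Forward each 64 KiB chunk downstream as soon as it is decoded.
            while chunk := proc.stdout.read(1 << 16):
                pcm = np.frombuffer(chunk, dtype=np.int16)
                sender.send(pb_utils.InferenceResponse(
                    output_tensors=[pb_utils.Tensor("PCM_CHUNK", pcm)]))
            proc.wait()
            sender.send(flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
        return None
```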

Another question is failure / SEGFAULT handling. Audio/video decoders like ffmpeg can have nasty bugs and crashes, so it would be nice to have some answers about reliability and automatic crash recovery (and also about process/memory/cgroups isolation, in case there are RCE bugs in the decoders).
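One pattern that at least contains such crashes today is to keep the decoder in a child process, so a SEGFAULT kills only that child and can be reported as an ordinary inference error. A sketch under that assumption (the ffmpeg arguments and timeout are placeholders):

```python
# Hypothetical crash guard: run ffmpeg in a child process so a decoder
# SEGFAULT cannot take down the serving process itself.
import signal
import subprocess


def decode_isolated(path: str, timeout_s: float = 60.0) -> bytes:
    proc = subprocess.run(
        ["ffmpeg", "-nostdin", "-i", path,
         "-f", "s16le", "-ac", "1", "-ar", "16000", "pipe:1"],
        capture_output=True, timeout=timeout_s)
    # A negative return code means the child was killed by that signal.
    if proc.returncode == -signal.SIGSEGV:
        raise RuntimeError(f"decoder crashed with SIGSEGV on {path!r}")
    if proc.returncode != 0:
        raise RuntimeError(f"decoder failed with rc={proc.returncode}")
    return proc.stdout
```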

Another useful feature would be limiting the maximum resources (compute time, memory, etc.) used by the decoder, to protect oneself from "zipbombs" or, again, from RCE/code-execution bugs in parsers/decoders.
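On Linux this can already be approximated by applying rlimits to the decoder child process. A sketch, where the 30 s CPU cap and 1 GiB address-space cap are arbitrary assumptions:

```python
# Hypothetical resource caps applied in the child before exec, so a
# "zipbomb"-style input cannot consume unbounded CPU time or memory.
import resource
import subprocess


def _apply_limits():
    resource.setrlimit(resource.RLIMIT_CPU, (30, 30))           # 30 s CPU time
    resource.setrlimit(resource.RLIMIT_AS, (1 << 30, 1 << 30))  # 1 GiB memory


def decode_capped(path: str) -> bytes:
    return subprocess.run(
        ["ffmpeg", "-nostdin", "-i", path, "-f", "s16le", "pipe:1"],
        capture_output=True, preexec_fn=_apply_limits, timeout=120,
        check=True).stdout
```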

So having a single, well-tested solution as a core backend could be beneficial to many users.

oandreeva-nv commented 3 days ago

I believe DALI should also be helpful for audio and video data. @szalpal, could you please recommend something?
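For reference, a minimal sketch of a DALI pipeline that decodes audio and could be served through dali_backend; the external-source input name, batch size, and target sample rate are assumptions:

```python
# Hypothetical DALI pipeline: decodes encoded audio buffers fed in through
# an external source, downmixed to mono and resampled to 16 kHz.
from nvidia.dali import fn, pipeline_def, types


@pipeline_def(batch_size=8, num_threads=4, device_id=0)
def audio_decode_pipe():
    encoded = fn.external_source(name="INPUT", dtype=types.UINT8)
    audio, sample_rate = fn.decoders.audio(
        encoded, dtype=types.FLOAT, downmix=True, sample_rate=16000)
    return audio, sample_rate
```

The serialized pipeline (via `Pipeline.serialize()`) would then be what dali_backend loads as the model.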