pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

Provide full stack of video pipeline by FFMPEG-GPU or pyav to accelerate. #5810

Open wqh17101 opened 2 years ago

wqh17101 commented 2 years ago

🚀 The feature

  1. Provide an API for reading video with ffmpeg-gpu (or pyav), not only from a file but also from a video stream.
  2. Provide an API for encoding frames with ffmpeg-gpu (or pyav), not only to a file but also to a video stream.
  3. Provide an API for writing video with ffmpeg-gpu (or pyav), not only to a file but also to a video stream.

Motivation, pitch

As someone who works with video, output and encoding are just as important to me as input and decoding. I am very happy to see you adding the video functions. In the future, I hope I can use vision to build the whole pipeline: input -> decode -> inference -> encode -> output.

Alternatives

No response

Additional context

No response

bjuncek commented 2 years ago

Hi @wqh17101, thanks for the interest. Unfortunately, we're not really looking to add new functionality to our video IO until a solid roadmap for it is established (and there is a scenario where we abandon it altogether in favour of adopting pyAV or a similar solution).

Having said that, I had two questions to better understand the feature request:

  1. If I am not mistaken, our current "read_video" API should already be able to handle streams (as long as they are digestible by pyAV or ffmpeg): the Python API simply passes whatever is given as the file name to the av.open call (see here), and the C++ backend is built specifically for handling pointers (see here).

2/3. Am I correct in saying that this API already exists here? We can take a video tensor and encode it (given a codec) to a file. Is this not what you are looking for? Having said that, this does seem a bit out of scope for torchvision's focus.
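For illustration, a minimal sketch of the decode -> (inference) -> encode round trip using the existing functions. It assumes a torchvision release that still ships `torchvision.io.read_video` and `torchvision.io.write_video`, with PyAV installed. The helper name `reencode` and its `transform` hook are hypothetical and only stand in for the inference step.

```python
def reencode(src, dst, codec="libx264", transform=None):
    """Decode `src`, optionally transform the frames, and encode them to `dst`.

    Hypothetical helper: assumes torchvision's video I/O and PyAV are available.
    """
    # Imported lazily so the module can be loaded without torchvision installed.
    from torchvision.io import read_video, write_video

    # frames is a uint8 tensor of shape [T, H, W, C]; info carries "video_fps".
    frames, _audio, info = read_video(src, pts_unit="sec")

    if transform is not None:
        # Placeholder for the inference step; it must return frames in the
        # same [T, H, W, C] uint8 layout for write_video to accept them.
        frames = transform(frames)

    write_video(dst, frames, fps=info["video_fps"], video_codec=codec)
    return frames.shape[0]  # number of frames written
```

Whether `src` can be a stream URL rather than a local path depends on what the underlying `av.open` call accepts, so treat that as something to verify rather than a guarantee.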

wqh17101 commented 2 years ago

I see. I will try it. Another question:

  1. pyav does not support input and output through a pipe, but ffmpeg does. However, its implementation may differ from the normal one, so please check it. I hope you can support it.
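To make the pipe use case concrete, here is a rough sketch that drives the ffmpeg command-line tool from Python, feeding raw RGB frames on stdin and reading the encoded stream from stdout. It is not a torchvision API; the function names `ffmpeg_pipe_cmd` and `encode_frames` are made up for this example, and it assumes an `ffmpeg` binary with the chosen encoder is on the PATH.

```python
import subprocess


def ffmpeg_pipe_cmd(width, height, fps, codec="libx264"):
    """Build an ffmpeg command: raw RGB frames on stdin, encoded stream on stdout."""
    return [
        "ffmpeg",
        "-f", "rawvideo",            # input is headerless raw frames
        "-pix_fmt", "rgb24",         # 3 bytes per pixel
        "-s", f"{width}x{height}",   # frame size must be given for raw input
        "-framerate", str(fps),
        "-i", "pipe:0",              # read from stdin
        "-c:v", codec,
        "-f", "mpegts",              # a container that does not need a seekable output
        "pipe:1",                    # write to stdout
    ]


def encode_frames(frames, width, height, fps):
    """Encode an iterable of raw RGB frame byte strings and return the encoded bytes."""
    proc = subprocess.run(
        ffmpeg_pipe_cmd(width, height, fps),
        input=b"".join(frames),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        check=True,
    )
    return proc.stdout
```

MPEG-TS is used here because the default MP4 muxer generally needs a seekable output, which a pipe is not; a different streamable container could be substituted.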