pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

Feedback on Video APIs #7438

Open bjuncek opened 1 year ago

bjuncek commented 1 year ago

Feedback request

With torchaudio's recent success in getting a clean FFMPEG build with full support for FFMPEG 5 and 6 (something we can't easily replicate in torchvision yet), we are thinking of adopting their API and joining efforts to provide better support for video reading.

With that in mind, we were hoping to gather some feedback from TV users who rely on the video reader (or would like to use it but find it hard to do so):

  1. What are your main pain points with our current API?
  2. What do you wish was supported?
  3. What are the most important features of a video IO for you?

We can't promise we'll support everything (of course), but we'd love to gather as much feedback as possible and get as much of it incorporated as possible.

vadimkantorov commented 1 year ago

Related:

v-iashin commented 1 year ago

I am glad you are gathering feedback on the video reader. I think we should improve the state of Video IO in our community and have more solutions that work out of the box. Yet, installation issues are always in the way.

I am a researcher. I work with both audio and RGB streams and release my research code to the public. I work with Linux and conda as a virtual env manager on SLURM clusters and local machines.

What are your main pain points with our current API?

What do you wish was supported?

  • Perhaps a combination of both APIs would do, together with the absence of installation issues (i.e., conda install -c pytorch torchvision would be enough). I am not sure if that is even possible.

What are the most important features of a video IO for you?

  • fast fine-grained read of both audio and RGB streams + meta info about each.
  • I don't want to specify the source height and width (as is required in Decord, god forbid).
  • RGB and audio, if read fine-grained, should be synchronized. Ideally, for each RGB frame I want to get the corresponding number of audio frames, yet with VideoReader each iteration returns something like 1024 audio frames or so, which is temporally longer than one RGB frame.
  • I want to be able to read only one of the streams (only audio or only RGB). I think I saw a PR adding this to read_video, but it has not been merged into main yet.
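
To make the synchronization request concrete, here is a minimal sketch of the arithmetic involved: given a video frame rate and an audio sample rate, it computes which audio samples fall within one RGB frame. The function name and parameters are hypothetical and are not part of torchvision or torchaudio; it assumes both streams start at time zero and that the frame rate is constant.

```python
import math
from fractions import Fraction


def audio_span_for_frame(frame_index, fps, sample_rate):
    """Return the [start, end) audio sample indices covering one video frame.

    Hypothetical helper for illustration only. Assumes a constant frame
    rate and that audio and video both start at t = 0.
    """
    # Exact rational arithmetic avoids drift for rates such as 30000/1001.
    samples_per_frame = Fraction(sample_rate) / Fraction(fps)
    start = math.floor(frame_index * samples_per_frame)
    end = math.floor((frame_index + 1) * samples_per_frame)
    return start, end


# 25 fps video with 16 kHz audio gives 640 samples per frame.
print(audio_span_for_frame(0, 25, 16000))
print(audio_span_for_frame(3, 25, 16000))
```

At 25 fps and 16 kHz, one RGB frame spans 640 audio samples, so a fixed chunk of about 1024 samples covers more than one frame, which is the mismatch described above. Flooring both ends keeps consecutive spans contiguous even when the ratio is not an integer.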

Sorry for the lack of specific issues and errors that I got back then. The content is mostly derived from past experience with video IO.

leopck commented 12 months ago

I only just saw this request for feedback. It's a dated post, but I hope this feedback is still useful.

What are your main pain points with our current API?

My team and I, along with several others around my company and my customers, use the VideoReader API, mostly because of its simplicity and fine-grained nature. It does the job really well in our case.

Even though the VideoReader API is in beta, it serves us pretty well. We have also made several optimizations to VideoReader that we intend to upstream to this community, but I would like confirmation on where they should land.

The issues around licensing and not building ffmpeg into Torchvision are easily solvable by dynamically linking against ffmpeg.
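
As a rough illustration of what relying on a dynamically linked ffmpeg implies at runtime, the sketch below checks whether the usual FFmpeg shared libraries can be found on the system. This is not how torchvision or torchaudio locate ffmpeg; the helper name is hypothetical, and it only uses Python's standard `ctypes.util.find_library`.

```python
import ctypes.util

# Shared libraries that make up a typical FFmpeg installation.
FFMPEG_LIBS = ("avcodec", "avformat", "avutil", "swscale", "swresample")


def find_ffmpeg_libs(names=FFMPEG_LIBS):
    """Map each library name to what the system linker reports, or None.

    Hypothetical helper for illustration only: a None value means the
    shared library was not found on this machine.
    """
    return {name: ctypes.util.find_library(name) for name in names}


if __name__ == "__main__":
    for name, location in find_ffmpeg_libs().items():
        status = location if location is not None else "not found"
        print(f"{name}: {status}")
```

With dynamic linking, a missing entry here would surface as a load-time error on the user's machine rather than at build time, which is the trade-off against bundling ffmpeg into the wheel.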

What do you wish was supported?

Encoding API endpoint

What are the most important features of a video IO for you?

bhack commented 6 months ago

Check also: https://github.com/PyAV-Org/PyAV/discussions/1276

pedromoraesh commented 4 months ago

I would like to see support for RTSP streams. It currently supports RTMP, but I need RTSP as well.