Feedback on Video APIs - Githubissues

bjuncek commented 1 year ago

Feedback request

With torchaudio's recent success in getting a clean FFMPEG build with full support for FFMPEG 5 and 6 (something we can't easily replicate in torchvision yet), we are thinking of adopting their API and joining efforts to provide better support for video reading.

With that in mind, we were hoping to gather some feedback from TV users who rely on the video reader (or would like to use it but find it hard to do so):

What are your main pain points with our current API?
What do you wish was supported?
What are the most important features of a video IO for you?

We can't promise we'll support everything (of course), but we'd love to gather as much feedback as possible and get as much of it incorporated as possible.

vadimkantorov commented 1 year ago

v-iashin commented 1 year ago

I am glad you are gathering feedback on the video reader. I think we should improve the state of Video IO in our community and have more solutions that work out of the box. Yet, installation issues are always in the way.

I am a researcher. I work with both audio and RGB streams and release my research code to the public. I work with Linux and conda as a virtual env manager on SLURM clusters and local machines.

What are your main pain points with our current API?

torchvision.io.VideoReader (backend 'video_reader') is a nice, fine-grained API that lets me extract however many (RGB and audio) frames I want. Yet I (and others) found it quite difficult to install. There is always some version-compatibility problem with other packages, and once I fix those errors, something else, such as read_video, stops working. I don't remember exactly which versions were involved (ffmpeg==4.2.0?), but I certainly experienced some issues there. The difficulty of installing it makes me hesitant to rely on it and to share my environment with other people. I would be happy if TV could make the installation experience at least as smooth as for torchvision.io.read_video (only an extra PyAV installation is required atm). On a side note, I think I saw a comment in one of the TV issues stating that VideoReader is only slightly faster than read_video, but in my experience it was significantly faster.
torchvision.io.read_video. This is my workhorse and the sota in video IO. I read mp4 files and it gives me RGB, audio, and meta. It is somewhat reliable but seems to require reading the whole video and lacks finer-grained IO. Another problem is installation, as it requires PyAV to be installed separately (via pip). True, it is easy to run pip install pyav, but I couldn't make it work with the latest PyAV (9+) that pip installs: it failed with some weird error during training when the loop switched from train to valid (what?), so I had to downgrade it to 8.1.0. Thus, I wish it were easier to install and had a finer-grained IO API.

What do you wish was supported?

Perhaps a combination of both APIs would do, together with the absence of installation issues (i.e. conda install -c pytorch torchvision would be enough). I am not sure if it is even possible.

What are the most important features of a video IO for you?

Fast, fine-grained reading of both audio and RGB streams + meta info about each.

I don't want to specify the source height and width (as is required in Decord, god forbid).

RGB and audio, if read fine-grained, should be synchronized. Ideally, for each RGB frame I want to get the corresponding number of audio frames, yet with VideoReader it is something like 1024 frames or so per iteration, which is temporally longer than one RGB frame.

I want to be able to read only one of the streams (only audio or only RGB). I think I saw a PR adding this to read_video, but it has not been merged into main yet.
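
To make the synchronization point concrete, here is a minimal sketch of the arithmetic, assuming a constant frame rate and a constant audio sample rate. The function names are hypothetical and not part of torchvision, and the 25 fps and 16 kHz figures are illustrative values rather than numbers taken from a specific file.

```python
import math
from fractions import Fraction


def audio_span_for_frame(frame_idx, fps, sample_rate):
    """Return the [start, end) audio sample range that spans one video frame.

    Hypothetical helper (not a torchvision API). Assumes constant fps and
    sample rate; exact fractions avoid drift for rates such as 30000/1001.
    """
    samples_per_frame = Fraction(sample_rate) / Fraction(fps)
    start = math.floor(frame_idx * samples_per_frame)
    end = math.floor((frame_idx + 1) * samples_per_frame)
    return start, end


def chunk_duration_s(num_samples, sample_rate):
    """Duration in seconds of an audio chunk holding `num_samples` samples."""
    return num_samples / sample_rate


if __name__ == "__main__":
    fps, sample_rate = 25, 16000  # illustrative values
    print(audio_span_for_frame(0, fps, sample_rate))
    print(audio_span_for_frame(1, fps, sample_rate))
    # Compare a 1024-sample chunk with the duration of one RGB frame.
    print(chunk_duration_s(1024, sample_rate), 1 / fps)
```

With these illustrative values, one RGB frame covers 640 audio samples (40 ms), while a 1024-sample chunk lasts 64 ms, which is the kind of mismatch described above.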

Sorry for the lack of specific issues and errors that I got back then. The content is mostly derived from past experience with video IO.

leopck commented 12 months ago

I just saw this request for feedback. Though it's a dated post, I hope this feedback will still be useful somehow.

What are your main pain points with our current API?

My team and I, along with several others around my company and my customers, use the VideoReader API, mostly because of its simplicity and fine-grained nature. It does the job really well in our case.

Even though the VideoReader API is in beta, it serves us pretty well. We have also made several optimizations to VideoReader that we intend to upstream to this community, but I would like confirmation on where they should land.

The issues around licensing and not building ffmpeg into Torchvision are easily solvable by dynamically linking to ffmpeg.
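
As a rough illustration of relying on a system-provided ffmpeg rather than a bundled one, the sketch below looks up the shared libraries at runtime with Python's ctypes. This is runtime loading, a close cousin of dynamic linking, and it is only an assumption about how such a lookup could work, not how Torchvision or torchaudio actually do it. The helper names are hypothetical; avcodec_version is a real libavcodec function.

```python
import ctypes
import ctypes.util

# Core ffmpeg libraries a video reader would typically need (illustrative list).
FFMPEG_LIBS = ("avutil", "avcodec", "avformat")


def find_ffmpeg_libs(names=FFMPEG_LIBS):
    """Map each library name to what the system loader finds, or None."""
    return {name: ctypes.util.find_library(name) for name in names}


def avcodec_version():
    """Return libavcodec's (major, minor, micro) version, or None if unavailable."""
    found = ctypes.util.find_library("avcodec")
    if found is None:
        return None
    try:
        lib = ctypes.CDLL(found)
    except OSError:
        return None
    # C signature: unsigned avcodec_version(void);
    lib.avcodec_version.restype = ctypes.c_uint
    lib.avcodec_version.argtypes = []
    packed = lib.avcodec_version()
    # ffmpeg packs versions as (major << 16) | (minor << 8) | micro.
    return packed >> 16, (packed >> 8) & 0xFF, packed & 0xFF


if __name__ == "__main__":
    print(find_ffmpeg_libs())
    print(avcodec_version())
```

On a machine without ffmpeg installed, every lookup returns None, so a package could fall back to another backend instead of failing at import time.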

What do you wish was supported?

Encoding API endpoint

What are the most important features of a video IO for you?

Video IO needs to support both audio and video
Video IO should stay simple and not require passing in tons of parameters just to get a simple frame or audio
Support for DataLoader for sure
Support for integrating vendor optimizations, such as Intel's

bhack commented 6 months ago

Check also: https://github.com/PyAV-Org/PyAV/discussions/1276

pedromoraesh commented 4 months ago

I would like to see support for RTSP streams. It currently supports RTMP, but I need RTSP as well.

pytorch / vision

Feedback on Video APIs #7438
