opencv / opencv

Open Source Computer Vision Library
https://opencv.org
Apache License 2.0

VideoCapture::set (CAP_PROP_POS_FRAMES, frameNumber) not exact in opencv 3.2 with ffmpeg #9053

Open nji9nji9 opened 6 years ago

nji9nji9 commented 6 years ago

Example and confirmed effect:

http://answers.opencv.org/question/162781/videocaptureset-cap_prop_pos_frames-framenumber-not-exact-in-opencv-32-with-ffmpeg/

saskatchewancatch commented 6 years ago

@nji9nji9 The "category:highgui-video" label captures the video input/output functionality, as such functionality has been associated with highgui since at least 2.4. See here.

"category:video" is for video processing functionality (not input/output).

I believe the maintainer has chosen the appropriate label.

Hope this clarifies things.

nji9nji9 commented 6 years ago

Some more info: the mis-positioning varies considerably with the format (codec). MP4/AVC1: from +8 to -3 frames (negative values mean it seeks BEHIND the desired position). WMV3/VC-1 (VBR): from +33 (!) to 0 (all positive). WMV3/VC-1 (CBR): always 0. The last two use the same codec. I found several questions about this problem in different forums, all without a solution. As it stands, positioning within a movie file is completely useless - worse, "dangerous" if you rely on getting what you asked for. Maybe this issue could be labelled "important"?

nji9nji9 commented 6 years ago

Found another issue with that bug. Solved or not solved? What is happening here?

nji9nji9 commented 6 years ago

What can be done to prevent this issue from being closed after the above commit is merged? Because it doesn't solve the problem. I still don't understand why nobody seems to care about this bug. It has been confirmed by several posters and in my opinion it is quite serious: in a common situation (reading media at given positions) you get the wrong frames WITHOUT ANY NOTICE. OpenCV has very sophisticated algorithms to process data, but when reading stored data it gets the wrong data. Why does nobody care? Or is there really nobody who has the knowledge? (BTW, I don't have the knowledge either. But I built a terrible index-fighting workaround for the bug.)

Britefury commented 6 years ago

Hi, I have encountered this issue also. What is your work-around?

nji9nji9 commented 6 years ago

Hi. Well, it is literally a "work-around" - and not nice at all.

If I need only a few frames, I start at 0 and read forward to the desired position. This is quite slow, but produces correct frames.

If I need more or all frames, I process the movie multithreaded in (overlapping) parts, computing a cheap feature for each frame, then stitch the parts together based on the feature values (shifting +/-). To get a specific frame exactly, I request an interval around it and match the feature (shifting +/-). This works well, with one exception (already mentioned in the OpenCV forum): the multithreaded read-in will miss the last few frames if OpenCV's mis-positioning lands too early, because OpenCV then stops, "thinking" it is already at the end. ... And all this because nobody cares about the buggy code. I wonder in how much software this mis-positioning goes UNRECOGNIZED and silently leads to wrong results. Seems the professionals are retired, and the youngsters are "developing" apps. ;-)
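
The stitching step described above can be sketched in pure Python. This is a hypothetical reconstruction, not the poster's actual code; the cheap per-frame feature is represented here by plain numbers (in practice it could be, say, a mean intensity per frame), and the names `best_shift`, `tail`, `head` are mine:

```python
def best_shift(tail, head, max_shift=10):
    """Find the offset that best aligns the end of one chunk's feature
    sequence with the start of the next chunk's, compensating for the
    seek error introduced when the second chunk was positioned."""
    best, best_err = 0, float("inf")
    for shift in range(-max_shift, max_shift + 1):
        err, n = 0.0, 0
        for i, f in enumerate(tail):
            j = i + shift
            if 0 <= j < len(head):
                err += abs(f - head[j])
                n += 1
        if n and err / n < best_err:
            best_err, best = err / n, shift
    return best

# Synthetic demo: worker 2 was supposed to start at frame 45,
# but its seek actually landed 3 frames early, at frame 42.
features = [float(i) for i in range(100)]
tail = features[45:55]          # overlap region as seen by worker 1
head = features[42:60]          # what worker 2 actually read
print(best_shift(tail, head))   # -> 3
```

Each worker computes features for its (overlapping) part; the shift found for the overlap region tells by how many frames a worker's seek actually missed its target, so the parts can be fumbled together correctly.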

Sharan123 commented 6 years ago

OpenCV 3.3 with TBB, LIBV4L and FFMPEG has the same bug. It seems that WebM even fails to produce the FOURCC value, besides failing to return the same frame.

AVI with FOURCC MJPEG is OK; AVI with FOURCC MP42 is not OK and fails. I can provide the source code I used for testing if it helps.

I'm puzzled that this error isn't a higher priority. It actually forces users to iterate and count through the whole video just to be sure they pulled out a valid frame; otherwise the results are corrupted.

nji9nji9 commented 6 years ago

In my opinion it would be best to remove the whole code path from OpenCV. Better no code than buggy code whose results will be overlooked by most. Removing it might also wake someone up. The way it is now (and has been for some time, btw) is quite irresponsible.

alalek commented 6 years ago

OpenCV relies heavily on FFmpeg (in the case of the FFmpeg backend). But FFmpeg itself doesn't handle seeking to non-"key" frames well. There are many workarounds, but they are not very reliable: sometimes the seeking code works, sometimes it doesn't. If you have working FFmpeg code with accurate seeking, we could try to integrate it into OpenCV.

Consider extracting the frames from the video (into .png files) and using those.
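
For reference, the extraction alalek suggests is a one-liner with the ffmpeg CLI (file names here are hypothetical):

```shell
# Decode every frame of input.mp4 into sequentially numbered PNGs;
# frame N becomes frames/00000N.png (1-based), so the file index
# then serves as a reliable frame number.
mkdir -p frames
ffmpeg -i input.mp4 frames/%06d.png
```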

The main purpose of the OpenCV library is not highly correlated with the goals of a media I/O library. Video I/O and image I/O were initially added to OpenCV for demo purposes. We require only these things from Video I/O backends:

nji9nji9 commented 6 years ago

OK, let's take it as fact that FFmpeg is the cause of the wrong behaviour. (And not play that ping-pong ;-) Would you agree that it is (or might be) quite dangerous to have a function whose results are (sometimes) erroneous? Because you are not aware that you get the wrong frames (or FOURCC, see the comment above). Just think about medical applications (!!). Considering the requirements OpenCV has for backends, I definitely suggest disabling/removing the calling code we know to be buggy. (That is, VideoCapture::set() with parameter CAP_PROP_POS_FRAMES, and maybe also the get method in the bug Sharan123 described above.) That way the users of OpenCV would be notified that a method they have used so far is not reliable. Only add it again when FFmpeg has done its homework. I really think this should be done. For the best. Actually, I am not the one who knows enough OpenCV internals to change its code myself.

alalek commented 6 years ago

be noticed

I believe we could add some notification message for users about the use of optional (and effectively untested) features.

medical applications

Tests; many tests; tests for everything, including tests for the tests.

nji9nji9 commented 6 years ago

I think an unspecific notice somewhere in the OpenCV docs would not be ... ahem ... noticed ... by anyone.

Tests. Who would deny that they are essential ;-)

But this FFmpeg code is not untested; it has been tested and shown to be buggy. Therefore I think (since we know about the bug) we have a responsibility to force a notice whenever the buggy code is used (maybe with a kind of #pragma ... ==> compiler error message?).

If there is no reaction from the OpenCV users: good, nobody is affected. If there is a shitstorm: 99% of them will be baffled, will code their workaround once they get the explanation, and maybe one or two will contribute correct FFmpeg code.

nji9nji9 commented 6 years ago

To state it explicitly:

  1. OpenCV cannot be used for offline analysis of movies.

  2. Not only does it deliver wrong frames, it does so without any notice.

What makes it even worse:

  1. The responsible maintainers rate this as a low-priority bug.

Can't imagine a more neglected (sad) situation.

Britefury commented 6 years ago

I am very much in favour of warning developers about these problems. Perhaps print a warning to the console when VideoCapture::set is used with CAP_PROP_POS_FRAMES, or have it fail with an error message.

It's quite broken. I have used VideoCapture::set with CAP_PROP_POS_FRAMES to seek to the start of the video (frame 0) and had it seek 15 or 30 frames in instead. I have had to go through my code and replace all references to CAP_PROP_POS_FRAMES with CAP_PROP_POS_MSEC just to seek to the start of a video. Frame seeking is not really usable, IMHO.

A potential upshot of making these problems more obvious is that it might motivate someone with the relevant video codec skills to develop a fix. :)

cesarandreslopez commented 5 years ago

So this bit me too, and after several hours of trying I concluded, just as mentioned in this thread, that both VideoCapture::set with CAP_PROP_POS_FRAMES and VideoCapture::set with CAP_PROP_POS_MSEC are completely unreliable.

I can also state that FFmpeg does not have this problem: frames seeked and grabbed via FFmpeg always come back correct.

I ended up rewriting all frame captures to grab from FFmpeg and then import the still image into cv2, so I could send it to the cv2 video writer object as needed.

As pointed out by @nji9nji9, this issue only happens when VideoCapture::set is used; it does not happen when all frames are iterated from the beginning of the video file.

It seems to me that the problem is that VideoCapture ignores all duplicate frames, while FFmpeg returns duplicates without issue. In consequence, when calling VideoCapture::set with CAP_PROP_POS_FRAMES you get a frame offset by approximately (or likely exactly) the number of duplicate frames in the video file up to the frame you've tried to seek to.

If this is right, then resolving this bug should just be a matter of having VideoCapture::set not ignore duplicate frames in the video.

Just a couple of thoughts, if useful at all.
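
If the duplicate-frame hypothesis above is right, the correction amounts to translating a nominal frame number into the index of a decoder that silently skips duplicates. A minimal pure-Python sketch of that bookkeeping (synthetic data; `frame_hashes` and `seek_index` are my names, standing in for any cheap per-frame fingerprint gathered on one sequential pass):

```python
def seek_index(frame_hashes, requested):
    """Translate a nominal frame number into the index seen by a
    decoder that skips duplicate frames: subtract the number of
    consecutive-duplicate frames occurring before the target."""
    duplicates = 0
    for i in range(1, requested + 1):
        if frame_hashes[i] == frame_hashes[i - 1]:
            duplicates += 1
    return requested - duplicates

# Frames 2 and 5 duplicate their predecessors, so nominal frame 6
# is only the 5th distinct frame (index 4) the decoder produces.
hashes = ["a", "b", "b", "c", "d", "d", "e"]
print(seek_index(hashes, 6))  # -> 4
```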

alex-liuziming commented 5 years ago

To state it explicitly:

  1. OpenCV cannot be used for offline analysis of movies.
  2. Not only does it deliver wrong frames, it does so without any notice.

What makes it even worse:

  1. The responsible maintainers rate this as a low-priority bug.

Can't imagine a more neglected (sad) situation.

My gosh, totally right. While debugging, I reviewed my code several times until I found this terrible bug here. :cry: How could such a bug be left in the master branch??? And marked as low priority???

mhtrinh commented 5 years ago

I have the same problem! This is the worst type of bug: random and completely silent!!

fidelechevarria commented 5 years ago

This bug should definitely not have low priority.

nji9nji9 commented 5 years ago

That's what I always say. It exhibits the incompetence of @alalek on this (no offence intended).

The first - and simplest - thing to do would be to disable VideoCapture::set(CAP_PROP_POS_FRAMES, frameNumber) in the main trunk. Probably at least a few dozen (newly compiled) applications would notice that they have had a big bug for years. BTW: not me, I haven't used OpenCV since all this.

apatsekin commented 5 years ago

A two-year-old bug, and still no attention or even a warning?

Robird commented 4 years ago

av_seek_frame(... , AVSEEK_FLAG_BACKWARD)

Robird commented 4 years ago

nji9nji9 commented 4 years ago

In the name of the dozens (?) of apps where this bug silently delivers wrong frames:

Thank you Robird!

Unfortunately I am not able to assess whether your workaround (it is a workaround, isn't it?) fits into the OpenCV environment. (Actually, frustrated, I myself went back to M$/DirectShow since the bug wasn't addressed.) If your workaround works in all cases (does it?), in my opinion it should go into the main trunk. With some explanation (why seek isn't reliable, where the bug is probably located (ffmpeg?), why av_seek_frame/grabFrame does the job, etc.)

mhtrinh commented 4 years ago

I believe it is not just an OpenCV or FFmpeg problem: it's also a video codec issue. Video codecs (e.g. x264) do not store the frame number in the frame, or at least it is not mandatory. This is an issue for videos that do not have a fixed frame rate: how do you know which frame number is at 5.36 s when the frame rate could have changed in the meantime? The only reliable way to get the frame number is then to run from the beginning!

That is why I moved away from referencing by frame number and use the dts instead.
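
Seeking by dts instead of frame number still needs a frame-to-timestamp map, which for variable-frame-rate video can only be built by scanning once from the start. A hedged sketch of such an index (pure Python with synthetic timestamps; in practice the dts values would be recorded from the demuxer during the initial pass):

```python
import bisect

class FrameIndex:
    """Map frame numbers to timestamps (and back) for a VFR video,
    built once by a sequential scan recording each frame's dts."""
    def __init__(self, dts_ms):
        self.dts_ms = list(dts_ms)  # dts of frame i, in milliseconds

    def timestamp_of(self, frame_no):
        return self.dts_ms[frame_no]

    def frame_at(self, t_ms):
        # Last frame whose dts is <= t_ms.
        return bisect.bisect_right(self.dts_ms, t_ms) - 1

# Synthetic VFR stream: 20 fps for the first 5 frames, then 10 fps.
dts = [0, 50, 100, 150, 200, 300, 400, 500]
idx = FrameIndex(dts)
print(idx.timestamp_of(5))  # -> 300
print(idx.frame_at(320))    # -> 5
```

Seeking then goes through a time-based seek (e.g. CAP_PROP_POS_MSEC or the demuxer's own seek) using the timestamp from the index, instead of trusting a frame-number seek.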

But it would still be really professional for OpenCV to emit some sort of warning, at compile time or run time, when someone tries to seek by frame number!

nji9nji9 commented 4 years ago

But if it is a codec issue, the effect should appear in DirectShow too... See the question referenced in my opening post (my last comment there, from July 24, '17).

If you're right (see also some results from the mentioned thread), then Robird's workaround wouldn't do the job.

nji9nji9 commented 4 years ago

I would like to attempt a summary so far (although I am far from knowing much about this).

IF (!?) ... the explanation for the wrong frame numbers is as mhtrinh says, then the bug is indeed not in OpenCV but in FFmpeg. It always returns a frame, but depending on the movie type, sometimes the wrong one. The error then is not FFmpeg's inability to return the correct frame, but that FFmpeg returns a frame at all in that case, without warning.

The cleanest solution seems to me to be for FFmpeg to return an error if one tries to get a frame by number from a movie where this is not possible (for example: no fixed frame rate, and frame number not stored in the frame). This could be done via an "ability flag" (similar to bIsIndexable) that defaults to false. Having FFmpeg itself run from the beginning in that case doesn't seem a good idea to me, as the decision for such a time-consuming task should be left to the user. (Maybe he builds e.g. a hash-table solution for it?)

Many "ifs" ... which should be investigated, and then brought to FFmpeg. (As said, I cannot do that, since I dropped OpenCV.)

As long as this is not solved, in my opinion grabbing a frame by number should be disabled in OpenCV. The possible consequences of this silent error seem serious enough to me. This would also put some pressure (on FFmpeg) to solve the issue. (Still, I can hardly believe that the widespread FFmpeg should have such a bug ...)

abhiTronix commented 4 years ago

@nji9nji9 All this aside, maybe you should try to help solve this problem and do less unnecessary ranting.

nji9nji9 commented 4 years ago

https://stackoverflow.com/questions/31854406/precise-seeking-with-ffmpeg?noredirect=1&lq=1 may be of some help. Still, there are lots of open questions ... that should be looked at by someone more capable in this than me.

nji9nji9 commented 4 years ago

@abhiTronix "A slow clap for the dunce!"

mhtrinh commented 4 years ago

If you look at the solutions on Stack Overflow: in the end, they all seek by time/timestamp. I was using that approach too, until I stumbled on those variable-frame-rate videos from CCTV cameras, which have to cope with limited hardware and poor networking ...

nji9nji9 commented 4 years ago

Does that solution work for all formats except those with a variable frame rate? (Why isn't it implemented in FFmpeg, then?) If so, a workaround at the application level would only be needed for those formats. (Still, my opinion is that a method offered by a library should work. Always.)

nji9nji9 commented 4 years ago

I'd like to point to another interesting thread: https://stackoverflow.com/questions/17546073/how-can-i-seek-to-frame-no-x-with-ffmpeg?rq=1 and there especially: https://github.com/FFMS/ffms2

My opinion: as FFmpeg's method obviously doesn't deliver what its name promises (get frame no. n), the method should be renamed. And if ffms2 does the job (does it?), it should replace FFmpeg's method. If it won't be changed in FFmpeg (for some reason), maybe OpenCV could use ffms2 for seeking? Hopefully there will be someone who can do that.

mhtrinh commented 4 years ago

I made this video for testing purposes: https://www.dropbox.com/s/desjcazl3nkj9po/testVideo_20fpsTo100_10fpsTo200_30fpsTo300_20fpsTo400.mp4?dl=0

It is variable frame rate: frames 1-100: 20 fps; 100-200: 10 fps; 200-300: 30 fps; 300-400: 20 fps.

Each frame shows its frame number for convenience.

Example code showing that seeking by frame does not work with OpenCV 3.2.0:

#include <cstdio>
#include <opencv2/opencv.hpp>

using namespace cv;

// Seek directly to frameIndex, then save the frame actually returned.
void saveFrame(VideoCapture &capture, int frameIndex)
{
    Mat frame;
    char fn[500];
    capture.set(CAP_PROP_POS_FRAMES, frameIndex);
    capture >> frame;
    snprintf(fn, sizeof(fn), "frame%03d.png", frameIndex);
    imwrite(fn, frame);
}

// Read the whole video sequentially and save every frame.
void dumpFrames(VideoCapture &capture)
{
    Mat frame;
    int index = 1;
    char fn[500];
    while (capture.read(frame))
    {
        snprintf(fn, sizeof(fn), "seq%03d.png", index);
        imwrite(fn, frame);
        index++;
    }
}

int main(int argc, char *argv[])
{
    VideoCapture capture("testVideo_20fpsTo100_10fpsTo200_30fpsTo300_20fpsTo400.mp4");

    // Seeked frames: these do NOT match their expected content.
    saveFrame(capture, 1);
    saveFrame(capture, 150);
    saveFrame(capture, 250);
    saveFrame(capture, 350);

    // Sequential dump: every frame matches its content.
    VideoCapture cap2("testVideo_20fpsTo100_10fpsTo200_30fpsTo300_20fpsTo400.mp4");
    dumpFrames(cap2);
    return 0;
}

The sequential dump works well: each frame corresponds to its content. The seeked frames do not match.

Even ffmpeg doesn't like my video: ffmpeg -i testVideo_20fpsTo100_10fpsTo200_30fpsTo300_20fpsTo400.mp4 dump/frame%03d.png. With my ffmpeg 3.4.4, from frame 100 things start to get "out of sync" (frame numbers don't correspond to their content).

I would be surprised if FFMS2 could seek properly. When I have enough motivation and time, I may test FFMS2 and report here.

Edit: I found a way to extract a frame by number using the ffmpeg CLI: ffmpeg -i testVideo_20fpsTo100_10fpsTo200_30fpsTo300_20fpsTo400.mp4 -vf "select=eq(n\,190)" -vframes 1 out.png
Edit 2: apparently the ffmpeg video filter goes through all the frames (reading from here), so it's not really seeking ...

nji9nji9 commented 4 years ago

Thank you.

Actually I'm not too surprised that frame seeking doesn't work with variable fps, since it doesn't even work for "standard" constant fps.

FFMS2, however, seems to address exactly (and solely) this issue of exact frame seeking (and the project is alive). If it works (?), OpenCV should use it, at least as a workaround until ...

krikru commented 4 years ago

I can confirm that I get incorrect frame numbers in OpenCV 3.4.2 on Python 3.7.3. For some videos (in my case MTS files), when I do

video_capture.set(cv2.CAP_PROP_POS_FRAMES, target_position),

(where 0 <= target_position < number_of_video_frames) the application will sometimes freeze for several seconds. Later, when the function returns, if I check where in the video I have ended up, using

current_pos = video_capture.get(cv2.CAP_PROP_POS_FRAMES),

I am usually thousands of frames short of where I should have been, i.e. the value of current_pos is several thousand less than the value of target_position, but never greater than target_position.

I'm not sure whether I receive the correct images as I don't have any good way to verify that; all I can say for sure is that there is a discrepancy between what I set and what I get. Also, if the function doesn't freeze, I tend to end up on the correct frame.

I have also noticed that if I play the video forwards (i.e. increase target_position by one between each call to video_capture.set(cv2.CAP_PROP_POS_FRAMES, target_position)), I seem to have no problems. If I instead play the video backwards by subtracting one, I get the correct frame most of the time, but occasionally OpenCV hiccups and returns a frame that is visibly several frames ahead (i.e. has a greater frame number) of where it should be (though without the freezing).

JunweiLiang commented 4 years ago

Stumbling into this bug in 2020... But at least I know why: https://stackoverflow.com/questions/44488636/opencv-reading-frames-from-videocapture-advances-the-video-to-bizarrely-wrong-l/44551037

chobao commented 3 years ago

I may have come across the bug too, but in my case I failed to get a specific frame in a .mov video, with OpenCV 3.4.2 on Python 2.7.3. Has anyone encountered this with .mov video? (What I see above is all about FFmpeg-decoded video.) The pipeline of my system is to detect key frames with some algorithm, reading the video with cap.read() from the first frame to the end and saving the key-frame ids, which I count myself. Finally, I use cap.set(CAP_PROP_POS_MSEC/CAP_PROP_POS_FRAMES) to fetch the specific frames by the saved key-frame ids. Here is the code:

import cv2

video_path = "input.mov"  # placeholder path to the video

def is_key_frame(frame):
    # stand-in for the actual key-frame detection algorithm
    return False

def detectkeyframe():
    cap = cv2.VideoCapture(video_path)
    frame_id = 0
    key_frame_ids = []
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        if is_key_frame(frame):
            key_frame_ids.append(frame_id)
            cv2.imwrite("key%05d.png" % frame_id, frame)  # saved for comparing
        frame_id += 1
    return key_frame_ids

def dumpkeyframe(key_frame_ids):
    cap = cv2.VideoCapture(video_path)
    for frame_id in key_frame_ids:
        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_id)
        # cap.set(cv2.CAP_PROP_POS_MSEC, time_of_frame_id)  # also useless
        ret, frame = cap.read()  # the frame obtained is always ahead of the real frame I want
        cv2.imwrite("dump%05d.png" % frame_id, frame)

The frames saved by the two functions are not the same: the frame from dumpkeyframe() is always ahead of the frame from detectkeyframe(). Can someone tell me whether this is caused by the bug above in OpenCV, or by something else?

BrettRyland commented 3 years ago

@1612190130 The problem seems to be inherent in the way videos are encoded with a variable bit rate and in how OpenCV (and almost all other decoders) estimates where a frame is in the file. I'm not sure about the details of .mov files, but it's probably the same. I recommend using ffms2 instead of cv2.VideoCapture to read the video if you need to perform any seeking (if you don't seek, it's not a problem). It has a bit of overhead because it needs to index the video first, but that only happens when the video is opened, and the index can be cached if you switch between videos.

nji9nji9 commented 3 years ago

Another idea to circumvent the problem of unreliable capture of a specific frame:

Maybe it is possible to do the extraction "on the fly" (i.e. in the first pass of analysing frame by frame), if your algorithm design allows it. Perhaps with some caching of previous frames.

Using ffms2 will do for sure, but

I don't believe this problem will ever be solved in FFmpeg. Strangely, with DirectShow it always worked correctly.

IradNuriel commented 3 years ago

This problem has nothing to do with FFmpeg. I'm using an AVI video with a JPEG codec, and it is not that .set(CAP_PROP_POS_FRAMES, frame_id) gives the wrong frame: it is totally not working! It gives the first frame (even when asked for the 1000th), and worse, it returns true! So how am I supposed to notice by myself that this function isn't working?! The OpenCV creators should be ashamed that this issue is still open in 09/2020, more than three years after it was opened (and that it is still low priority). The problem was discovered in OpenCV 3.2, and to this day it still occurs in OpenCV 4.4.

Britefury commented 3 years ago

Following the code, the FFMPEG VideoCapture implementation calls its seek method:

https://github.com/opencv/opencv/blob/9c8626bf3cc74ec42d7d0583c484eef444a338a0/modules/videoio/src/cap_ffmpeg_impl.hpp#L1519-L1547

There are two implementations of seek: one accepts a time in seconds, converts it to frames, and passes it to the frame-based implementation, here:

https://github.com/opencv/opencv/blob/9c8626bf3cc74ec42d7d0583c484eef444a338a0/modules/videoio/src/cap_ffmpeg_impl.hpp#L1456-L1512

It may mean going through this code with a fine-toothed comb and checking the logic. There are one or two things in there whose logic I don't get, to be honest.

In part, the work is delegated to av_seek_frame, which is provided by the FFmpeg library. Apparently it is tricky to use:

https://stackoverflow.com/questions/39983025/how-to-read-any-frame-while-having-frame-number-using-ffmpeg-av-seek-frame

I can definitely see what they mean about its documentation not being particularly helpful:

https://ffmpeg.org/doxygen/4.1/group__lavf__decoding.html#gaa23f7619d8d4ea0857065d9979c75ac8

As for how to go about fixing this, the Python pims library at https://github.com/soft-matter/pims may be of help. They offer equivalent seeking functionality in their PyAvTimedReader:

https://github.com/soft-matter/pims/blob/master/pims/pyav_reader.py#L170-L254

and in their PyAvIndexedReader:

https://github.com/soft-matter/pims/blob/3c8f7832113517a37ddd65a2803a6b17f9c33e4c/pims/pyav_reader.py#L277-L407

I have compared both PyAvTimedReader and PyAvIndexedReader against the result of converting a video file to a stack of PNG files and found that PyAvIndexedReader has fewer off-by-one-frame errors than PyAvTimedReader (sadly, neither is perfect). Unfortunately we don't have the luxury of scanning the video file and building a complete index up front, like PyAvIndexedReader does. But perhaps the PyAvTimedReader logic can be improved on to get things right. I'd be interested to see how packages like GStreamer handle these problems, as they also offer seeking functionality. How right do they get it?

roninpawn commented 3 years ago

Coming here from https://github.com/opencv/opencv/issues/18844 where I discovered that using multiprocessing to parse a VideoCapture object in 7 seconds instead of 24 seconds also features the quirk of different workers disagreeing on whether this is frame number 2546 or THAT is frame number 2546.

Did some deep digging (in the top-level FFmpeg support documents) and found this excerpt on https://trac.ffmpeg.org/wiki/Seeking :

"As of FFmpeg 2.1, when transcoding with ffmpeg (i.e. not just stream copying), -ss is now also "frame-accurate" even when used as an input option. Previous behavior (seeking only to the nearest preceding keyframe, even if not precisely accurate) can be restored with the -noaccurate_seek option."

And this one from https://ffmpeg.org/ffmpeg.html#Main-options

"-ss position (input/output) When used as an input option (before -i), seeks in this input file to position. Note that in most formats it is not possible to seek exactly, so ffmpeg will seek to the closest seek point before position. When transcoding and -accurate_seek is enabled (the default), this extra segment between the seek point and position will be decoded and discarded. When doing stream copy or when -noaccurate_seek is used, it will be preserved."

So FFmpeg seeks to the keyframe, yes... THEN it finds the /actually requested time/. I don't see that there's any excuse for OpenCV - the first and most prevalent search result for "decode video with [insert programming language]" - not to simply implement what FFmpeg states loudly, across all of its documentation, that it supports.
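
Based on the documentation quoted above, a frame-accurate grab with the ffmpeg CLI looks like this (file names are hypothetical; -accurate_seek is already the default when transcoding and is spelled out here only for emphasis):

```shell
# -ss before -i: jump to the nearest preceding keyframe, then decode
# and discard frames up to the exact requested time, and write the
# single frame found there.
ffmpeg -accurate_seek -ss 00:10:17.133 -i input.mp4 -frames:v 1 frame.png
```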

Regardless, it looks like I'll be using some kind of ffmpeg wrapper for Python to get this done? ...I mean there's got to be an ACCURATE way to multiprocess video. Right? RIGHT?!!?

roninpawn commented 3 years ago

Been a minute since I found my way here. Just wanted to follow-up with the solution I went with.

- Brief Re-cap -

You can't have confidence in which frame of video will be returned by any of OpenCV's VideoCapture methods that re-position the 'playhead.' If you ask it to move to frame 2000, you might get frame 2001; if you ask it to move to the frame 10 minutes, 17 seconds, and 133 ms into the video, you might get the frame at 10:17.150 instead.

The only way to know you have the correct frame and/or time with OpenCV's VideoCapture object is to step through every single frame of the video from the start. Any method that relocates the playhead non-linearly cannot be trusted (especially in the region just past the mid-point of an MPEG video).

- The Solution for my Project (and yours!) -

Don't use OpenCV for video. When I stumbled into this issue, my goal was to increase the speed of my OpenCV-based video analysis tool by multiprocessing. This meant each worker created its own VideoCapture to work with and began its analysis where one of the other workers would end. This required moving each worker's playhead to its starting point in the video. And as we've all found out, VideoCapture can't be trusted to do that.

So after I got everything working, and then discovered this MASSIVE, UNDECLARED OVERSIGHT by LUCKILY noticing that just SOME of a random pool of results were all of 16 milliseconds off the expected values... I had the pleasure of joining this thread. Where 3 YEARS ago, everyone and their cousin stepped forward to say that AT THE LEAST a WARNING notification should be printed to the console if any playhead-positioning method was called...

(you know, so that I and other developers building on this trusted library wouldn't spend days developing ourselves into corners from which there is no exit)

So, use FFmpeg instead. It all comes down to this: if the only way to use OpenCV's VideoCapture with confidence is to read the entire video from front to back, then you should just do that with FFmpeg instead. A single FFmpeg stream, with no multiprocessing methods added, is (no exaggeration) twice as fast as having 12 separate workers on 12 separate CPU cores each process 1/12th of an OpenCV VideoCapture read from SSD. And, of course, FFmpeg has the slight added benefit of actually returning the frame you asked for.

Which means there's basically no situation in which you should ever use OpenCV's VideoCapture, other than (maybe) as a convenient way of quickly hacking in a slow, CPU-hogging, memory-consuming, technically untrustworthy and inaccurate means of spawning some video playback in a UI.

If you need to be able to move the playhead around accurately, you HAVE TO use an FFmpeg wrapper instead of OpenCV, because OpenCV cannot do it accurately. And if your application CAN simply read the video source from front to back, making OpenCV a viable option... you should still use FFmpeg to do that. Because any OpenCV VideoCapture implementation is simply a choice to make your video access many times slower, while consuming an unnecessary amount of CPU and memory.
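
The single-stream FFmpeg approach described above is typically wired up by piping raw decoded frames into the consumer. A hedged sketch (the 1920x1080 geometry and the consume_frames program are my assumptions; a real pipeline would query the geometry with ffprobe first):

```shell
# Decode the whole video once, front to back, streaming raw BGR24 frames
# to stdout. The consumer reads width*height*3 bytes per frame, in
# guaranteed order, so no seeking is ever needed.
ffmpeg -i input.mp4 -f rawvideo -pix_fmt bgr24 -v error pipe:1 |
  ./consume_frames --width 1920 --height 1080
```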

- The Conclusive Recommendation -

To the users: if you have been accessing video media with OpenCV's VideoCapture for any purpose where speed, accuracy, or memory overhead actually matters... deprecate it. Download the FFmpeg suite of applications (ffmpeg, ffplay, ffprobe) and find a convenient FFmpeg-wrapper library for whatever language you're coding in, one that will compose the often complicated and verbose command-line calls to FFmpeg for you based on what you tell it you want in the code. Then work with the frames of video delivered to your application through the fast, efficient, and frame-accurate FFmpeg.

To the OpenCV Maintainers:

if set.method == CAP_PROP_POS_FRAMES or set.method == CAP_PROP_POS_MSEC:
    warn("Hey. Sorry we didn't mention it for a third of a decade... These methods don't actually work.\r\n"
         "See #9053 for reasonable people advocating a minimum standard, and being summarily ignored.")
nji9nji9 commented 3 years ago

Bravo!

Thank you very much for sharing your work's results.

The speed is one more reason then.

However, the most important thing would still be to have the compiler emit the warning, so that at least newly compiled code will tell from now on. (The hundreds of older apps will never know.)


bml1g12 commented 3 years ago

...my goal was to increase the speed of my OpenCV-based video analysis tool by multiprocessing. This meant each worker created its own VideoObject() to work with and began its analysis where one of the other workers would end. This required moving each worker's playhead to their starting point in the video. And as we've all found out, the VideoObject can't be trusted to do that.

@roninpawn thanks for your suggestion here; but do you know any good libraries for efficiently converting the ffmpeg stream into a Python-readable (i.e., generally a numpy array) format?

roninpawn commented 3 years ago

@roninpawn thanks for your suggestion here; but do you know any good libraries for efficiently converting the ffmpeg stream into a Python-readable (i.e., generally a numpy array) format?

The specific ffmpeg wrapper I chose at random is 'ffmpeg-python' version 0.2.0 by Karl Kroening. As for getting the video out of ffmpeg and into handy numpy format, I wrote my own little 'VideoStream' class for it in my current project to keep it clean and simple.

The VideoStream class I wrote here: https://github.com/roninpawn/splitRP/blob/Video-Moderation-Tool/videostream.py

It's all self-contained and ready to go. Stick it in your project directory and from videostream import *. It's just 4 uncomplicated public methods that are all pretty self-explanatory. I even left some sample implementation code at the bottom, expecting to share it around someday. (Uncomment the 'cv2' and 'time' imports at the top if you want to run the sample code.)

Hope it saves you some hassle!

bml1g12 commented 3 years ago

@roninpawn thanks for your suggestion here; but do you know any good libraries for efficiently converting the ffmpeg stream into a Python-readable (i.e., generally a numpy array) format?

The specific ffmpeg wrapper I chose at random is 'ffmpeg-python' version 0.2.0 by Karl Kroening. As for getting the video out of ffmpeg and into handy numpy format, I wrote my own little 'VideoStream' class for it in my current project to keep it clean and simple.

The VideoStream class I wrote here: https://github.com/roninpawn/splitRP/blob/Video-Moderation-Tool/videostream.py

It's all self-contained and ready to go. Stick it in your project directory and from videostream import *. It's just 4 uncomplicated public methods that are all pretty self-explanatory. I even left some sample implementation code at the bottom, expecting to share it around someday. (Uncomment the 'cv2' and 'time' imports at the top if you want to run the sample code.)

Hope it saves you some hassle!

I see, I think this does not work for my video as it has no nb_frames metadata:

>>> ffmpeg.probe("video_720x480.mkv")["streams"]
[{'index': 0, 'codec_name': 'h264', 'codec_long_name': 'H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', 'profile': 'High', 'codec_type': 'video', 'codec_time_base': '1001/60000', 'codec_tag_string': '[0][0][0][0]', 'codec_tag': '0x0000', 'width': 720, 'height': 480, 'coded_width': 720, 'coded_height': 480, 'has_b_frames': 2, 'pix_fmt': 'yuv420p', 'level': 30, 'chroma_location': 'left', 'field_order': 'progressive', 'refs': 1, 'is_avc': 'true', 'nal_length_size': '4', 'r_frame_rate': '30000/1001', 'avg_frame_rate': '30000/1001', 'time_base': '1/1000', 'start_pts': 0, 'start_time': '0.000000', 'bits_per_raw_sample': '8', 'disposition': {'default': 1, 'dub': 0, 'original': 0, 'comment': 0, 'lyrics': 0, 'karaoke': 0, 'forced': 0, 'hearing_impaired': 0, 'visual_impaired': 0, 'clean_effects': 0, 'attached_pic': 0, 'timed_thumbnails': 0}, 'tags': {'HANDLER_NAME': 'L-SMASH Video Handler', 'ENCODER': 'Lavc58.112.100 libx264', 'DURATION': '00:01:56.388000000'}}, {'index': 1, 'codec_name': 'ac3', 'codec_long_name': 'ATSC A/52A (AC-3)', 'codec_type': 'audio', 'codec_time_base': '1/48000', 'codec_tag_string': '[0][0][0][0]', 'codec_tag': '0x0000', 'sample_fmt': 'fltp', 'sample_rate': '48000', 'channels': 2, 'channel_layout': 'stereo', 'bits_per_sample': 0, 'dmix_mode': '-1', 'ltrt_cmixlev': '-1.000000', 'ltrt_surmixlev': '-1.000000', 'loro_cmixlev': '-1.000000', 'loro_surmixlev': '-1.000000', 'r_frame_rate': '0/0', 'avg_frame_rate': '0/0', 'time_base': '1/1000', 'start_pts': 0, 'start_time': '0.000000', 'bit_rate': '192000', 'disposition': {'default': 1, 'dub': 0, 'original': 0, 'comment': 0, 'lyrics': 0, 'karaoke': 0, 'forced': 0, 'hearing_impaired': 0, 'visual_impaired': 0, 'clean_effects': 0, 'attached_pic': 0, 'timed_thumbnails': 0}, 'tags': {'HANDLER_NAME': 'L-SMASH Audio Handler', 'ENCODER': 'Lavc58.112.100 ac3', 'DURATION': '00:01:56.384000000'}}]

Still, even when I hardcode the total number of frames, the first ret value returned from

    import cv2
    from videostream import VideoStream  # roninpawn's class, linked above

    cap = VideoStream("/media/ben/datadrive/benchmarking_video_reading/assets/video_720x480.mkv")
    cap.open_stream()
    cap.config(0, 1000)
    print(cap.__dict__)
    while True:
        ret, img = cap.read()
        print(ret)
        if not ret:
            break
        cv2.imshow("img", img)
        k = cv2.waitKey(1)
        if ord("q") == k:
            break

is False:

runfile('/media/ben/datadrive/benchmarking_video_reading/video_reading_benchmarks/ffmpeg-python/videostream.py', wdir='/media/ben/datadrive/benchmarking_video_reading/video_reading_benchmarks/ffmpeg-python')

{'path': '/media/ben/datadrive/benchmarking_video_reading/assets/video_720x480.mkv', 'start': 0, 'frame_rate': 29.97002997002997, 'total_frames': 1000, 'end': 1000, 'frame_range': 1000, 'resolution': (720, 480), 'xywh': [0, 0, 720, 480], '_frame_bytes': 1036800, '_frame': [], '_raw': <subprocess.Popen object at 0x7fc470cc8c18>, '_EOF': False, 'cur_frame': 0}

ffmpeg version 4.0 Copyright (c) 2000-2018 the FFmpeg developers
  built with gcc 7.2.0 (crosstool-NG fa8859cb)
  configuration: --prefix=/home/ben/anaconda3/envs/thrianalysisnew --cc=/opt/conda/conda-bld/ffmpeg_1531088893642/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc --disable-doc --enable-shared --enable-static --enable-zlib --enable-pic --enable-gpl --enable-version3 --disable-nonfree --enable-hardcoded-tables --enable-avresample --enable-libfreetype --disable-openssl --disable-gnutls --enable-libvpx --enable-pthreads --enable-libopus --enable-postproc --disable-libx264
  libavutil      56. 14.100 / 56. 14.100
  libavcodec     58. 18.100 / 58. 18.100
  libavformat    58. 12.100 / 58. 12.100
  libavdevice    58.  3.100 / 58.  3.100
  libavfilter     7. 16.100 /  7. 16.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  1.100 /  5.  1.100
  libswresample   3.  1.100 /  3.  1.100
  libpostproc    55.  1.100 / 55.  1.100
Input #0, matroska,webm, from '/media/ben/datadrive/benchmarking_video_reading/assets/video_720x480.mkv':
  Metadata:
    COMPATIBLE_BRANDS: mp42mp41isomavc1
    MAJOR_BRAND     : mp42
    MINOR_VERSION   : 0
    ENCODER         : Lavf58.62.100
  Duration: 00:01:56.39, start: 0.000000, bitrate: 578 kb/s
    Stream #0:0: Video: h264 (High), yuv420p(progressive), 720x480, 29.97 fps, 29.97 tbr, 1k tbn, 59.94 tbc (default)
    Metadata:
      HANDLER_NAME    : L-SMASH Video Handler
      ENCODER         : Lavc58.112.100 libx264
      DURATION        : 00:01:56.388000000
    Stream #0:1: Audio: ac3, 48000 Hz, stereo, fltp, 192 kb/s (default)
    Metadata:
      HANDLER_NAME    : L-SMASH Audio Handler
      ENCODER         : Lavc58.112.100 ac3
      DURATION        : 00:01:56.384000000
Stream mapping:
  Stream #0:0 (h264) -> crop
  setpts -> Stream #0:0 (rawvideo)
Press [q] to stop, [?] for help
Output #0, rawvideo, to 'pipe:':
  Metadata:
    COMPATIBLE_BRANDS: mp42mp41isomavc1
    MAJOR_BRAND     : mp42
    MINOR_VERSION   : 0
    encoder         : Lavf58.12.100
    Stream #0:0: Video: rawvideo (BGR[24] / 0x18524742), bgr24, 720x480, q=2-31, 248583 kb/s, 29.97 fps, 29.97 tbn, 29.97 tbc (default)
    Metadata:
      encoder         : Lavc58.18.100 rawvideo
False

But I found the following works:

import ffmpeg
import numpy as np

class FFMPEGStream:
    def __init__(self, videopath):
        self.fn = videopath
        self.start = 0

        # Probe once for the frame geometry needed to reshape the raw bytes.
        probe = ffmpeg.probe(videopath)
        video_info = next(s for s in probe['streams'] if s['codec_type'] == 'video')
        self.width = int(video_info['width'])
        self.height = int(video_info['height'])

    def get_np_array(self, n_frames_to_read):
        # Decode frames [start, n_frames_to_read) and pipe them out of ffmpeg
        # as raw bgr24 bytes, then view the buffer as (frames, height, width, 3).
        out, _ = (
            ffmpeg
                .input(self.fn)
                .trim(start_frame=self.start, end_frame=n_frames_to_read)
                .output('pipe:', format='rawvideo', pix_fmt='bgr24')
                .run(capture_stdout=True)
        )
        video = (
            np.frombuffer(out, np.uint8)
                .reshape([-1, self.height, self.width, 3])
        )
        return video
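For anyone wiring up a similar rawvideo pipe by hand: each bgr24 frame on the pipe is exactly width × height × 3 bytes, which is where the `reshape([-1, self.height, self.width, 3])` above and the `'_frame_bytes': 1036800` in the log come from. A tiny sanity check (the helper name is made up for illustration):

```python
def frame_bytes(width, height, channels=3):
    # One raw bgr24 frame on ffmpeg's stdout pipe: no headers, no padding,
    # just width*height pixels at `channels` bytes each.
    return width * height * channels

print(frame_bytes(720, 480))  # 1036800 -- matches _frame_bytes for the 720x480 clip above
```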
roninpawn commented 3 years ago

I think this does not work for my video as it has no nb_frames metadata ... even when I hardcode the total number of frames, the first ret value it returns is False

Curious. My app is being used by multiple members of our speedrunning community to automate run review and timing analysis, and I'm guessing we've processed over a hundred videos using this code... That said, the videos are exclusively encoded for, processed by, and retrieved from YouTube.

I looked a bit at your discussions in the CamGear related threads, and I think we've fallen down similar rabbit holes in the video-via-Python world. ;) The one thing I wanted to say, for whatever it's worth, is that the actual rendering of decoded raw video to screen through OpenCV might, itself, end up being a bottleneck, if that's what you're aiming for. My app works on cropped frames in the background and generates a report, so I haven't speed-tested full-frame rendering to screen. But, given everything else above in this thread, I'd be surprised if OpenCV's drawing method turned out to be efficient and optimized.

Again, not 1000% sure what your project is, but at this point I think I've decided not to trust OpenCV to handle video, as a general rule. No offense to the maintainers here, but if I recall, one of the contributors to this thread stated explicitly that OpenCV's video support was ostensibly a tacked-on afterthought, implying that anyone who has relied on it in task-critical implementations should have done more testing.

So if your end goal is to render the frames to screen -- don't forget to benchmark the draw method. Ingest might just be ONE of the bottlenecks and not the biggest.

developer0hye commented 3 years ago

I see, I think this does not work for my video as it has no nb_frames metadata
Re-encode your video file with ffmpeg.

When I re-encode my ".mkv" video file to ".mp4" with ffmpeg, the "nb_frames" metadata is generated.
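If all that is missing is container metadata, a stream-copy remux (changing only the container, not the compressed video) may be enough to get the mp4 muxer to record nb_frames; a full re-encode, as described above, is the heavier fallback. A sketch of the remux call, with filenames purely illustrative:

```python
import subprocess

def remux_to_mp4_cmd(src, dst):
    # "-c copy" copies the compressed streams as-is; only the container
    # changes, so this is fast and lossless.  Whether the resulting .mp4
    # exposes nb_frames can depend on the codec, so verify with ffprobe.
    return ["ffmpeg", "-i", src, "-c", "copy", dst]

# To actually run it (requires the ffmpeg binary on PATH):
# subprocess.run(remux_to_mp4_cmd("video_720x480.mkv", "video_720x480.mp4"), check=True)
```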

bml1g12 commented 3 years ago

So if your end goal is to render the frames to screen -- don't forget to benchmark the draw method. Ingest might just be ONE of the bottlenecks and not the biggest.

Indeed I expect drawing would be a bottleneck. I am trying to benchmark a range of cases (IO limited, CPU limited etc.), to collate a resource on the optimal strategy for integrating video reading with numpy video analysis. Although still a work in progress, my benchmarks so far suggest the best tool for the job depends heavily on whether the application is CPU limited or IO limited, and the size of the video itself.

https://github.com/bml1g12/benchmarking_video_reading_python
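The IO-limited vs. CPU-limited split can be probed with a harness as small as the one below; `read_fn` is a stand-in for whichever reader is under test (an OpenCV capture's read, an ffmpeg pipe read, etc.), and the function name is illustrative:

```python
import time

def frames_per_second(read_fn, n_frames=1000):
    # Time n_frames calls to any zero-argument frame reader and report the
    # effective throughput in frames per second.
    t0 = time.perf_counter()
    for _ in range(n_frames):
        read_fn()
    elapsed = time.perf_counter() - t0
    return n_frames / elapsed
```

Swapping different readers into the same harness, with and without a `cv2.imshow` call inside `read_fn`, separates decode cost from draw cost.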