ubicomplab / rPPG-Toolbox

rPPG-Toolbox: Deep Remote PPG Toolbox (NeurIPS 2023)
https://arxiv.org/abs/2210.00716

Usage of cv2.CAP_PROP_POS_MSEC in UBFCrPPGLoader #247

Closed NeilM21 closed 4 months ago

NeilM21 commented 9 months ago

Hello,

First off, thanks for the great repo; I've been using it for quite a while as a peer-reviewed reference for an rPPG task I am currently working on.

I am writing with reference to the data loader script UBFCrPPGLoader.py, which currently reads from the source UBFC video and sets the frame/millisecond pointer using cv2.CAP_PROP_POS_MSEC. While working on an in-house dataset for my own project, I noticed that cv2.CAP_PROP_POS_FRAMES was not accurate, and recalled that these properties are also used in this project. The same concern likely applies to any other data loader that needs to seek within a video.

On further investigation, https://github.com/opencv/opencv/issues/9053#issuecomment-633845861 shows that OpenCV's media I/O can be unreliable regardless of whether seeking is done by frame index or by milliseconds. As a workaround, that thread mentions some FFmpeg-based libraries; a simpler alternative is to "seek" to the desired frame by having OpenCV read through the full stream (e.g. via cv2.VideoCapture.read()) up to that frame, which is reported to be accurate and quite fast, and my own tests confirm this. Here's a code sample from my codebase for your perusal:

    import cv2
    from time import time
    from tqdm import tqdm

    vid_obj = cv2.VideoCapture(subject_clip_path)

    if start_idx > 0:
        frame_seek_counter = 0
        success = vid_obj.grab()  # grab() advances the stream without decoding pixel data
        seek_timer_start = time()
        progress_bar = tqdm(total=start_idx, ascii=True,
                            desc=f"Seeking frame {start_idx}")

        while success and frame_seek_counter < start_idx:
            frame_seek_counter += 1
            progress_bar.update(1)
            success = vid_obj.grab()

        progress_bar.close()
        seek_timer_end = time()
        seek_duration = round(seek_timer_end - seek_timer_start, 3)
        print(f"Seek duration: {seek_duration}s")
        success, frame = vid_obj.retrieve()  # decode only the frame at start_idx

    else:
        success, frame = vid_obj.read()

start_idx is the frame index at which processing is to begin. If this is frame 0, no seeking is needed and we simply read the first frame. Otherwise, cv2.VideoCapture.grab() is used for the seeking step, as it is considerably faster than cv2.VideoCapture.read() (~16 seconds versus ~64 seconds for 20,000 frames in my tests), since it advances the stream without decoding pixel data. A single call to cv2.VideoCapture.retrieve() after the last grab decodes the frame at start_idx, after which the main processing loop reads from start_idx to the respective end_idx.

I hope this is helpful. I'm highlighting it because people may reuse these data loaders for medical applications, and OpenCV itself does not appear to have documented or addressed this issue.

girishvn commented 8 months ago

Hi @NeilM21,

Thanks for bringing this up. Just to confirm: the issue with cv2.CAP_PROP_POS_MSEC and cv2.CAP_PROP_POS_FRAMES is that they may not actually seek to the requested frame and may instead set the start pointer to some other frame? Would it be sufficient, in our implementation, to remove the lines where this property is set?

NeilM21 commented 8 months ago

Hello,

You're most welcome, and thank you all for the excellent repo.

In terms of this toolbox's implementation, since your function always starts from frame 0, it would most likely suffice to remove the cv2.CAP_PROP_POS_MSEC call and, as normal, just check the success flag returned by cv2.VideoCapture.read(). I recommend cross-testing with some samples from UBFC-rPPG.

For anyone working with longer video clips, as in my scenario (up to 20 minutes per video), I initially thought to process in iterations like this to avoid exhausting my RAM:

 Loop 0: video_cap.set(cv2.CAP_PROP_POS_FRAMES, 0) # Start the video from frame 0 and perform processing
 Loop 1: video_cap.set(cv2.CAP_PROP_POS_FRAMES, 2000) # Start the video from frame 2000 and perform processing
 Loop 2: video_cap.set(cv2.CAP_PROP_POS_FRAMES, 4000) # Start the video from frame 4000 and perform processing

In my experiments, the function did not set the frame pointer to 0, 2000, or 4000 correctly when lined up with the original video, and this was true for every iteration of the loop. In my case the discrepancy was close to 10 frames, though it varies depending on codecs and other factors. This lines up with the findings in the thread linked above.

Thus, for anyone who wishes to adapt the code for batch processing like this, I recommend simply iterating through the video, or using one of the libraries mentioned in that thread.

Hope this helps!

yahskapar commented 4 months ago

Thanks for this info @NeilM21. If you are willing to make a PR to integrate this change, we would greatly appreciate it! No worries if you don't have time either, I think folks can find this issue in the future and refer to it if needed.

I'll go ahead and close this issue for the time being.