sgoldenlab / simba

SimBA (Simple Behavioral Analysis), a pipeline and GUI for developing supervised behavioral classifiers
https://simba-uw-tf-dev.readthedocs.io/
GNU General Public License v3.0

Inaccurate FPS extracted from video in Batch Process videos #282

Open florianduclot opened 10 months ago

florianduclot commented 10 months ago

Describe the bug
After batch processing many videos for clipping, I noticed the processed videos were shorter than expected by a few seconds. This turned out to be due to an inaccurate extraction of the video file's FPS for videos with a non-integer FPS. For instance, these videos have an FPS of 29.97, but Simba detects them as 29 FPS.

To the best of my knowledge, this is due to... https://github.com/sgoldenlab/simba/blob/97542ecc3e7f21f6698fe0ad8cc4f9167c53dc1a/simba/utils/read_write.py#L405C43-L405C43

    video_data["fps"] = int(cap.get(cv2.CAP_PROP_FPS))

...where the FPS value returned by cv2 is transformed to int.

I suppose there may be a good reason why it is transformed to an integer, but would it actually be reasonable to keep the float as returned by cv2? I know the video FPS is used throughout Simba, so I would understand if that would cause a lot of issues elsewhere.

Just wondering if not using the int transformation would be reasonable.

To Reproduce
Steps to reproduce the behavior:

>>> import cv2
>>> cap = cv2.VideoCapture(video_path)
>>> cap
<VideoCapture 00000215858DA470>
>>> cap.get(cv2.CAP_PROP_FPS)
29.97002997002997
>>> int(cap.get(cv2.CAP_PROP_FPS))
29

Expected behavior
The exact FPS could be used instead of its int conversion.


sronilsson commented 10 months ago

Hi @florianduclot !

Yes, this is something that has been bugging me a little, and I am not entirely sure how to approach it. The int in get_video_meta_data is there as a fix from a long time ago: I had issues passing float fps values to the OpenCV VideoWriter - it seemed to want an int. I don't know if that is still the case.
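For reference, a minimal sketch of how one could check whether cv2.VideoWriter still needs an integer fps (the output file name, codec, and frame size below are arbitrary choices):

    import cv2
    import numpy as np

    # Try opening a writer with the exact float fps reported by CAP_PROP_FPS.
    writer = cv2.VideoWriter('float_fps_test.mp4',
                             cv2.VideoWriter_fourcc(*'mp4v'),
                             29.97002997002997,
                             (400, 600))
    print(writer.isOpened())  # True if the backend accepted the float fps
    writer.write(np.zeros((600, 400, 3), dtype=np.uint8))  # one black frame, height x width x channels
    writer.release()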

For the batch pre-processing, removing the int cast in get_video_meta_data should work: OpenCV is never called in the batch pre-processing and FFmpeg is called directly. If this is changed, THIS should also be changed to a DoubleVar() (or StringVar) in tkinter, and there is an integer unit check for fps HERE that has to be changed to a float check rather than an integer check.
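As a rough sketch of those two changes, assuming hypothetical variable and function names rather than the actual SimBA widgets behind the THIS/HERE links:

    import tkinter as tk

    root = tk.Tk()
    # An IntVar would truncate 29.97; a DoubleVar (or StringVar) keeps the float fps.
    fps_var = tk.DoubleVar(value=29.97)
    tk.Entry(root, textvariable=fps_var).pack()

    def check_float_fps(value) -> float:
        # The unit check becomes a float check rather than an integer check.
        try:
            fps = float(value)
        except (TypeError, ValueError):
            raise ValueError(f'FPS has to be a number, got {value}')
        if fps <= 0:
            raise ValueError(f'FPS has to be > 0, got {fps}')
        return fps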

The trouble is that changing get_video_meta_data may cause other issues elsewhere if ints are indeed needed where OpenCV is called, and I don't have enough test cases to comfortably make this change. One potential solution: we could do the below, and pass fps_as_int = False in the batch pre-process? It won't affect anything where OpenCV is called, and we can change it later in the rest of the code if we find that float fps values work. Let me know what you think!


import os
from typing import Union

import cv2


def get_video_meta_data(video_path: Union[str, os.PathLike],
                        fps_as_int: bool = True) -> dict:
    """
    Read video metadata (fps, resolution, frame cnt etc.) from video file (e.g., mp4).

    :parameter str video_path: Path to a video file.
    :parameter bool fps_as_int: If True, force video fps to int through floor rounding, else float. Default = True.
    :return dict: Video file meta data.

    :example:
    >>> get_video_meta_data('test_data/video_tests/Video_1.avi')
    {'video_name': 'Video_1', 'fps': 30, 'width': 400, 'height': 600, 'frame_count': 300, 'resolution_str': '400 x 600', 'video_length_s': 10}
    """

    video_data = {}
    cap = cv2.VideoCapture(video_path)
    _, video_data['video_name'], _ = get_fn_ext(video_path)  # get_fn_ext is defined elsewhere in simba.utils.read_write
    video_data['fps'] = cap.get(cv2.CAP_PROP_FPS)
    if fps_as_int:
        video_data['fps'] = int(video_data['fps'])
    ...
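The batch pre-process would then request the un-rounded fps; a hypothetical call site (the surrounding batch-processing code is not shown here):

    # Hypothetical call site in the batch pre-process: keep the float fps for FFmpeg.
    meta = get_video_meta_data(video_path, fps_as_int=False)
    print(meta['fps'])  # e.g. 29.97002997002997 instead of 29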
florianduclot commented 10 months ago

Thanks for the additional information, @sronilsson , that is very helpful!

One potential solution: we could do the below, and pass fps_as_int = False in the batch pre-process? It won't affect anything where OpenCV is called, and we can change it later in the rest of the code if we find that float fps values work. Let me know what you think!

Sounds like a good idea, at least as a temporary solution. If you haven't done so already, I'll try to implement that using the pointers you listed and get back to you with a PR if that helps.

Personally, I'll use that to convert my videos from 29.97 to 29 FPS so that the rest of the Simba pipeline won't potentially be using erroneous timestamps either. Am I correct in thinking that this could otherwise lead to inaccurate mapping of behavioral bout START/STOP events onto frames (at least when using behavioral annotations imported from 3rd-party applications)?
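As a back-of-the-envelope illustration of that worry (illustrative numbers only, not measured from any project):

    # Annotations are stamped against the true 29.97 fps video, but frames are
    # looked up with the rounded-down 29 fps value.
    true_fps, rounded_fps = 29.97, 29
    event_time_s = 600.0                             # a STOP event 10 minutes into the video
    true_frame = int(event_time_s * true_fps)        # 17982
    mapped_frame = int(event_time_s * rounded_fps)   # 17400
    drift_frames = true_frame - mapped_frame         # 582 frames
    print(drift_frames, drift_frames / true_fps)     # ~582 frames, i.e. ~19.4 seconds off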

florianduclot commented 10 months ago

@sronilsson: scratch that... it has nothing to do with the detection of FPS. My worry about the detection of FPS and the Simba pipeline as a whole still applies, though, so I would still be interested in your feedback on that point.

With regards to the batch processing of videos I couldn't reproduce my bug on my laptop, where I do not have a GPU... The detection of fps by OpenCV is moot there (besides the display of FPS in the dialog window but that's minor, I think) as everything is handled by ffmpeg which correctly detects FPS as float. The problem I was having relates to the different commands generated depending on the GPU flag:

Based on the ffmpeg docs, I would say this should ALWAYS be -t, as Simba computes and passes a time difference (a duration) there and not a position. Alternatively, we could simply not calculate the time difference at all, go with -to, and use end_time straight from the user input. You probably know which of the two options is best here.
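To make the -t / -to distinction concrete, here is a sketch with hypothetical paths and times (not SimBA's actual command builder). -t takes a duration, while -to takes an absolute end position, so passing a computed duration to -to cuts the clip short:

    # Goal: clip from 00:02:00 to 00:07:00 (hypothetical values).
    start_time, end_time = '00:02:00', '00:07:00'
    duration = '00:05:00'  # end_time - start_time, as computed by the batch tool

    # -t expects a duration, so this writes the full 5 minutes:
    cmd_ok = f'ffmpeg -hwaccel auto -i "in.mp4" -ss {start_time} -t {duration} -c:v h264_nvenc -async 1 "out.mp4"'

    # -to expects an end position; given the duration it stops at source position
    # 00:05:00 instead of 00:07:00, so the output comes out ~2 minutes short:
    cmd_short = f'ffmpeg -hwaccel auto -i "in.mp4" -ss {start_time} -to {duration} -c:v h264_nvenc -async 1 "out.mp4"'

    # Alternatively, skip the subtraction and pass end_time straight to -to:
    cmd_alt = f'ffmpeg -hwaccel auto -i "in.mp4" -ss {start_time} -to {end_time} -c:v h264_nvenc -async 1 "out.mp4"'

    print(cmd_ok, cmd_short, cmd_alt, sep='\n')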

sronilsson commented 10 months ago

@florianduclot You are right: for the rest of the pipeline, the values in the FPS column of project_folder/csv/video_info.csv are used for non-plotting functions, and these are specified in the video parameters pop-up in the GUI (simba.ui.video_info_ui). This pop-up also asks for an integer (IntVar) FPS, so throughout, it is best to use an integer FPS. It should be made clear in the docs.

And thanks for looking at the FFmpeg commands! I will change it. One question - I don't have a GPU available so can't test: I can see that -crf 17 is often in the CPU commands, but absent in the GPU commands. I think this comes from way back when a user raised an issue over the file size coming out of the batch pre-process, so I changed the quality from the default -crf 23. But when I later inserted the GPU flag I failed to do the same. If you run with the GPU codecs, have you ever bumped into any very large output video file sizes relative to the input video file size?

florianduclot commented 10 months ago

Thanks for your input on the use of the FPS data throughout the pipeline; I'll definitely convert my videos to the nearest int fps.

I can see that -crf 17 is often in the CPU commands, but absent in the GPU commands. I think this comes from way back when a user raised an issue over the file size coming out of the batch pre-process, so I changed the quality from the default -crf 23. But when I later inserted the GPU flag I failed to do the same. If you run with the GPU codecs, have you ever bumped into any very large output video file sizes relative to the input video file size?

I was actually just about to look into that, so I'll get back to you once I have a better idea. Would you be opposed to leaving some sort of choice to the user here? Keeping a default in place, of course, but it might be helpful for those who prefer to specify the quality they desire.

sronilsson commented 10 months ago

No, I wouldn't be opposed to it, as long as it doesn't confuse users too much. Maybe something like a single "VIDEO QUALITY" column with dropdowns running from 10% to 100%, representing CRFs between 23 and 13, defaulting to 100%?
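To make the mapping concrete, a sketch of one possible linear percentage-to-CRF map (illustrative only; the values SimBA actually adopted are the ones linked in a later comment):

    # Map the dropdown percentage (10-100%) linearly onto CRF 23 (10%) ... CRF 13 (100%).
    def quality_pct_to_crf(pct: int) -> int:
        pct = max(10, min(100, pct))
        return round(23 - (pct - 10) * (23 - 13) / 90)

    print({p: quality_pct_to_crf(p) for p in range(10, 101, 10)})
    # {10: 23, 20: 22, 30: 21, ..., 100: 13}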

I haven't done any testing, but presumably the chosen crf should only be applied to the very last operation, as otherwise the video quality loss would accumulate across the multiple operations (fps conversion, downsampling, etc.)?

florianduclot commented 10 months ago

Here's a comparison for the same video processed with or without the GPU, using the current ffmpeg command (but including -t for the GPU command instead of -to):

INPUT
Length: 00:10:11
Width x Height: 1804 x 510
Total Bitrate: 2,283 kbps
FPS: 29.97
SIZE: 175,207,037 bytes

OUTPUT with GPU
Length: 00:05:00
Width x Height: 1804 x 510
Total Bitrate: 2,164 kbps
FPS: 29.97
SIZE: 81,387,078 bytes
command: ffmpeg -hwaccel auto -i "in_path" -ss 00:02:00 -t 0:05:00 -c:v h264_nvenc -async 1 "out_path"

OUTPUT with CPU
Length: 00:05:00
Width x Height: 1804 x 510
Total Bitrate: 3,935 kbps
FPS: 29.97
SIZE: 147,912,482 bytes
command: ffmpeg -i "in_path" -ss 00:02:00 -t 0:05:00 -async 1 -qscale 0 -crf 17 -c:a copy "out_path"

This makes sense given that -crf 17 should be a higher quality than the default CRF of 23.

I'll test with a CRF of 17 for both. <- I didn't realize that option was not available for the h264_nvenc encoder.

sronilsson commented 10 months ago

Ah, I just saw that myself :) The GPU codecs seem to take a -preset argument which can be fast (low quality, I presume), medium, or slow. I think for most control, the VIDEO QUALITY dropdown options have to change from the 10-100% options to low-medium-high options depending on whether the GPU checkbox is checked or not, lol.

The alternative is to always have three options (low, medium, high) in the VIDEO QUALITY dropdowns and map those to three different CRFs if the GPU checkbox is not checked. But that doesn't give users as much control to take advantage of more quality levels if they run on CPU.

sronilsson commented 10 months ago

Would be good to get your feedback on this when you have time! If you upgrade through pip, the batch pre-process has a quality dropdown all the way to the right (and a quickset at the top), that shows 100-10% if running on CPU, or low-medium-high if running on GPU.

The 100% to 10% options when running on CPU map to THESE CRFs, and the low, medium, and high options map to THESE GPU presets. Let me know what you think of the CRF map; not sure if there's a better way or bigger/smaller strides.

When clicking EXECUTE there are two more things stored in the json: the mapped quality, and the last operation. The quality defaults to medium, or 23. Then, when the last operation runs, the quality changes to whatever the quality is in the json.
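A rough sketch of that last-operation logic, with hypothetical json keys and operation names rather than SimBA's actual implementation:

    import json

    # Hypothetical batch-process json produced when clicking EXECUTE.
    settings = json.loads('{"last_operation": "downsample", "quality_crf": 17}')

    DEFAULT_CRF = 23  # 'medium'; used for every operation except the last one

    def crf_for_operation(operation: str) -> int:
        # Only the final operation gets the user-selected quality, so the
        # quality reduction is applied once rather than compounding per step.
        if operation == settings['last_operation']:
            return settings['quality_crf']
        return DEFAULT_CRF

    for op in ('clip', 'fps', 'downsample'):
        print(op, crf_for_operation(op))
    # clip 23, fps 23, downsample 17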