healthonrails / annolid

An annotation and instance segmentation-based multiple animal tracking and behavior analysis package.

--extract_frames argument miscounts #2

Closed: shamavir closed this issue 4 years ago

shamavir commented 4 years ago

When executing annolid/main.py -v 190116.wmv --extract_frames=20 --algo=uniform, 40 frames are extracted as JPEGs, rather than the expected 20 frames. This is on Windows 10 / Anaconda.

healthonrails commented 4 years ago

OpenCV reports a total of 26955 frames via int(cap.get(7)) (property 7 is cv2.CAP_PROP_FRAME_COUNT), but the video actually contains 52395 frames. The reported count is not reliable across videos and formats, so for some videos the number of extracted frames may differ from the number requested.
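One way to check the reported count against what can actually be read is to step through the video and count frames. The following is a minimal sketch, assuming a capture object that behaves like cv2.VideoCapture, whose grab() returns False once no more frames are available. The names count_decodable_frames and FakeCapture are hypothetical and are not part of Annolid or OpenCV.

```python
def count_decodable_frames(cap):
    """Count frames by advancing through the stream until grab() fails.

    `cap` is assumed to behave like cv2.VideoCapture: grab() moves to the
    next frame and returns False when the stream is exhausted.
    """
    n = 0
    while cap.grab():
        n += 1
    return n


class FakeCapture:
    """Stand-in for cv2.VideoCapture so the sketch runs without a video file."""

    def __init__(self, n_frames):
        self._remaining = n_frames

    def grab(self):
        if self._remaining <= 0:
            return False
        self._remaining -= 1
        return True


print(count_decodable_frames(FakeCapture(5)))
```

With OpenCV installed, the same function could be called on cv2.VideoCapture("190116.wmv") and the result compared with int(cap.get(cv2.CAP_PROP_FRAME_COUNT)). This requires a full pass over the video, so it is slower than reading the metadata.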

shamavir commented 4 years ago

Hm, is this a bug in OpenCV that should be pushed up, or is this because video files created by different people using different tools contain wrong metadata about themselves? If the latter, could we write a script that fixes at least this one issue by determining the actual number of frames in the video and updating the metadata accordingly?

shamavir commented 4 years ago

This problem persists in the current build. In the attached example, specifying 10 frames to be extracted seems to extract 5 frames (instead of 10) based on runtime output, and actually extracts six frames (including frame 0). (The video specified has been uploaded to a new Debugging folder on Cornell Box). Please see anaconda shell screenshot attached.

Annolid error 01, 17 Sept 2020

healthonrails commented 4 years ago

There is no nb_frames metadata in the header for this video novelctrl.mkv. {'index': 0, 'codec_name': 'h264', 'codec_long_name': 'H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', 'profile': 'High', 'codec_type': 'video', 'codec_time_base': '1/60', 'codec_tag_string': '[0][0][0][0]', 'codec_tag': '0x0000', 'width': 1280, 'height': 1024, 'coded_width': 1280, 'coded_height': 1024, 'has_b_frames': 2, 'sample_aspect_ratio': '0:1', 'display_aspect_ratio': '0:1', 'pix_fmt': 'yuv420p', 'level': 32, 'chroma_location': 'left', 'field_order': 'progressive', 'refs': 1, 'is_avc': 'true', 'nal_length_size': '4', 'r_frame_rate': '30/1', 'avg_frame_rate': '30/1', 'time_base': '1/1000', 'start_pts': 0, 'start_time': '0.000000', 'bits_per_raw_sample': '8', 'disposition': {'default': 1, 'dub': 0, 'original': 0, 'comment': 0, 'lyrics': 0, 'karaoke': 0, 'forced': 0, 'hearing_impaired': 0, 'visual_impaired': 0, 'clean_effects': 0, 'attached_pic': 0, 'timed_thumbnails': 0}, 'tags': {'ENCODER': 'Lavc58.54.100 libx264', 'DURATION': '00:05:26.133000000'}}

So OpenCV computes the number of frames as the duration in seconds multiplied by the FPS. The following command returns the video's duration in seconds:

ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 novelctrl.mkv

326.133000 * 30 (FPS) = 9783.99

This matches the result of the following method:

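import cv2

video_path = "novelctrl.mkv"
cap = cv2.VideoCapture(video_path)
# Property 7 is cv2.CAP_PROP_FRAME_COUNT (an estimate, not a decoded count)
n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
print(n_frames)  # 9784 for this video
cap.release()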
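The estimate can be reproduced from the DURATION tag and frame rate shown in the ffprobe output. This is only a sketch of the arithmetic: the helper names are hypothetical, and rounding to the nearest integer is an assumption that happens to reproduce the 9784 reported by OpenCV, not a statement about how OpenCV implements it.

```python
def duration_to_seconds(duration):
    """Convert an 'HH:MM:SS.fraction' string to seconds."""
    hours, minutes, seconds = duration.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def estimate_frame_count(duration, fps):
    """Estimate the frame count as duration (s) * FPS, rounded to nearest."""
    return round(duration_to_seconds(duration) * fps)


# Values taken from the ffprobe output for novelctrl.mkv in this thread.
estimate = estimate_frame_count("00:05:26.133000000", 30)
print(estimate)
```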

However, FFmpeg reports 6834 frames with the following command, which is why only 6 frames were saved. Should we require users to install FFmpeg to double-check the frame count?

ffmpeg -discard nokey -i novelctrl.mkv -map 0:v:0 -c copy -f null -

ffmpeg version 3.4 Copyright (c) 2000-2017 the FFmpeg developers
  built with Apple LLVM version 7.0.2 (clang-700.1.81)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/3.4 --enable-shared --enable-pthreads --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-gpl --enable-libmp3lame --enable-libx264 --enable-libxvid --enable-opencl --enable-videotoolbox --disable-lzma
  libavutil 55. 78.100 / 55. 78.100
  libavcodec 57.107.100 / 57.107.100
  libavformat 57. 83.100 / 57. 83.100
  libavdevice 57. 10.100 / 57. 10.100
  libavfilter 6.107.100 / 6.107.100
  libavresample 3. 7. 0 / 3. 7. 0
  libswscale 4. 8.100 / 4. 8.100
  libswresample 2. 9.100 / 2. 9.100
  libpostproc 54. 7.100 / 54. 7.100
Input #0, matroska,webm, from '/Users/chenyang/Downloads/novelctrl.mkv':
  Metadata:
    ENCODER : Lavf58.29.100
  Duration: 00:05:26.13, start: 0.000000, bitrate: 2989 kb/s
    Stream #0:0: Video: h264 (High), yuv420p(progressive), 1280x1024, 30 fps, 30 tbr, 1k tbn, 60 tbc (default)
    Metadata:
      ENCODER : Lavc58.54.100 libx264
      DURATION : 00:05:26.133000000
Output #0, null, to 'pipe:':
  Metadata:
    encoder : Lavf57.83.100
    Stream #0:0: Video: h264 (High), yuv420p(progressive), 1280x1024, q=2-31, 30 fps, 30 tbr, 1k tbn, 1k tbc (default)
    Metadata:
      ENCODER : Lavc58.54.100 libx264
      DURATION : 00:05:26.133000000
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
Press [q] to stop, [?] for help
frame= 6834 fps=0.0 q=-1.0 Lsize=N/A time=00:05:26.00 bitrate=N/A speed=2.37e+03x
video:118961kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

healthonrails commented 4 years ago

Solved by reservoir sampling. Reference: https://en.wikipedia.org/wiki/Reservoir_sampling
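For readers unfamiliar with the technique, here is a minimal sketch of basic reservoir sampling (Algorithm R from the linked article). It picks k items from a stream whose length is not known in advance, which is why it avoids the unreliable frame count. It is an illustration only and not necessarily how Annolid implements it; the function name and the rng parameter are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Pick k items uniformly at random from an iterable of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1) by drawing a slot
            # in [0, i] and replacing only if the slot is in the reservoir.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# Frame indices stand in for frames; 6834 is the readable count in this thread.
picked = reservoir_sample(range(6834), 10, random.Random(0))
print(sorted(picked))
```

If the stream yields fewer than k items, all of them are returned. When reading from a video, the stream would be the frames (or their indices) yielded one by one as they are decoded.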

shamavir commented 4 years ago

OK, great. I'll give it a try. We'll definitely need to deal with this situation, given that users will have different environments and apparently many video files lack nb_frames metadata.

Why can we not just use the cv2 method, which seems more reliable, to count frames if there is no nb_frames value?

I'm intrigued by your reservoir sampling solution. This won't be really "uniform" sampling through the file, though, will it? More a form of random selection?

I don't think we should require users to install ffmpeg for this reason. Especially if it is giving incorrect information anyway.

healthonrails commented 4 years ago

OpenCV can only read 6834 frames from the video. I have not figured out the exact reason yet.
Yes, I will rename the "uniform" option to "random".
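To illustrate the distinction: reservoir sampling gives every frame the same chance of being selected, but the selected frames are not evenly spaced in time. Evenly spaced sampling needs the number of readable frames up front, for example from a first pass over the video. A minimal sketch, assuming n_frames is the count of frames that can actually be read (6834 in this thread) and not the metadata estimate; evenly_spaced_indices is a hypothetical helper, not an Annolid function.

```python
def evenly_spaced_indices(n_frames, k):
    """Return up to k frame indices spread evenly over range(n_frames)."""
    if n_frames <= 0 or k <= 0:
        return []
    k = min(k, n_frames)  # cannot pick more frames than exist
    return [(i * n_frames) // k for i in range(k)]


indices = evenly_spaced_indices(6834, 10)
print(indices)
```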