zigvu / khajuri

Video Pipeline

Errors when processing longer video of length 02:07:15.34 and 119.88 fps #87

Open arpgh opened 9 years ago

arpgh commented 9 years ago

Two _processvideo runs produced the same errors.

Main issues:

  1. run stuck after 3 caffe threads done; the 4th one (device 1) doesn't complete
  2. multiple defunct _processvideo processes
  3. progress percentage showing over 100% (going up to 114% before getting stuck)

Branch: Development
Path: gpu2:/mnt/data/wc14cls/training/0seed/vdos/wc14-BraMex/
Video: wc14-BraMex.mkv
Key configs: caffe & post process in parallel, compute curation, save compressed json
Logs with error: runlog-process_video-curation.txt-ERR1 (or 0 suffix)

Frame density handling at high fps seems ok as seen in runlog. If there are multiple defects, split this into multiple issues.

regmiz commented 9 years ago

have been looking into this - not sure why we have errors though.

arpgh commented 9 years ago

I think it may be the VideoReader thread crashing or giving an incorrect result (e.g. the frame count, which affects the percentage calculation). With frame extraction affected, LMDB creation stops, which in turn makes one caffe thread wait forever.

regmiz commented 9 years ago

yeah, VideoReader only goes up to frame 457701 for the .mkv file. I converted the same .mkv into .mp4 using ffmpeg, and then the frame numbers went up to 915301.

eacharya commented 9 years ago

Interesting bug.

VideoReader is based on FFMPEG, so it is surprising that ffmpeg is able to convert .mkv to .mp4 but our VideoReader is not able to read it. We should perhaps just use the cpp executables to dump frames from the file to see if the executable crashes/stops. If so, we can probably narrow down the cause to internal buffer creation/updating. If not, it might be related to LMDB creation.

arpgh commented 9 years ago

Same video converted to mp4 also gets stuck later in the run. Now the displayed percentage is getting to 228%. Possibly some VideoReader calculation is off when fps or length is large?

Branch: Development
Path: gpu2:/mnt/data/wc14cls/training/0seed/vdos/wc14-BraMex.mp4/
Video: wc14-BraMex.mp4
Log with error: runlog-process_video.txt-ERR
Key configs: caffe & post process in parallel, don't save patch scores

arpgh commented 9 years ago

Extracting every 24th frame (for a density of 5) directly from the video wc14-BraMex.mp4 with ~/khajuri/VideoReader/VideoReader wc14-BraMex.mp4 24 allFrames also gets stuck after dumping about 600 frames. I tried that with another long video with the same fps and the result is the same. So all indications point to an issue in VideoReader.

Path: gpu2:/mnt/data/wc14cls/training/0seed/vdos/wc14-BraMex.mp4/
Log with error: runlog-VideoReader.txt-ERR
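
For reference, the step of 24 used above follows from the frame rate and the target density. This is a minimal sketch, assuming "density" means extracted frames per second of video; the function name frame_step is hypothetical and not part of khajuri.

```python
def frame_step(fps, density):
    """Number of source frames to advance per extracted frame.

    Assumption: density is the number of frames to extract per second of video.
    """
    return max(1, round(fps / density))


fps = 119.88
step = frame_step(fps, 5)            # 119.88 / 5 = 23.976, rounded to 24
effective_density = fps / step       # density actually obtained with that step
print(step, effective_density)
```

Because the step is rounded to a whole number of frames, the density actually obtained is slightly below the requested 5.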

regmiz commented 9 years ago

Trying a fix in branch issue87 @ b975ffad97a9f60780bf3502957abef972e14cc9

The problem was the lengthInMicroSeconds variable, a uint, overflowing. Using int64_t now. Will update once I see process_video.py complete.
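
A quick arithmetic check of the overflow, as a sketch only. It assumes uint is 32 bits wide and that the total frame count used for the progress percentage is computed as length times fps from the wrapped value; neither is confirmed from the VideoReader source here. The helper names are hypothetical.

```python
UINT32_MODULUS = 2 ** 32


def wrap_uint32(value):
    """Emulate storing a non-negative integer in a 32-bit unsigned variable."""
    return value % UINT32_MODULUS


def progress_percent(frames_read, total_frames):
    """Progress as reported when total_frames is taken as the 100% mark."""
    return 100.0 * frames_read / total_frames


# Video length 02:07:15.34 expressed in microseconds.
length_us = ((2 * 60 + 7) * 60 + 15) * 1_000_000 + 340_000
wrapped_us = wrap_uint32(length_us)

# Total frames as they would be computed from the wrapped length (assumption).
fps = 119.88
wrapped_total_frames = wrapped_us / 1_000_000 * fps

# Frame counts reported earlier in this thread for the .mkv and .mp4 files.
mkv_percent = progress_percent(457701, wrapped_total_frames)
mp4_percent = progress_percent(915301, wrapped_total_frames)
print(length_us, wrapped_us, int(mkv_percent), int(mp4_percent))
```

Under these assumptions the length (7,635,340,000 microseconds) no longer fits in 32 bits and wraps to 3,340,372,704, and the resulting percentages come out at about 114% and 228%, which is consistent with the values seen in the runs above.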

eacharya commented 9 years ago

Should we take this opportunity to replace all non-return-value integers with uint64_t? Given that the memory use for VideoReader is already in the hundreds of MBs, we can spare a few bytes. Plus, we won't see overflow errors for a while. (Don't know how many there are.)

regmiz commented 9 years ago

Fix at issue87 @ b975ffa doesn't seem sufficient. process_video is now stuck at 50%. Digging into the other issue.

Let me try changing all return values to uint64_t.

regmiz commented 9 years ago

Looks like multiple issues with VideoReader at play here:

  1. A bug which causes us to deadlock. Bounds checking at https://github.com/zigvu/khajuri/blob/master/VideoReader/VideoFrameReader.cpp#L257 is not sufficient since the producer may not yet have produced the full 'listTailBufNumOfFrames' buffer's worth of frames. This is the cause of the hangs. This bounds check can be fixed.
  2. Our calculation of totalFrames based on FPS and videoLength isn't always accurate. This causes us to seek a frame beyond the last frame of the video and enter an infinite loop here: https://github.com/zigvu/khajuri/blob/master/VideoReader/VideoLevelDb.cpp#L39 This can be prevented by early checks in VideoDbManager.py.
  3. The mkv format seems to provide only half of the frames that mp4 does. This is unexpected and I don't know how to "fix" it. But by fixing the above two issues, mkv files can also be evaluated. However, only half of the frames will be evaluated, so their density would have to be doubled.
  4. Performance issue with VideoReader. When we don't find a frame in our existing buffer, we seem to sleep for 1 second at https://github.com/zigvu/khajuri/blob/master/VideoReader/VideoFrameReader.cpp#L293. We should reduce this to a small number; 0.01 seconds should be fine. This will make VideoReader a lot faster. However, not sure if VideoReader is the bottleneck yet. If it is, then the entire pipeline should also be faster with that sleep fixed.
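
To illustrate points 1, 2 and 4 together, here is a toy sketch of a consumer that waits for a frame with a short timeout and gives up cleanly once the producer has reached the end of the stream. It is not the VideoReader implementation (which is C++); the FrameBuffer class, its methods, and the 0.01 second poll interval are all assumptions made for illustration.

```python
import threading


class FrameBuffer:
    """Toy stand-in for a decoder-fed frame buffer (hypothetical, not khajuri code)."""

    def __init__(self):
        self._frames = {}          # frame number -> decoded frame payload
        self._finished = False     # set once the producer reaches end of stream
        self._cond = threading.Condition()

    def put(self, frame_number, frame):
        """Called by the producer for every decoded frame."""
        with self._cond:
            self._frames[frame_number] = frame
            self._cond.notify_all()

    def finish(self):
        """Called by the producer when no more frames will arrive."""
        with self._cond:
            self._finished = True
            self._cond.notify_all()

    def get(self, frame_number, poll_seconds=0.01):
        """Return the frame, or None if the stream ended without producing it."""
        with self._cond:
            while frame_number not in self._frames:
                if self._finished:
                    # Requested frame lies beyond the real end of the video:
                    # report that instead of waiting or looping forever.
                    return None
                # Short wait; a put() or finish() wakes us up earlier.
                self._cond.wait(timeout=poll_seconds)
            return self._frames[frame_number]


def produce(buffer, frame_count):
    """Simulated decoder thread that yields frame_count frames and then stops."""
    for n in range(frame_count):
        buffer.put(n, "frame-%d" % n)
    buffer.finish()


buffer = FrameBuffer()
producer = threading.Thread(target=produce, args=(buffer, 100))
producer.start()
in_range = buffer.get(42)       # produced eventually, so the frame is returned
beyond_end = buffer.get(150)    # never produced; returns None once the stream ends
producer.join()
print(in_range, beyond_end)
```

The design choice is that a request for a frame past the real end of the video becomes a normal return value rather than a wait on a frame that will never arrive, so an inaccurate totalFrames estimate cannot stall the caller.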

regmiz commented 9 years ago

  1. Let's re-encode the videos to 30 FPS.

regmiz commented 9 years ago

Fixed at eb477fca502a95ae934e8bc70f5e668d953a0fba in issue87 branch. Can we re-try the videos again?

arpgh commented 9 years ago

With the original 2hr video re-encoded at 30fps, the processing now completes ok. Performance is 3.02x. Format is still mkv. Also running localization on video to cross check ..

arpgh commented 9 years ago

Frame extraction from mkv video with 50fps works in some cases but not in others. Here is one that is working - gpu2:/mnt/data/wc14cls/training/0seed/vdos/wc14-ChiAus-1stHalf/wc14-ChiAus-1stHalf.mkv (both in Development and issue87 branches)

regmiz commented 9 years ago

We need to take care of .mkv formats while streaming videos as well.

arpgh commented 9 years ago

Even with the 2hr video re-encoded at 25fps, the reported % progress goes way beyond 100%. Need to make sure it's only a log reporting issue and doesn't affect frame handling in VideoReader. Path: gpu2:/mnt/data/wc14itr/vdo-set04/roundOf16Output/wc14-ArgSwi/processvideo.log

eacharya commented 9 years ago

This issue will need to be tackled with larger video streaming infrastructure development. As of right now, all videos are encoded to 25 FPS prior to khajuri/kheer/cellroti processing. While each of the sub-systems is theoretically able to handle videos at different FPS, rather than maintaining/testing different FPS in each sub-system, a cleaner design would be to encode video to 25 FPS during initial ingest from the stream.

Hence, for now, removing bug tag and introducing long-term tag.
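
As a rough illustration of the ingest-time re-encode described above, the sketch below builds an ffmpeg command line that resamples a file to a constant 25 FPS using ffmpeg's -r output option. The function name and the output file name are hypothetical, and encoder settings are left at ffmpeg's defaults, which may not match what is actually used in production.

```python
def reencode_command(source_path, target_path, fps=25):
    """Build an ffmpeg argument list that resamples a video to a constant frame rate."""
    # "-r" placed after the input applies to the output: frames are duplicated
    # or dropped so that the output has the requested constant frame rate.
    return ["ffmpeg", "-i", source_path, "-r", str(fps), target_path]


# Hypothetical output name; the command is only built here, not executed.
command = reencode_command("wc14-BraMex.mkv", "wc14-BraMex-25fps.mkv")
print(" ".join(command))
```

The list can be passed to subprocess.run(command, check=True) on a machine where ffmpeg is installed.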