Closed: MuhammadWaseem-DevOps closed this issue 4 months ago.
Anyone have any idea about this issue or a possible fix?
Hey!
I just spent a few days thinking about a bulletproof timestamp synchronization solution, and I came up with a great idea.
I built a TrackTimeline helper class which represents a video or audio track - it can be started & stopped, paused & resumed, and even supports nested pauses without issues. The recording now ends up with an accurate video.duration! 🥳 This was really complex to build, as I had to synchronize timestamps between capture sessions, and the entire thing is a producer model - a video buffer can arrive a second or so later than the audio buffer, but I need to make sure the video track starts before the audio track starts and ends after the audio track ends - that's a huge brainf*ck! 🤯😅
There are also no helper APIs for this on iOS, and it looks like no other camera framework (not even native Swift/ObjC iOS camera libraries) supports this - they all break when timestamps are delayed (e.g. when video stabilization is enabled), or don't support delays at all; so I had to build the thing myself.
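Conceptually, a timeline helper like that could look something like the sketch below - this is not the actual VisionCamera TrackTimeline implementation, just an illustration of the start/stop/pause bookkeeping described above (all names and details are made up):

```swift
import CoreMedia

// Simplified sketch of a track timeline: it remembers when the track started,
// stopped, and which time ranges were paused (including nested pauses).
final class TrackTimelineSketch {
  private var startTime: CMTime?
  private var stopTime: CMTime?
  private var pauses: [(begin: CMTime, end: CMTime?)] = []
  private var pauseDepth = 0 // supports nested pause()/resume() calls

  func start(at time: CMTime) { startTime = time }
  func stop(at time: CMTime) { stopTime = time }

  func pause(at time: CMTime) {
    pauseDepth += 1
    // Only the outermost pause opens a new gap in the timeline.
    if pauseDepth == 1 { pauses.append((begin: time, end: nil)) }
  }

  func resume(at time: CMTime) {
    pauseDepth = max(0, pauseDepth - 1)
    // Only the outermost resume closes the gap again.
    if pauseDepth == 0, let last = pauses.indices.last { pauses[last].end = time }
  }

  /// A buffer should only be written if its timestamp is inside the timeline
  /// and not inside a paused gap.
  func contains(_ timestamp: CMTime) -> Bool {
    guard let start = startTime else { return false }
    if timestamp < start { return false }
    if let stop = stopTime, timestamp > stop { return false }
    return !pauses.contains { gap in
      timestamp >= gap.begin && (gap.end == nil || timestamp <= gap.end!)
    }
  }

  /// Total track duration, excluding paused gaps.
  var duration: CMTime {
    guard let start = startTime, let stop = stopTime else { return .zero }
    var total = CMTimeSubtract(stop, start)
    for gap in pauses {
      guard let end = gap.end else { continue }
      total = CMTimeSubtract(total, CMTimeSubtract(end, gap.begin))
    }
    return total
  }
}
```

A recorder would then keep one timeline per track (video and audio), ask it whether a given buffer's timestamp should be written, and read the pause-adjusted duration when finishing the file.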
Check out this PR and see if it fixes the issue for you: https://github.com/mrousavy/react-native-vision-camera/pull/2948
Thanks! ❤️
I just re-read what you said and it actually sounds intentional - there are situations where a few extra frames are encoded into the video. This is to ensure the video is longer than the audio, but the video metadata has a flag that specifies the actual duration of the track session - this might cut off a few frames at the start or end.
See AVAssetWriter.startSession(atSourceTime:) / AVAssetWriter.endSession(atSourceTime:).
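As a rough illustration of that mechanism - a minimal sketch only, not VisionCamera's actual recording code; the output URL, session timestamps, and pass-through settings are assumptions:

```swift
import AVFoundation

// Sketch: samples appended outside the start/end session window stay in the
// file, but the session window defines the track's presented duration.
func writeClip(to outputURL: URL,
               sessionStart: CMTime,   // where the presented video actually begins
               sessionEnd: CMTime,     // where it ends, even if later buffers were appended
               buffers: [CMSampleBuffer]) throws {
  let writer = try AVAssetWriter(outputURL: outputURL, fileType: .mov)
  let input = AVAssetWriterInput(mediaType: .video, outputSettings: nil) // pass-through for brevity
  writer.add(input)
  guard writer.startWriting() else { throw writer.error ?? CocoaError(.fileWriteUnknown) }
  writer.startSession(atSourceTime: sessionStart)

  for buffer in buffers where input.isReadyForMoreMediaData {
    // Every appended buffer is physically written - even ones before
    // sessionStart or after sessionEnd.
    _ = input.append(buffer)
  }

  // The edit recorded by endSession is what players and the duration metadata honor.
  writer.endSession(atSourceTime: sessionEnd)
  input.markAsFinished()
  writer.finishWriting { /* completion */ }
}
```

A decoder that honors this edit would then report fewer frames than the number of samples physically stored in the file, which is presumably where a 184 vs. 183 difference comes from.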
I think this is either fixed now in 4.2.0, or intentional (depending on the comment above). https://github.com/mrousavy/react-native-vision-camera/releases/tag/v4.2.0
Thank you @mrousavy for the detailed comments and the new logic for accurate duration calculation. However, I'm still encountering the issue I mentioned earlier. Let me provide a more detailed explanation:
let successful = assetWriterInput.append(buffer) returns success, and I am counting the frames written to the file at this point. For example, the count is 184 because assetWriterInput.append(buffer) was called successfully 184 times. The video file's metadata also reflects 184 frames. However, when I decode the recorded file using the Python script I mentioned in my first comment, it shows 183 frames or sometimes 182 frames. The decoded frame count is always less than the number of frames actually written to the file.
Could you suggest a way to fix this discrepancy? I have even tried excluding frames that are before the video starting timestamp by returning false in the start case of the events (if timestamp < event.timestamp).
I need the metadata file frame count to match the decoded frame count because I am recording the timestamp of each frame for later video analysis. The timestamps are recorded in a separate JSON file. So, when 184 buffers are appended, the timestamp count is also 184. But with only 183 or 182 decoded frames, there is a mismatch, and it's unclear which frame was dropped or skipped (whether at the start, middle, or end).
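For context, the bookkeeping described above amounts to roughly the following - a simplified sketch with illustrative names, not the actual recording code:

```swift
import AVFoundation

// Count successful appends and log each frame's presentation timestamp,
// to be written out later as the separate JSON file.
var appendedCount = 0
var frameTimestamps: [Double] = []

func write(_ buffer: CMSampleBuffer, to assetWriterInput: AVAssetWriterInput) {
  guard assetWriterInput.isReadyForMoreMediaData else { return }
  let successful = assetWriterInput.append(buffer)
  if successful {
    appendedCount += 1
    frameTimestamps.append(CMSampleBufferGetPresentationTimeStamp(buffer).seconds)
  }
}

// After recording: appendedCount and frameTimestamps.count match (e.g. 184),
// but the server-side decoder reports only 183 or 182 frames.
// let json = try JSONEncoder().encode(frameTimestamps)
```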
Any assistance to resolve this issue would be greatly appreciated. Thanks!
I think this shouldn't be changed in VisionCamera, but rather in your Python script.
VisionCamera just adds a few frames before or after the video to make sure there are no blanks (because if an audio sample comes after a video sample, it would otherwise be a blank frame in the resulting video).
So I guess you just need to make sure to decode only the frames that are actually within the time range of the track duration.
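For illustration, that filter could look roughly like this in Swift - a sketch only, assuming the padding frames carry timestamps outside the track's declared time range; the same PTS check can be applied in the Python/ffmpeg pipeline:

```swift
import AVFoundation

// Count only the frames whose presentation timestamp falls inside the
// video track's declared time range, skipping any padding frames.
func countFramesInsideTrackDuration(of url: URL) throws -> Int {
  let asset = AVURLAsset(url: url)
  guard let track = asset.tracks(withMediaType: .video).first else {
    throw CocoaError(.fileReadCorruptFile)
  }

  let reader = try AVAssetReader(asset: asset)
  let output = AVAssetReaderTrackOutput(
    track: track,
    outputSettings: [kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA]
  )
  reader.add(output)
  guard reader.startReading() else { throw reader.error ?? CocoaError(.fileReadUnknown) }

  var kept = 0
  while let sample = output.copyNextSampleBuffer() {
    let pts = CMSampleBufferGetPresentationTimeStamp(sample)
    // Drop frames that sit outside the declared track duration.
    if track.timeRange.containsTime(pts) { kept += 1 }
  }
  return kept
}
```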
The decoded frames are always fewer than the frames added by Vision Camera, never more. I have tried using the ffmpeg -i video.mp4 thumb%04d.jpg -hide_banner command, and the result is the same as with the Python script.
Additionally, the video recorded by Expo Camera does not exhibit any frame discrepancies. I have also tested some random recorded videos from other sources, and none of them show any frame differences.
Do you think this issue can be fixed on the Vision Camera side? Any help would be greatly appreciated.
What's happening?
I'm using VisionCamera to record video and get the timestamp of each frame. Recording works fine. But when I try to use that video file for analysis on the server side, the frame count in the video's metadata and the number of actually encoded frames don't match. The encoded frame count is always less than the frame count in the video's metadata. Since the VisionCamera recording itself works fine, I'm adding the Python script that I'm using to find both frame counts.
Reproduceable Code
Relevant log output
Camera Device
Device
iPad Air (4th generation) iOS 17.4.1 (21E236)
VisionCamera Version
3.9.0
Can you reproduce this issue in the VisionCamera Example app?
Yes, I can reproduce the same issue in the Example app.
Additional information