voletiv closed this issue 5 years ago.
I will add the pre-processing pipeline to the repository. The numbers suggest that this is not working for your videos, possibly due to the size of your face crop.
I tried your code on delayed versions of your example video "data/example.avi", with audio delays of 0.25, 0.5, 0.75 and 1 second. Please explain the outputs:
data/example.avi - AV offset 4, conf 0.447
data/example_audio_delay_0.25.mp4 - AV offset -2, conf 0.410
data/example_audio_delay_0.50.mp4 - AV offset -9, conf 0.477
data/example_audio_delay_0.75.mp4 - AV offset -15, conf 0.493
data/example_audio_delay_1.00.mp4 - AV offset -15, conf 0.096
You can find the delayed videos here: https://drive.google.com/open?id=1N1l1IOCHOSYIOh8pw4pmkHr9gETrB2SK
Videos have been delayed using ffmpeg. Example:
ffmpeg -i example.avi -itsoffset 0.50 -i example.avi -map 0:v -map 1:a example_audio_delay_0.50.mp4
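To produce the whole set of test clips in one go, a small Python sketch can build the ffmpeg argument list from the command above for each delay. The helper name `delay_command` and the output-naming scheme (copied from the file names in this thread) are illustrative assumptions; only the ffmpeg options already used in that command appear.

```python
def delay_command(src, delay_s, dst=None):
    """Build an ffmpeg argv that keeps the video of `src` and takes the audio
    from a second read of `src`, shifted by `delay_s` seconds."""
    if dst is None:
        stem = src.rsplit(".", 1)[0]
        dst = f"{stem}_audio_delay_{delay_s:.2f}.mp4"
    return [
        "ffmpeg",
        "-i", src,                       # input 0: source of the video stream
        "-itsoffset", f"{delay_s:.2f}",  # shift timestamps of the next input
        "-i", src,                       # input 1: source of the delayed audio
        "-map", "0:v",                   # take video from input 0
        "-map", "1:a",                   # take audio from input 1
        dst,
    ]


if __name__ == "__main__":
    for delay in (0.25, 0.50, 0.75, 1.00):
        print(" ".join(delay_command("example.avi", delay)))
```

Passing each list to `subprocess.run(cmd, check=True)` would run ffmpeg itself; building the argv as a list avoids shell-quoting problems with unusual file names.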
A one-second delay corresponds to a 25-frame offset, so the results for the 0, 0.25, 0.5 and 0.75 second delays appear consistent. For the 1-second delay you will need to change the --vshift argument, since by default the script only searches within ±15 frames.
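The consistency check can be written out explicitly. Assuming, as the reported numbers suggest, that the offset drops by about 25 frames per second of added audio delay, starting from the offset of the undelayed clip, this Python sketch predicts each offset and flags predictions outside the default ±15-frame search range. The function names are hypothetical, and the sign convention is inferred from the results reported for data/example.avi, not taken from the code.

```python
FPS = 25             # frames per second assumed for the offsets
DEFAULT_VSHIFT = 15  # default search range quoted in this thread


def predicted_offset(base_offset, delay_s, fps=FPS):
    """Expected AV offset (frames) after delaying the audio by delay_s seconds.
    Sign convention inferred from the reported results: more delay, lower offset."""
    return base_offset - delay_s * fps


def in_search_window(offset, vshift=DEFAULT_VSHIFT):
    """True if an offset of this size lies inside a +/- vshift search."""
    return abs(offset) <= vshift


# Offsets reported for data/example.avi and its delayed copies
reported = {0.00: 4, 0.25: -2, 0.50: -9, 0.75: -15}

if __name__ == "__main__":
    for delay in (0.00, 0.25, 0.50, 0.75, 1.00):
        pred = predicted_offset(reported[0.00], delay)
        seen = reported.get(delay, "unreliable")
        print(f"delay {delay:.2f}s: predicted {pred:+.2f}, reported {seen}, "
              f"inside ±{DEFAULT_VSHIFT}: {in_search_window(pred)}")
```

Every reported offset lands within a frame of the prediction, while the 1-second prediction of -21 falls outside ±15, which is why that clip's result (offset -15, conf 0.096) cannot be trusted.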
I ran the updated code on data/example.avi, delayed using @voletiv's ffmpeg command:
AV offset: 3 Min dist: 8.025 Confidence: 8.326
AV offset: -9 Min dist: 6.795 Confidence: 10.619
AV offset: -6 Min dist: 15.443 Confidence: 0.976
Still having a hard time interpreting this...
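One way to read these three numbers, assuming the script works like a typical sliding-window sync search: for every candidate shift within ±vshift frames it averages the distance between video and audio embeddings, reports the shift with the smallest mean distance (Min dist), and measures Confidence as how far that minimum sits below the median over all shifts. The Python sketch implements that idea on plain per-frame feature lists; `find_av_offset` is a hypothetical name, and whether it matches the repository's exact computation should be checked against its source.

```python
import math
import statistics


def _dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_av_offset(video_feats, audio_feats, vshift=15):
    """Hypothetical sliding-window sync search over per-frame features.

    Returns (offset, min_dist, confidence), where confidence is the median
    mean distance across all shifts minus the smallest one.
    """
    if len(video_feats) != len(audio_feats):
        raise ValueError("expected one audio feature vector per video frame")
    n, dim = len(video_feats), len(audio_feats[0])
    pad = [[0.0] * dim for _ in range(vshift)]
    padded = pad + list(audio_feats) + pad  # zero-pad so every shift is defined
    mean_dists = []
    for j in range(2 * vshift + 1):  # audio index i + j - vshift: shifts -vshift..+vshift
        total = sum(_dist(video_feats[i], padded[i + j]) for i in range(n))
        mean_dists.append(total / n)
    min_dist = min(mean_dists)
    offset = vshift - mean_dists.index(min_dist)
    confidence = statistics.median(mean_dists) - min_dist
    return offset, min_dist, confidence


if __name__ == "__main__":
    # Synthetic check: audio features are the video features moved by 3 frames
    base = [[math.sin(0.9 * k), math.cos(0.37 * k)] for k in range(80)]
    video = [base[i + 20] for i in range(40)]
    audio = [base[t + 23] for t in range(40)]
    print(find_av_offset(video, audio))
```

Under this reading, a large confidence means one shift clearly beats the rest, while a value near zero (like 0.976 in the third run) means no shift stands out, which is what one would expect when the true offset lies outside the searched range.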
With a 0.5-second delay, we expect the offset to change by 12 or 13 frames, which is in line with your results. You are seeing very low confidence for the 3-second delay (no audio-to-video correlation is found) because the script does not search for offsets that large: 3 seconds is 75 frames, far beyond the default ±15.
Hi @joonson, can we set vshift to whatever delay we want to detect, or is there still a maximum of 15?
Yes, you can change the value of vshift.
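To pick a value, a rough rule follows from the arithmetic in this thread: the ±vshift window must contain the expected offset, i.e. the undelayed clip's offset minus delay × 25 frames. A minimal Python sketch; `required_vshift` and its `margin` parameter are hypothetical names, and the result is the number one would pass as --vshift.

```python
import math


def required_vshift(max_delay_s, base_offset=0, fps=25, margin=0):
    """Smallest vshift (frames) whose +/- window still contains the expected
    offset base_offset - max_delay_s * fps; `margin` adds slack frames."""
    expected = base_offset - max_delay_s * fps
    return math.ceil(abs(expected)) + margin


if __name__ == "__main__":
    # 1-second delay on the clip whose undelayed offset was 4
    print(required_vshift(1.0, base_offset=4))
```

For that clip this gives 21. A small margin is sensible because the measured offsets land within about a frame of the prediction, and since a ±vshift search presumably evaluates 2 × vshift + 1 shifts, larger values cost proportionally more computation.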
To correct the audio/video offset:
For example, with "AV offset -2", the video delay is 2/25 = 0.08 seconds. Always put -itsoffset before the audio input (not the video input), as in:
ffmpeg -i example_av_async.avi -itsoffset -0.08 -i example_av_async.avi -map 0:v -map 1:a example_corrected.mp4
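The conversion generalizes to any reported offset: divide by the frame rate (25 here) to get seconds and pass the result, sign included, to -itsoffset on the audio input, following the example of -2 frames becoming -0.08 s. A Python sketch; `offset_to_seconds` and `correction_command` are hypothetical helpers, and the sign handling simply mirrors that single example, so it is worth verifying on a clip with a positive offset.

```python
def offset_to_seconds(av_offset, fps=25):
    """Convert an AV offset in frames to seconds (-2 frames is -0.08 s at 25 fps)."""
    return av_offset / fps


def correction_command(src, av_offset, dst, fps=25):
    """Build the ffmpeg argv that shifts the audio of `src` by the offset."""
    shift = offset_to_seconds(av_offset, fps)
    return [
        "ffmpeg",
        "-i", src,                  # input 0: video stream, left untouched
        "-itsoffset", f"{shift:g}", # applied to the audio input that follows
        "-i", src,                  # input 1: audio stream to shift
        "-map", "0:v",
        "-map", "1:a",
        dst,
    ]


if __name__ == "__main__":
    print(" ".join(correction_command("example_av_async.avi", -2, "example_corrected.mp4")))
```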
@pcgreat @joonson I have 2 issues:
If the original video has a different frame rate, is this offset meaningless (so I cannot use this repo), or should I still use it and sync-correct with offset/FPS?
After sync-correcting the video, the audio and video lengths no longer match. Is that normal?
Thanks!
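On the length question: shifting the audio with -itsoffset moves it relative to the video, so the two streams need not end together afterwards, and a difference of roughly |AV offset| / 25 seconds would not be surprising. A quick Python check, assuming you already have the video frame count and audio sample count (for instance from ffprobe); the 25 fps and 16 kHz defaults and the helper names are assumptions, not values confirmed in this thread.

```python
def duration_mismatch(num_video_frames, num_audio_samples, fps=25, sample_rate=16000):
    """Audio duration minus video duration, in seconds (positive: audio is longer)."""
    return num_audio_samples / sample_rate - num_video_frames / fps


def mismatch_in_frames(num_video_frames, num_audio_samples, fps=25, sample_rate=16000):
    """The same mismatch expressed in video frames."""
    return duration_mismatch(num_video_frames, num_audio_samples, fps, sample_rate) * fps


if __name__ == "__main__":
    # 10 s of video against 9.92 s of audio
    print(mismatch_in_frames(250, 158720))
```

A mismatch of about the corrected offset (2 frames in the -2 example) points to the shift itself; a much larger one suggests the conversion step changed one stream's length.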
I ran your code for different audio delays:
delay = 0.00 seconds - AV offset -2, conf 0.038
delay = 0.25 seconds - AV offset -9, conf 0.048
delay = 0.50 seconds - AV offset -15, conf 0.039
delay = 0.75 seconds - AV offset -4, conf 0.022
delay = 1.00 seconds - AV offset 12, conf 0.029
1) The delay is clearly not reflected with much confidence in these results. Is this still a work in progress? FYI, these values are for videos converted from 30 fps to 25 fps, which raised this error:
Mismatch between the number of audio and video frames. Type 'cont' to continue.
2) I see that the example video feeds the full face (1.5 x dlib face_rect) into the lip model. Does this mean the model will only work on faces from the LRW dataset? I am trying it with non-LRW faces.
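For reference, enlarging a detector box by 1.5 around its centre, as the example crop is described, takes only a few lines. This Python sketch uses plain (left, top, right, bottom) pixel coordinates, which a dlib rectangle exposes through its left(), top(), right() and bottom() methods; `expand_box` is a hypothetical helper, and whether the repository crops exactly this way is an assumption.

```python
def expand_box(left, top, right, bottom, scale=1.5, img_w=None, img_h=None):
    """Scale a box about its centre by `scale`; optionally clamp to the image."""
    cx, cy = (left + right) / 2, (top + bottom) / 2
    half_w = (right - left) * scale / 2
    half_h = (bottom - top) * scale / 2
    x0, y0, x1, y1 = cx - half_w, cy - half_h, cx + half_w, cy + half_h
    if img_w is not None and img_h is not None:
        # Keep the crop inside the frame (coordinates in pixels)
        x0, y0 = max(0.0, x0), max(0.0, y0)
        x1, y1 = min(float(img_w), x1), min(float(img_h), y1)
    return tuple(int(round(v)) for v in (x0, y0, x1, y1))


if __name__ == "__main__":
    print(expand_box(100, 100, 200, 200))
```

Clamping keeps the crop valid near the frame edges, at the cost of an off-centre face; padding the image instead would preserve centring if the model is sensitive to it.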