joonson / syncnet_python

Out of time: automated lip sync in the wild
MIT License
681 stars, 149 forks

Explanation of outputs #2

Closed: voletiv closed this issue 5 years ago

voletiv commented 6 years ago

I ran your code for different audio delays:

delay = 0.00 seconds - AV offset -2, conf 0.038
delay = 0.25 seconds - AV offset -9, conf 0.048
delay = 0.50 seconds - AV offset -15, conf 0.039
delay = 0.75 seconds - AV offset -4, conf 0.022
delay = 1.00 seconds - AV offset 12, conf 0.029

1) Clearly, the results do not reflect the delay with much confidence. Is this a work in progress? FYI, the above values are for videos converted from 30 fps to 25 fps, which triggered the message: "Mismatch between the number of audio and video frames. Type 'cont' to continue."

2) I see that the example video has the full face (1.5 x dlib face_rect) as the input to the lip model. Does this mean the model will only work for faces in the LRW dataset? I am trying with non-LRW faces.

joonson commented 6 years ago

I will add the pre-processing pipeline to the repository. The numbers suggest that this is not working for your videos, possibly due to the size of your face crop.

voletiv commented 6 years ago

I tried your code on delayed versions of your example video "data/example.avi", with audio delays of 0.25, 0.5, 0.75 and 1 second. Please explain the outputs:

data/example.avi - AV offset 4, conf 0.447
data/example_audio_delay_0.25.mp4 - AV offset -2, conf 0.410
data/example_audio_delay_0.50.mp4 - AV offset -9, conf 0.477
data/example_audio_delay_0.75.mp4 - AV offset -15, conf 0.493
data/example_audio_delay_1.00.mp4 - AV offset -15, conf 0.096

You can find the delayed videos here: https://drive.google.com/open?id=1N1l1IOCHOSYIOh8pw4pmkHr9gETrB2SK

Videos have been delayed using ffmpeg. Example:

ffmpeg -i example.avi -itsoffset 0.50 -i example.avi -map 0:v -map 1:a example_audio_delay_0.50.mp4

joonson commented 6 years ago

A one-second delay equates to a 25-frame offset, so the results for 0, 0.25, 0.5 and 0.75 seconds appear to be consistent. You will need to change the --vshift argument for the 1-second delay, since the script only searches within ±15 frames by default.
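The arithmetic behind this reply can be sketched as follows. This is a minimal illustration, not code from the repository: it assumes 25 fps video, and that a positive audio delay moves the reported offset in the negative direction, as the results reported above suggest. `predicted_offset` and `in_search_window` are hypothetical helper names.

```python
def predicted_offset(baseline_offset, delay_s, fps=25):
    """Offset in frames expected after delaying the audio by delay_s seconds.

    Assumption: each second of audio delay lowers the offset by `fps` frames.
    """
    return baseline_offset - delay_s * fps


def in_search_window(offset_frames, vshift=15):
    """True if the offset lies inside the +/- vshift frame search window."""
    return abs(offset_frames) <= vshift


# Baseline of 4 frames is the value reported for the undelayed data/example.avi.
for delay in (0.0, 0.25, 0.5, 0.75, 1.0):
    expected = predicted_offset(4, delay)
    print(f"delay {delay:.2f} s: expected offset {expected:+.2f} frames, "
          f"searchable: {in_search_window(expected)}")
```

Under these assumptions the expected offsets for 0.25, 0.5 and 0.75 seconds are -2.25, -8.5 and -14.75 frames, close to the reported -2, -9 and -15, while the 1-second case lands at -21 frames, outside the default window.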

taewookim commented 6 years ago

I ran the updated code on data/example.avi, using @voletiv's ffmpeg delay:

no delay

AV offset: 3
Min dist: 8.025
Confidence: 8.326

0.5s delay

AV offset: -9
Min dist: 6.795
Confidence: 10.619

3s delay

AV offset: -6
Min dist: 15.443
Confidence: 0.976

Still having a hard time interpreting this...

joonson commented 6 years ago

With a 0.5-second delay, we expect the offset to change by 12 or 13 frames, which is in line with your results. You are seeing very low confidence for the 3-second delay (no audio-to-video correlation is found) because the script does not search for offsets that far out.
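One way to read "Min dist" and "Confidence" is sketched below. It rests on assumptions this thread does not confirm: that the script computes one mean audio-to-video distance per candidate shift in the ±vshift window, reports the smallest of them, maps its index to an offset as `vshift - index`, and defines confidence as the median distance minus the minimum. The function name `summarize` and the distance values are invented for illustration only.

```python
from statistics import median


def summarize(dists, vshift):
    """Return (offset, min_dist, confidence) for one distance per candidate shift.

    Assumed conventions (not confirmed by the thread):
      - dists holds 2 * vshift + 1 values, one per shift in the window
      - offset = vshift - index of the smallest distance
      - confidence = median(dists) - min(dists)
    """
    if len(dists) != 2 * vshift + 1:
        raise ValueError("expected one distance per shift in the window")
    min_idx = min(range(len(dists)), key=dists.__getitem__)
    min_dist = dists[min_idx]
    return vshift - min_idx, min_dist, median(dists) - min_dist


# Made-up curve with a clear dip: the true offset lies inside the window.
print(summarize([14.0, 14.5, 15.0, 7.0, 14.25], vshift=2))
# Made-up flat curve: the true offset lies outside the window.
print(summarize([14.0, 14.5, 15.0, 13.75, 14.25], vshift=2))
```

With a clear dip, the minimum sits well below the median and the confidence is large. With a flat curve, some shift is still reported as the minimum, but the confidence is close to zero, which matches the pattern in the 3-second result.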

abhisheksgumadi commented 5 years ago

> A one-second delay equates to a 25-frame offset, so the results for 0, 0.25, 0.5 and 0.75 seconds appear to be consistent. You will need to change the --vshift argument for the 1-second delay, since the script only searches within ±15 frames by default.

Hi @joonson, can we change vshift to whatever amount of delay we want to detect? Or is the maximum limit of 15 still there?

joonson commented 5 years ago

Yes, you can change the value of vshift.
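A rough way to pick a value is sketched below, assuming 25 fps video and the same sign convention as the results earlier in the thread (audio delay lowers the offset). `min_vshift` is a hypothetical helper, not something the repository provides.

```python
import math


def min_vshift(delay_s, fps=25, baseline_offset=0):
    """Smallest whole-frame search radius that still covers the expected offset.

    Assumption: the expected offset is baseline_offset - delay_s * fps.
    """
    return math.ceil(abs(baseline_offset - delay_s * fps))


print(min_vshift(1.0))                     # 1-second delay, no baseline offset
print(min_vshift(1.0, baseline_offset=4))  # same delay, 4-frame baseline
print(min_vshift(3.0))                     # 3-second delay
```

In practice it is probably safer to add a few frames of margin, since the baseline offset of a clip is usually not known in advance.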

pcgreat commented 3 years ago

To correct the offset for video/audio:

Say the output is "AV offset -2": at 25 fps, the delay is 2/25 = 0.08 seconds. Always put -itsoffset ahead of the audio input (instead of the video input).

ffmpeg -i example_av_async.avi -itsoffset -0.08 -i example_av_async.avi -map 0:v -map 1:a example_corrected.mp4
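The conversion above could be wrapped in a small helper like the following sketch. It assumes 25 fps and copies the sign convention from the example in this comment (an offset of -2 becomes -itsoffset -0.08), which the repository author has not confirmed in this thread. `correction_command` is a hypothetical name.

```python
def correction_command(src, dst, av_offset_frames, fps=25):
    """Build an ffmpeg argument list that shifts the audio by offset / fps seconds.

    -itsoffset is placed before the second (audio) input, as advised above.
    """
    shift_s = av_offset_frames / fps
    return [
        "ffmpeg",
        "-i", src,                       # input 0: video is taken from here
        "-itsoffset", f"{shift_s:.3f}",  # applies to the next input only
        "-i", src,                       # input 1: audio is taken from here
        "-map", "0:v",
        "-map", "1:a",
        dst,
    ]


cmd = correction_command("example_av_async.avi", "example_corrected.mp4", -2)
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` if ffmpeg is installed and on the PATH.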

ashok-arjun commented 1 year ago

> To correct the offset for video/audio:
>
> Say the output is "AV offset -2": at 25 fps, the delay is 2/25 = 0.08 seconds. Always put -itsoffset ahead of the audio input (instead of the video input).
>
> ffmpeg -i example_av_async.avi -itsoffset -0.08 -i example_av_async.avi -map 0:v -map 1:a example_corrected.mp4

@pcgreat @joonson I have 2 issues:

  1. If the original video has a different FPS, is this offset no longer meaningful (so I cannot use this repo), or should I just take the offset and sync-correct with offset/FPS anyway?

  2. After sync-correcting the video, the audio and video lengths no longer match. Is that normal?

Thanks!