joonson / syncnet_python

Out of time: automated lip sync in the wild
MIT License

Different results between demo_syncnet.py and run_syncnet.py #8

Open: leeeeeeo opened this issue 6 years ago

leeeeeeo commented 6 years ago

When I run python demo_syncnet.py --videofile data/example.avi --tmp_dir /path/to/temp/directory, I get the expected result:

AV offset: 4
Min dist: 6.742
Confidence: 10.447

But when I run run_syncnet.py, I get a different result:

AV offset: 2
Min dist: 7.093
Confidence: 9.238

Why is that? Thank you!
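To help interpret the three numbers, here is a minimal sketch of how they could be derived from per-shift distances. This is an assumption based on a reading of SyncNetInstance.py and is not verified in this thread: the evaluator averages the audio-visual feature distance for each candidate shift from -vshift to +vshift, reports the smallest average as Min dist, the position of that minimum relative to the centre as AV offset, and the median minus the minimum as Confidence. The function name score_offsets is hypothetical.

```python
# Hypothetical sketch, not the repository's code.
# Assumptions: mean_dists[i] is the mean distance for shift index i,
# i runs over -vshift..+vshift, and confidence = median - minimum.
from statistics import median
from typing import List, Tuple


def score_offsets(mean_dists: List[float], vshift: int) -> Tuple[int, float, float]:
    """Return (offset, min_dist, confidence) from per-shift mean distances."""
    if len(mean_dists) != 2 * vshift + 1:
        raise ValueError("expected 2 * vshift + 1 distances")
    # Index of the shift with the smallest mean distance.
    min_idx = min(range(len(mean_dists)), key=mean_dists.__getitem__)
    min_dist = mean_dists[min_idx]
    # Offset counted from the centre (no-shift) position, in frames.
    offset = vshift - min_idx
    # A sharp minimum relative to the typical distance gives high confidence.
    confidence = median(mean_dists) - min_dist
    return offset, min_dist, confidence
```

Under this reading, a small change to the input, such as re-encoding, can move the minimum to a neighbouring shift, which changes the offset and both distance figures at once.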

leeeeeeo commented 6 years ago

I have another question: run_syncnet.py outputs the AV offset, Min dist and Confidence, but how can I recover a video with the offset removed?

offsets.txt is generated by run_syncnet.py, but it is never used afterwards.
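One way to produce a corrected video is to delay whichever stream leads, using ffmpeg's -itsoffset option (which applies to the next -i input). A minimal sketch, assuming the offset is measured in video frames at 25 fps and that a positive offset means the video leads the audio; neither assumption is confirmed by the maintainer in this thread, so flip the sign if the result looks worse. The helper name build_resync_command and the output filename are hypothetical.

```python
# Hypothetical helper: builds (but does not run) an ffmpeg command that
# cancels an AV offset by delaying one stream.
from typing import List

FPS = 25.0  # assumption: the clip was analysed at 25 fps


def build_resync_command(src: str, dst: str, offset_frames: int, fps: float = FPS) -> List[str]:
    """Return an ffmpeg argument list that delays the leading stream."""
    if offset_frames == 0:
        # Nothing to shift: plain stream copy.
        return ["ffmpeg", "-y", "-i", src, "-c", "copy", dst]
    delay = f"{abs(offset_frames) / fps:.3f}"  # seconds
    if offset_frames > 0:
        # Assumed: positive offset = video leads, so delay input 0 (video).
        inputs = ["-itsoffset", delay, "-i", src, "-i", src]
    else:
        # Assumed: negative offset = audio leads, so delay input 1 (audio).
        inputs = ["-i", src, "-itsoffset", delay, "-i", src]
    # Take video from input 0 and audio from input 1, copying both streams.
    return ["ffmpeg", "-y", *inputs, "-map", "0:v:0", "-map", "1:a:0", "-c", "copy", dst]
```

For example, subprocess.run(build_resync_command("data/example.avi", "example_resynced.avi", 4), check=True) would delay the video by 0.160 s. Delaying the leading stream avoids negative offsets, which some containers handle poorly.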

abhisheksgumadi commented 5 years ago

Any update on this please?

joonson commented 5 years ago

run_pipeline.py runs the face tracking script and re-encodes the video using ffmpeg. I suspect that the re-encoding process is introducing an offset.
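One way to test this suspicion is to compare the start times that ffprobe reports for the audio and video streams before and after run_pipeline.py. A minimal sketch: -show_entries and -of json are standard ffprobe options, while the helper names ffprobe_command and audio_video_start_gap are hypothetical.

```python
# Hypothetical helpers for comparing stream start times reported by ffprobe.
import json
from typing import Dict, List


def ffprobe_command(path: str) -> List[str]:
    """ffprobe arguments that print each stream's type and start time as JSON."""
    return ["ffprobe", "-v", "error",
            "-show_entries", "stream=codec_type,start_time",
            "-of", "json", path]


def audio_video_start_gap(ffprobe_json: str) -> float:
    """Return audio start_time minus video start_time, in seconds."""
    starts: Dict[str, float] = {}
    for stream in json.loads(ffprobe_json)["streams"]:
        kind = stream.get("codec_type")
        value = stream.get("start_time")
        # Keep the first audio and first video stream that report a start time.
        if kind in ("audio", "video") and kind not in starts and value not in (None, "N/A"):
            starts[kind] = float(value)
    if "audio" not in starts or "video" not in starts:
        raise ValueError("need an audio and a video stream with start_time")
    return starts["audio"] - starts["video"]
```

Typical use: out = subprocess.run(ffprobe_command("clip.avi"), capture_output=True, text=True, check=True).stdout, then audio_video_start_gap(out). A gap that changes after re-encoding points at the container timestamps; an unchanged gap does not rule out an offset baked into the re-encoded content.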

abhisheksgumadi commented 5 years ago

Hi @joonson, I have also observed that the face detector output, i.e. the cropped video, has a slight delay between the audio and the video. I know that the original, uncropped video has no such delay.

Any idea which part of the code might be causing this? It definitely affects the results. With some guidance, I am happy to debug it.
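A cheap first check when debugging the cropped clip is whether its video length still matches its audio length; frames dropped or duplicated during cropping would show up as a mismatch. A minimal sketch, assuming 25 fps video and 16 kHz audio (assumptions, not confirmed in this thread); the function names are hypothetical and the frame and sample counts would come from your own tooling.

```python
# Hypothetical check: compare audio duration with video duration for a clip.
FPS = 25.0           # assumption: cropped clip is written at 25 fps
SAMPLE_RATE = 16000  # assumption: audio is extracted at 16 kHz


def av_length_mismatch(num_video_frames: int, num_audio_samples: int,
                       sample_rate: int = SAMPLE_RATE, fps: float = FPS) -> float:
    """Audio duration minus video duration, in seconds."""
    video_seconds = num_video_frames / fps
    audio_seconds = num_audio_samples / sample_rate
    return audio_seconds - video_seconds


def mismatch_in_frames(num_video_frames: int, num_audio_samples: int,
                       sample_rate: int = SAMPLE_RATE, fps: float = FPS) -> int:
    """The same mismatch expressed in whole video frames."""
    seconds = av_length_mismatch(num_video_frames, num_audio_samples, sample_rate, fps)
    return round(seconds * fps)
```

A non-zero mismatch does not by itself prove a constant offset, but frames lost at the start of the crop would produce both a length mismatch and a shift of the same size.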

Thanks.

Olivialovecode commented 3 years ago

Hi, I also get offset: 4 when running the demo script, but I cannot tell what the unit of the offset is. Is it milliseconds?

XinBow99 commented 7 months ago


It looks like the offset is measured in video frames: offset < 0 means the audio is ahead of the video, and offset > 0 means the video is ahead of the audio.
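If the offset is indeed counted in video frames, converting it to milliseconds only needs the frame rate. A minimal sketch, assuming 25 fps (so one frame is 40 ms) and the sign convention read in the comment above, which the maintainer has not confirmed here; the function names are hypothetical.

```python
# Hypothetical helpers: interpret a SyncNet-style offset counted in video frames.
FPS = 25.0  # assumption: video analysed at 25 fps, i.e. 40 ms per frame


def offset_to_ms(offset_frames: int, fps: float = FPS) -> float:
    """Convert an offset in video frames to milliseconds."""
    return offset_frames * 1000.0 / fps


def describe_offset(offset_frames: int) -> str:
    """Sign convention as read in this thread (unconfirmed)."""
    if offset_frames < 0:
        return "audio leads video"
    if offset_frames > 0:
        return "video leads audio"
    return "in sync"
```

Under these assumptions, the demo's offset of 4 corresponds to offset_to_ms(4) == 160.0 ms, not 4 ms.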