joonson / syncnet_python

Out of time: automated lip sync in the wild
MIT License

Alignment with paper #11

Closed tobyclh closed 5 years ago

tobyclh commented 5 years ago

Hello, thanks for releasing the PyTorch version of the code! I have a couple of questions about how this repo syncs with the paper (sorry for the pun).

  1. fc7 in the paper is a 256-d vector, whereas here the output feature is 1024-d (at least the pretrained model seems to be). Is this a newer/better version of the work, or am I looking at the wrong place?
  2. In SyncNetInstance.py line 107, there is a *4 applied to the sampling of the audio. I suspect that refers to some sort of stride, but I seem to have missed the part of the paper mentioning it (perhaps it is too fundamental?). Would you explain what it is?
joonson commented 5 years ago

Hi,

  1. This is an updated version, but the functionality should be the same.
  2. This is because the audio (spectrograms) is sampled at 100Hz, whereas the video is sampled at 25Hz, so each video frame corresponds to 4 audio frames.
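To illustrate the alignment joonson describes, here is a minimal sketch (not the repo's actual code) of where the *4 factor comes from, assuming video at 25 fps and spectrogram frames at 100 Hz (a 10 ms hop), so that 100 / 25 = 4 audio frames span each video frame. The function name and the 20-frame window length are illustrative, not taken from SyncNetInstance.py.

```python
# Hypothetical sketch of the audio/video index alignment behind the *4 factor.
# Assumptions: video at 25 fps, audio spectrogram frames at 100 Hz (10 ms hop).

VIDEO_FPS = 25
AUDIO_FPS = 100
STRIDE = AUDIO_FPS // VIDEO_FPS  # 4 audio frames per video frame

def audio_window_for_frame(video_idx, num_audio_frames=20):
    """Return the (start, end) audio-frame slice aligned with video frame
    `video_idx`. A 5-frame (0.2 s) video clip corresponds to 20 audio
    frames at 100 Hz; the window length here is illustrative."""
    start = video_idx * STRIDE
    return start, start + num_audio_frames

# Video frame 0 lines up with audio frames 0..19;
# video frame 3 starts at audio frame 3 * 4 = 12.
print(audio_window_for_frame(0))  # (0, 20)
print(audio_window_for_frame(3))  # (12, 32)
```

So the *4 is simply the ratio of the two sampling rates, converting a video-frame index into the matching spectrogram-frame index.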
tobyclh commented 5 years ago

Thank you for the response!