Alignment with paper

tobyclh commented 5 years ago

Hello, thanks for releasing the PyTorch version of the code! I have a couple of questions to sync this repo with the paper (sorry for the pun):

1. fc7 in the paper is a 256-d vector, whereas here the output feature is 1024-d (at least the pretrained model seems to be). Is this a newer/better version of the work, or am I looking in the wrong place?
2. In SyncNetInstance.py, line 107, a *4 is applied when sampling the audio. I suspect it refers to some sort of stride, but I can't find where the paper mentions it (perhaps it is too fundamental?). Could you explain what it is?

joonson commented 5 years ago

Hi,

1. This is an updated version, but the functionality should be the same.
2. This is because the audio (spectrograms) is sampled at 100 Hz, whereas the video is sampled at 25 Hz, so each video frame spans 4 audio feature frames.
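
To make the second point concrete, here is a minimal sketch of the index arithmetic implied by those two rates. It is not the code from SyncNetInstance.py: the function name and the 5-frame default window are assumptions chosen for illustration, and only the 100 Hz and 25 Hz figures come from the reply above.

```python
# Rates as stated in the reply above.
AUDIO_FEATURE_RATE_HZ = 100  # audio (spectrogram) feature frames per second
VIDEO_FRAME_RATE_HZ = 25     # video frames per second

# Number of audio feature frames spanned by one video frame: 100 / 25 = 4.
AUDIO_PER_VIDEO = AUDIO_FEATURE_RATE_HZ // VIDEO_FRAME_RATE_HZ


def audio_window_for_video_window(start_frame, num_frames=5):
    """Map a window of video frames to the audio feature frames covering
    the same time span.

    Hypothetical helper: returns a half-open (start, stop) index pair into
    the audio feature sequence for video frames
    [start_frame, start_frame + num_frames).
    The default of 5 frames is an assumption for illustration.
    """
    if start_frame < 0 or num_frames <= 0:
        raise ValueError("start_frame must be >= 0 and num_frames > 0")
    start = start_frame * AUDIO_PER_VIDEO
    stop = (start_frame + num_frames) * AUDIO_PER_VIDEO
    return start, stop


if __name__ == "__main__":
    # Video frames 3..7 map to audio feature frames 12..31.
    print(audio_window_for_video_window(3))
```

Under these assumptions, multiplying a video frame index by 4 gives the first audio feature frame that starts at the same instant, which is why a factor like *4 shows up when the audio is indexed from a video frame number.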

tobyclh commented 5 years ago

Thank you for the response!

joonson / syncnet_python