joonson / syncnet_python

Out of time: automated lip sync in the wild
MIT License

Want to find when each speaker's speech starts and ends #17

Closed roshideen closed 3 years ago

roshideen commented 5 years ago

Hi, is it possible to extract the times at which each speaker's speech starts and ends? I want to extract the speech of each speaker, so I need to know when the speech matched to each speaker begins and ends.

joonson commented 5 years ago

Hi, you can use the frame-wise confidence ('fconfm' inside SyncNetInstance.py) and set a threshold. The index into it is the frame number, so divide the frame index by 25 to get the time in seconds. To make datasets such as LRS and VoxCeleb, we used thresholds of 3 to 4.
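
A minimal sketch of that thresholding step, in case it helps. It assumes the frame-wise confidence is available as a plain 1-D sequence with one value per video frame. The function name `speech_segments`, its parameters, and the default threshold of 3.5 (picked from the 3 to 4 range above) are illustrative and not part of the repository.

```python
def speech_segments(frame_conf, threshold=3.5, fps=25.0):
    """Return (start_sec, end_sec) spans where confidence >= threshold.

    frame_conf: 1-D sequence of per-frame confidence values (assumed layout).
    threshold:  hypothetical default, chosen from the 3 to 4 range.
    fps:        frames per second used to convert frame indices to seconds.
    """
    segments = []
    start = None  # frame index where the current span began, if any
    for i, c in enumerate(frame_conf):
        if c >= threshold and start is None:
            start = i  # confidence rose above the threshold
        elif c < threshold and start is not None:
            # confidence dropped; the end is the first frame below threshold
            segments.append((start / fps, i / fps))
            start = None
    if start is not None:
        # the span runs to the end of the sequence
        segments.append((start / fps, len(frame_conf) / fps))
    return segments


if __name__ == "__main__":
    # made-up confidence values, for illustration only
    example = [0.5, 1.0, 4.2, 4.8, 5.1, 2.0, 3.9, 4.0]
    for begin, end in speech_segments(example):
        print(f"{begin:.2f}s to {end:.2f}s")
```

Running this once per detected face track would give the spans for each speaker. Very short spans, or short gaps between spans, may need to be filtered or merged depending on the data.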