joannahong / Lip2Wav-pytorch

a PyTorch implementation of Lip2Wav

Did you try this code on GRID or TIMIT? #3

Closed hjzzju closed 3 years ago

hjzzju commented 3 years ago

Thank you for the great project. I want to try this code on the TIMIT and GRID datasets; do I need to make any changes to run it?

joannahong commented 3 years ago

Hello, for the GRID dataset, you should preprocess it following the multispeaker branch of the original Lip2Wav repository. Then you can run the code by adjusting lines 91-96 in hparams.py. For example,

T = 40
overlap = 10
mel_step_size = 160
mel_overlap = 40 
img_size = 96
fps = 25

Be careful when setting the hop size and window size while preprocessing the GRID dataset, since the video is 25 fps. I have not tried the TIMIT dataset because its website has been shut down.
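
Roughly, mel_step_size should cover the same duration of audio as T video frames, so it is tied to fps, the audio sample_rate, and the mel hop_size. Below is a minimal sketch of that relationship (my own reading of the hyperparameters, not something spelled out in the original repo), assuming the original Lip2Wav sample_rate of 16000 and a hop_size of 160 for GRID:

# Rough consistency check between the video and mel hyperparameters.
# Assumptions: sample_rate = 16000 as in the original Lip2Wav hparams;
# hop_size = 160 is a value you would choose when preprocessing GRID.
sample_rate = 16000      # audio sampling rate (Hz)
hop_size = 160           # STFT hop in samples -> 100 mel frames per second
fps = 25                 # GRID video frame rate
T = 40                   # video frames per training window
overlap = 10             # overlapping video frames between windows

clip_seconds = T / fps                               # 40 / 25 = 1.6 s of video
mel_frames_per_second = sample_rate / hop_size       # 16000 / 160 = 100
mel_step_size = int(clip_seconds * mel_frames_per_second)        # 1.6 * 100 = 160
mel_overlap = int(overlap / fps * mel_frames_per_second)         # 0.4 * 100 = 40

With these numbers mel_step_size comes out to 160 and mel_overlap to 40, matching the values above; keeping the original hop_size (200, if I remember correctly) would instead give 128, which is why the hop and window sizes need rethinking for 25 fps video.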

hjzzju commented 3 years ago

Thank you for your quick reply; it helps a lot.

Mortyzhou-Shef-BIT commented 1 year ago

Hi, could you tell me how to calculate mel_step_size from T and fps? Do you have an expression for mel_step_size? Thank you so much.