Closed hjzzju closed 3 years ago
Hello, for the GRID dataset you should preprocess it following the multispeaker branch of the original Lip2Wav repository. Then you can run the code by simply adjusting lines 91-96 in hparams.py. For example,
T = 40
overlap = 10
mel_step_size = 160
mel_overlap = 40
img_size = 96
fps = 25
Be careful when setting the hop size and window size while preprocessing the GRID dataset, since its video is 25 fps. I have not tried the TIMIT dataset because its website has been shut down.
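A quick sanity check on how the STFT hop size can be chosen so the mel frames line up with 25 fps video. This is a sketch, not the author's exact recipe: `sample_rate = 16000` and the "window = 4 × hop" choice are assumptions (common Lip2Wav-style defaults), only `T`, `fps`, and `mel_step_size` come from the hparams above.

```python
# Sketch: align mel-spectrogram frames with 25 fps GRID video.
# sample_rate and the win_size = 4 * hop_size rule are assumptions,
# not values taken from the thread.
sample_rate = 16000    # assumed audio rate
fps = 25               # GRID video frame rate (from hparams above)
T = 40                 # video frames per training window (from hparams above)
mel_step_size = 160    # mel frames per training window (from hparams above)

window_seconds = T / fps                           # 40 / 25 = 1.6 s of video
mels_per_second = mel_step_size / window_seconds   # 160 / 1.6 = 100 mel frames/s
hop_size = int(sample_rate / mels_per_second)      # 16000 / 100 = 160 samples
win_size = 4 * hop_size                            # common choice: 4x the hop

print(hop_size, win_size)  # 160 640
```

With these numbers each video frame corresponds to exactly 4 mel frames (100 mel frames/s vs. 25 video frames/s), which is why a mismatched hop size breaks the audio/video alignment.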
Thank you for your quick reply, it helps a lot
Hi, could you tell me how to calculate mel_step_size from T and fps? Do you have an expression for mel_step_size? Thank you so much.
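One reading of the numbers above that is consistent with T = 40, fps = 25, and mel_step_size = 160: the T video frames span T / fps seconds of audio, and the mel spectrogram produces sample_rate / hop_size frames per second. The `sample_rate = 16000` and `hop_size = 160` defaults below are assumptions, not values confirmed in the thread.

```python
# Hedged sketch: mel_step_size as a function of T and fps.
# sample_rate and hop_size defaults are assumed, not stated in the thread.
def mel_step_size(T, fps, sample_rate=16000, hop_size=160):
    # T video frames cover T / fps seconds of audio;
    # the mel spectrogram has sample_rate / hop_size frames per second.
    return int(T / fps * sample_rate / hop_size)

print(mel_step_size(40, 25))  # 160, matching the hparams above
```

The same formula gives mel_overlap = overlap / fps * sample_rate / hop_size, i.e. 10 / 25 * 100 = 40 here, which also matches the values above.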
Thank you for the great project. I want to try this code on the TIMIT and GRID datasets; do I need to make any changes to run it?