Did you filter your data?
What do you mean by filtering? So far I have processed the data to 25 fps video and audio with a 16 kHz sampling rate.
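For reference, that re-encoding step is typically done with ffmpeg; a minimal sketch (paths are placeholders, and it assumes ffmpeg is on your PATH):

```python
# Hedged sketch: re-encode a clip to 25 fps video and extract 16 kHz mono
# audio with ffmpeg (assumed installed); paths are placeholders.
import subprocess

def reencode(src: str, dst_video: str, dst_wav: str) -> None:
    # Force 25 fps output video.
    subprocess.run(["ffmpeg", "-y", "-i", src, "-r", "25", dst_video], check=True)
    # Extract 16 kHz mono PCM audio.
    subprocess.run(["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst_wav],
                   check=True)
```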
Our current Chinese data covers 50 different speakers and 7,700 videos; the English data covers 40 speakers and 5,500 videos. Is this due to the small amount of data, or something else? Our data has not been processed with syncnet_python: we found that the syncnet_v2.model file is not available for download. Can you provide it? That's the problem right now.
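For context, "filtering" here generally means scoring every clip for audio-visual sync (e.g. with syncnet_python) and discarding clips with a large AV offset or low confidence. A hedged sketch of the idea; `sync_offset_and_confidence` and the thresholds are hypothetical stand-ins for whatever scorer you run:

```python
# Hedged sketch of sync-based filtering. sync_offset_and_confidence() is a
# hypothetical stand-in for a scorer such as syncnet_python's pipeline.
from pathlib import Path

def sync_offset_and_confidence(path: str) -> tuple[int, float]:
    # Hypothetical hook: plug in your actual sync scorer here.
    raise NotImplementedError

def filter_clips(clip_dir: str, max_offset: int = 1, min_conf: float = 3.0) -> list[str]:
    kept = []
    for clip in Path(clip_dir).glob("*.mp4"):
        offset, conf = sync_offset_and_confidence(str(clip))
        # Keep only clips that are essentially in sync and confidently scored.
        if abs(offset) <= max_offset and conf >= min_conf:
            kept.append(str(clip))
    return kept
```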
The syncnet_v2 model is an English model and does not work for Chinese; you need to train a Chinese one.
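For reference, Wav2Lip's expert discriminator is trained with binary cross-entropy on the cosine similarity between audio and face-window embeddings, and a Chinese SyncNet uses the same objective, just trained on Chinese speech. A minimal sketch in the spirit of color_syncnet_train.py, not its exact code:

```python
# Minimal sketch of the Wav2Lip-style sync loss: BCE on cosine similarity
# between audio and face embeddings (label 1 = in-sync pair, 0 = off-sync).
import torch
import torch.nn as nn
import torch.nn.functional as F

bce = nn.BCELoss()

def cosine_sync_loss(audio_emb: torch.Tensor, face_emb: torch.Tensor,
                     label: torch.Tensor) -> torch.Tensor:
    # Embeddings are assumed non-negative (post-ReLU), so the cosine
    # similarity lies in [0, 1] as BCELoss requires.
    d = F.cosine_similarity(audio_emb, face_emb)
    return bce(d.unsqueeze(1), label)
```

At chance the similarity hovers around 0.5, which gives a BCE of ln 2 ≈ 0.69, the plateau discussed later in this thread.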
What about your config? lr, bs?
All settings are at their defaults, e.g. lr=1e-3 when training the expert lip-sync discriminator (see hparams.txt).
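For context, the relevant defaults live in the hparams file; a sketch of the fields involved, using names from the original Wav2Lip hparams.py (the 288x288 fork's names and defaults may differ):

```python
# Sketch of the learning-rate fields; names follow the original Wav2Lip
# hparams.py, and the 288x288 fork's defaults may differ.
from types import SimpleNamespace

hparams = SimpleNamespace(
    syncnet_lr=1e-3,             # expert discriminator lr (the default under discussion)
    initial_learning_rate=1e-4,  # main Wav2Lip generator lr
    syncnet_batch_size=64,
)
```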
1e-3 is too large; try 1e-4 or 1e-5.
Okay, I'll try.
Thank you very much for your advice. After adjusting the lr to 1e-5, the loss starts to decrease. What learning rate is appropriate for the main wav2lip training? 1e-4?
1e-4 is good
Thank you!
Hi @Amber-Believe. May I ask what dataset are you using? Is it CMLR? LRS-1000? Or a private one?
@Amber-Believe BTW, is the eval loss below 0.3 after changing lr from 1e-3 to 1e-5? And is there anything else you have done to achieve that?
Hi @primepake, I was redirected to this page from https://github.com/primepake/wav2lip_288x288/issues/97
But I still have not figured out my question. The dataset I am using is LRS2, since the official Wav2Lip is trained on it, so I assume it is already filtered. I also randomly checked some of the audio and video files (a programmatic spot-check is sketched below): the wav files are at a 16 kHz sample rate and the videos are 25 fps.
I would also like to ask when you expect the syncnet training to converge. Will it stay stuck at 0.69 for a long time? (110k steps for me currently.)
Looking forward to your reply. Thanks.
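For anyone wanting to verify clips systematically, a minimal sanity check that a clip matches Wav2Lip's expectations of 25 fps video and 16 kHz audio; paths and the extracted-wav layout are assumptions about your dataset:

```python
# Sanity-check one clip: 25 fps video (via OpenCV) and 16 kHz audio
# (via the stdlib wave module). Paths are placeholders.
import wave
import cv2

def check_clip(video_path: str, wav_path: str) -> None:
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    cap.release()
    with wave.open(wav_path, "rb") as w:
        sr = w.getframerate()
    assert abs(fps - 25.0) < 0.1, f"{video_path}: fps={fps}"
    assert sr == 16000, f"{wav_path}: sample_rate={sr}"
```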
Again, what about your lr, bs, and number of GPUs?
LR is 1e-4, BS is 64, on 1 RTX A6000 GPU.
@primepake Greetings! I am trying to follow the pipeline you proposed here.
May I ask how you pre-processed the video data? Are you using the preprocessing code from the official Wav2Lip repo?
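For reference, the official Wav2Lip repo ships a preprocess.py that detects and crops faces and extracts per-clip audio; its README invokes it roughly like this (paths are placeholders, and the 288x288 fork may add options such as a larger crop size):

```python
# Invoke the official Wav2Lip preprocessing script, per its README; paths
# are placeholders and the 288x288 fork may expose extra options.
import subprocess

subprocess.run([
    "python", "preprocess.py",
    "--data_root", "data_root/main",              # raw LRS2-style videos
    "--preprocessed_root", "lrs2_preprocessed/",  # face crops + per-clip audio
], check=True)
```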
What is the average length of the videos in these datasets?
I am using a Chinese dataset to train the expert lip-sync discriminator, and the training loss stays at 0.69. Have you run into this? How should it be resolved?
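A note on the 0.69 plateau seen throughout this thread: binary cross-entropy for a classifier that predicts 0.5 for every pair is exactly ln 2 ≈ 0.693, i.e. chance level, so a loss pinned at 0.69 means the discriminator has not yet started separating in-sync from off-sync pairs (typically a learning-rate or data-quality issue, as discussed above).

```python
# Why the plateau sits at 0.69: BCE at chance (prediction 0.5) equals ln(2).
import math
print(-math.log(0.5))  # 0.6931471805599453
```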