Closed langzizhixin closed 9 months ago
added your demo to my repo
Hi @primepake , I would like to offer some comments on the results. Some frames are very accurate, including phonemes like "sh", "wu", "d", "f"... But some phonemes are not accurate, like "l" and "i:". May I ask: if more training samples were added for the poorly rendered phonemes, could the results be improved further?
Hi @langzizhixin Thanks for sharing the demo video. May I ask if you trained on only one person's video? And how many hours of training data did you use?
Looking forward to your reply. :)
you should train the data that includes these phonemes
Yes, that makes sense. So I am unsure how long the training video needs to be, since a short video might not cover the full set of phonemes.
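One way to sanity-check the phoneme-coverage concern above is to map the training transcript to phonemes and see which targets are missing. A minimal sketch follows; the phoneme inventory and the tiny pronunciation dictionary are made-up placeholders (in practice you would use a real G2P tool), not anything from the wav2lip_288x288 repo.

```python
# Hypothetical target phonemes mentioned in this thread.
TARGET_PHONEMES = {"sh", "w", "d", "f", "l", "i:"}

# Made-up mini pronunciation dictionary (word -> phoneme list);
# a real pipeline would use a grapheme-to-phoneme converter.
PRON_DICT = {
    "she": ["sh", "i:"],
    "would": ["w", "uh", "d"],
    "feel": ["f", "i:", "l"],
    "day": ["d", "ey"],
}

def missing_phonemes(transcript: str) -> set:
    """Return target phonemes not covered by the transcript."""
    covered = set()
    for word in transcript.lower().split():
        covered.update(PRON_DICT.get(word, []))
    return TARGET_PHONEMES - covered

print(sorted(missing_phonemes("she would feel")))  # -> []  (all targets covered)
print(sorted(missing_phonemes("she would day")))   # -> ['f', 'l']
```

If the missing set is non-empty, the training clip likely needs more footage containing those sounds.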
> Hi @langzizhixin Thanks for sharing the demo video. May I ask if you trained on only one person's video? And how many hours of training data did you use?
> Looking forward to your reply. :)
It is best to train a generalized large model on a large dataset, growing the dataset iteratively step by step. Then use a few more minutes of video for fine-tuning.
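The two-stage recipe described above (pretrain a generalized model on a large dataset, then gently fine-tune on a few minutes of one speaker) can be sketched in miniature. This is an illustrative analogy only: a 1-D linear model stands in for the lip-sync network, and the data, learning rates, and step counts are invented for the demo.

```python
def train(w, data, lr, steps):
    """Plain gradient descent on mean squared error for y ~ w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

# Stage 1: "generalized" model from a large dataset (true slope 2.0).
big_data = [(x, 2.0 * x) for x in range(1, 101)]
w = train(0.0, big_data, lr=1e-4, steps=200)

# Stage 2: fine-tune on a few person-specific samples (slope 2.2),
# with a small learning rate so the pretrained weight shifts gently
# instead of being overwritten.
few_shots = [(x, 2.2 * x) for x in (1, 2, 3)]
w = train(w, few_shots, lr=1e-3, steps=100)
print(w)  # ends between 2.0 and 2.2: adapted, but anchored by pretraining
```

The same intuition applies to fine-tuning a large checkpoint: the few-shot stage nudges the pretrained weights toward the target speaker without discarding what the large dataset taught.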
> added your demo to my repo
Thank you for your excellent work. Wav2lip_288 can indeed be trained with high-quality datasets and achieve good results.
BTW, you can train the latest update with sam-unet to get better results.
@primepake Hi thank you for your excellent work!
> Hi @langzizhixin Thanks for sharing the demo video. May I ask if you trained on only one person's video? And how many hours of training data did you use? Looking forward to your reply. :)
> It is best to train a generalized large model on a large dataset, growing the dataset iteratively step by step. Then use a few more minutes of video for fine-tuning.
Is Wav2lip_288 a model pre-trained on a large dataset, so that a developer only needs to fine-tune it with a few more minutes of video?
Thank you to the source code author for providing such a great open source project.
https://github.com/primepake/wav2lip_288x288/assets/74521932/a1d6a920-a383-4bd3-a3e3-de4e0e156184