primepake / wav2lip_288x288

MIT License
560 stars 143 forks

Thank you to the source code author for providing such a great open source project. Very good. #89

Closed langzizhixin closed 9 months ago

langzizhixin commented 9 months ago

Thank you to the source code author for providing such a great open source project.

https://github.com/primepake/wav2lip_288x288/assets/74521932/a1d6a920-a383-4bd3-a3e3-de4e0e156184

ghost commented 9 months ago

added your demo to my repo

Nyquist0 commented 9 months ago

Hi @primepake, I would like to share some comments on the results. Some of the frames are very accurate, including phonemes like "Sh", "wu", "d", "f"... but some phonemes are not accurate, like "l" and "i:". May I ask: if more training samples were added for the poorly rendered phonemes, could the results be enhanced further?

Nyquist0 commented 9 months ago

Hi @langzizhixin Thanks for sharing the demo video. May I ask if you trained it on video of only one person? And how many hours of training data did you use?

Looking forward to your reply. :)

ghost commented 9 months ago

you should train on data that includes these phonemes

Nyquist0 commented 9 months ago

you should train on data that includes these phonemes

Yes, that makes sense. That is why I am unsure about the video length needed for training, since a short video might not cover all of the phonemes.
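One way to reduce the guesswork is to check the phoneme coverage of your transcripts before training. Here is a minimal sketch, assuming you have per-clip transcripts; `g2p_en` is an off-the-shelf grapheme-to-phoneme package, not something this repo ships:

```python
# Sketch: count ARPAbet phoneme coverage across transcripts (assumes g2p_en is installed).
from collections import Counter
from g2p_en import G2p

g2p = G2p()
counts = Counter()
transcripts = ["she would find it", "little feet"]  # placeholder transcripts
for text in transcripts:
    # G2p returns ARPAbet symbols plus spaces/punctuation; keep phonemes only
    counts.update(p for p in g2p(text) if p.strip() and p[0].isalpha())

# Phonemes with low counts (e.g. L, IY1) are candidates for collecting more clips.
for phoneme, n in counts.most_common():
    print(phoneme, n)
```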

langzizhixin commented 9 months ago

Hi @langzizhixin Thanks for sharing the demo video. May I ask if you trained it on video of only one person? And how many hours of training data did you use?

Looking forward to your reply. :)

It is best to train a generalized large model on a large dataset, increasing the dataset iteratively, step by step. Then use a few more minutes of video for fine-tuning.
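Concretely, that two-stage recipe could look like the sketch below. This is only an illustration: the `Wav2Lip` class name and the "state_dict" checkpoint layout follow the original Wav2Lip code, and the paths are placeholders, not this repo's exact files:

```python
import torch
from models import Wav2Lip  # assumed import path, following the original Wav2Lip repo

# Stage 1 produced a generalized checkpoint on the large dataset (placeholder path).
model = Wav2Lip()
ckpt = torch.load("checkpoints/wav2lip_288_general.pth", map_location="cpu")
model.load_state_dict(ckpt["state_dict"])  # original Wav2Lip stores weights under this key

# Stage 2: fine-tune on a few minutes of the target speaker at a low learning rate,
# so the generalized lip-sync prior is kept while speaker-specific detail adapts.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
model.train()
# ...then run the repo's usual training loop over the small single-speaker dataset...
```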

langzizhixin commented 9 months ago

added your demo to my repo

Thank you for your excellent work. Wav2lip_288 can indeed be trained with high-quality datasets and achieve good results.

ghost commented 9 months ago

btw, you can train the latest update with sam-unet to get better results

Oyiyi commented 9 months ago

@primepake Hi thank you for your excellent work!

dxLu commented 1 week ago

Hi @langzizhixin Thanks for sharing the demo video. May I ask if you trained it on video of only one person? And how many hours of training data did you use? Looking forward to your reply. :)

It is best to train a generalized large model on a large dataset, increasing the dataset iteratively, step by step. Then use a few more minutes of video for fine-tuning.

Is Wav2lip_288 a model pre-trained on a large dataset, so that a developer just needs to do a short fine-tuning run with a few more minutes of video?