Does May's pretrained model support Chinese audio?

yerfor / GeneFace

GeneFace: Generalized and High-Fidelity 3D Talking Face Synthesis; ICLR 2023; Official code

MIT License

2.52k stars, 294 forks

Does May's pretrained model support Chinese audio? #230

Open. lokvke opened 11 months ago

lokvke commented 11 months ago

Hi there, I have a question, as the title says: does May's pretrained model support Chinese audio? I tried using May's pretrained model with a Chinese audio file, but in the output video May's lip movements don't match the audio.

Looking forward to your reply.

aizhiqi-work commented 10 months ago

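One suggestion: landmark3d-sync is pretrained on LRS3, so its lip shapes have some trouble fitting Chinese. I suggest running the entire GeneFace pipeline on purely Chinese data.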

jack139 commented 10 months ago

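May I ask which Chinese datasets comparable to LRS3 are available? I found LRW-1000, but it has no full faces, only crops of the mouth region, so it is not quite the same as LRS3 and landmark information cannot be extracted from it.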

aizhiqi-work commented 10 months ago

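You're right; as I recall, LRW-1000 is one of the larger Chinese datasets. One suggestion is to take a different approach: some of the multimodal Chinese datasets recently open-sourced by Tsinghua are worth considering, such as CN-CVS and AV-CNCELEB. They are all fairly large and have some reference value.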

jinqiupeter commented 10 months ago

Actually, the pretrained model's support for Chinese is decent. Here is the result I got by training from the postnet stage onward:

https://github.com/yerfor/GeneFace/assets/12045814/0c4ddc37-63a1-4609-b7bb-00d876eb2ec8

aizhiqi-work commented 10 months ago

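The lip sync looks quite poor. In fact, judging by the metrics, the English lip sync isn't especially impressive either, but this work is still very much worth following.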

CatherineZhou commented 7 months ago

@jinqiupeter May I ask which model you used for voice cloning? The cloned voice sounds quite natural.