yerfor / GeneFace

GeneFace: Generalized and High-Fidelity 3D Talking Face Synthesis; ICLR 2023; Official code
MIT License

Does May's pretrained model support Chinese audio? #230

Open lokvke opened 11 months ago

lokvke commented 11 months ago

Hi there, I have a question, as the title says: does May's pretrained model support Chinese audio? I tried May's pretrained model with a Chinese audio file, but in the output video May's lip movements do not match the audio.

Looking forward to your reply.

aizhiqi-work commented 10 months ago

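One suggestion: landmark3d-sync is pretrained on LRS3, so its lip shapes do not fit Chinese very well. I recommend running the full GeneFace pipeline on purely Chinese data.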

jack139 commented 10 months ago

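"One suggestion: landmark3d-sync is pretrained on LRS3, so its lip shapes do not fit Chinese very well. I recommend running the full GeneFace pipeline on purely Chinese data."

May I ask which Chinese datasets similar to LRS3 are available? I found LRW-1000, but it contains only mouth-region crops rather than full faces, so it differs from LRS3 and landmark information cannot be extracted from it.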

aizhiqi-work commented 10 months ago

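You are right. As I recall, LRW-1000 is one of the larger Chinese datasets. One suggestion is to take a different approach: Tsinghua has recently open-sourced some multimodal Chinese datasets worth considering, such as CN-CVS and AV-CNCELEB. They are fairly large and could serve as a useful reference.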

jinqiupeter commented 10 months ago

Actually, the pretrained model's support for Chinese is decent. Here is the result I got by training from the postnet stage onward:

https://github.com/yerfor/GeneFace/assets/12045814/0c4ddc37-63a1-4609-b7bb-00d876eb2ec8

aizhiqi-work commented 10 months ago

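The lip sync looks quite poor. In fact, judging by the metrics, the English lip sync is not especially impressive either. Still, this work is well worth following.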

CatherineZhou commented 7 months ago

@jinqiupeter May I ask which model you used for voice cloning? The cloned voice sounds quite natural.