From the paper: Our backbone is built on a pretrained HuBERT model and a ResNet1D network, which preserves high-frequency details
of facial movements. During implementation, our backbone synthesizes one second of facial animations with 30 fps in only 0.007
seconds.
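
To put the quoted timing in perspective, here is a minimal sketch of the arithmetic. The variable names are my own, and the only inputs are the three figures in the quote (one second of animation, 30 fps, 0.007 seconds); the quote does not state the hardware, so this says nothing about how the number was measured.

```python
# Figures taken from the quoted passage; names are illustrative only.
clip_seconds = 1.0          # length of animation synthesized
fps = 30                    # frame rate of the animation
synthesis_seconds = 0.007   # reported time to synthesize the clip

frames = clip_seconds * fps                        # frames in the clip
real_time_factor = clip_seconds / synthesis_seconds  # speedup over playback
ms_per_frame = synthesis_seconds / frames * 1000   # average cost per frame

print(f"{real_time_factor:.0f}x faster than real time")
print(f"{ms_per_frame:.3f} ms per frame")
```

Under these figures the backbone runs roughly 143 times faster than real time, or about 0.23 ms per generated frame on average.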
Thanks!