wuhaozhe / audio2face_mm2023


Does the 0.007 s inference time reported in the paper refer to the combined time of HuBERT and ResNet1D? #4

Open zgyh001 opened 8 months ago

zgyh001 commented 8 months ago

From the paper: Our backbone is built on a pretrained HuBERT model and a ResNet1D network, which preserves high-frequency details of facial movements. During implementation, our backbone synthesizes one second of facial animations with 30 fps in only 0.007 seconds.

  1. Does this refer to the combined time of HuBERT and ResNet1D?
  2. What hardware was this speed measured on?

Thanks!

wuhaozhe commented 8 months ago

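Hello, it refers to the combined time. You need to set the model to eval mode and run inference inside a with torch.no_grad(): block. The speed was measured on a 2080 Ti. The calculation is: the total time taken to run every sample in the test set once, divided by the total duration of the test set.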
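The calculation described above (total inference time over the test set divided by the total duration of the test set) could be sketched roughly as follows. This is not the repository's benchmarking code: the function name `seconds_per_audio_second` and the parameters `run_inference`, `samples`, and `sync` are hypothetical names introduced for illustration, and the helper deliberately uses only the standard library so it does not assume any particular model interface.

```python
import time
from typing import Callable, Iterable, Tuple


def seconds_per_audio_second(
    run_inference: Callable[[object], object],
    samples: Iterable[Tuple[object, float]],
    sync: Callable[[], None] = lambda: None,
) -> float:
    """Return total inference wall time divided by total audio duration.

    `samples` yields (inputs, duration_in_seconds) pairs (hypothetical layout).
    `sync` is an optional hook called before and after each run, so that
    asynchronous device work is finished when the clock is read.
    """
    total_compute = 0.0  # seconds spent in inference
    total_audio = 0.0    # seconds of audio processed
    for inputs, duration_s in samples:
        sync()  # wait for any pending work before starting the clock
        start = time.perf_counter()
        run_inference(inputs)
        sync()  # wait for this run to finish before stopping the clock
        total_compute += time.perf_counter() - start
        total_audio += duration_s
    if total_audio <= 0:
        raise ValueError("test set has zero total duration")
    return total_compute / total_audio
```

With PyTorch, one would presumably call `model.eval()` beforehand, have `run_inference` execute the HuBERT and ResNet1D forward pass inside `with torch.no_grad():`, and pass `torch.cuda.synchronize` as `sync` when timing on a GPU, since CUDA kernels launch asynchronously. A result of 0.007 would then correspond to the figure quoted from the paper.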