yhw-yhw / SHOW

This is the codebase for SHOW from "Generating Holistic 3D Human Motion from Speech" (CVPR 2023).

Questions about differences between Paper and Code Implementation #18

Closed LuckyOne09 closed 1 year ago

LuckyOne09 commented 1 year ago

First of all, thank you for your thorough and comprehensive work; it is very solid. However, I noticed some differences between the paper and the default code implementation. With the default config:

  1. The silhouette loss is not used.
  2. PIXIE is still used for hand pose initialization instead of PyMAF-X.
  3. MediaPipe is not used to generate additional keypoints for face mesh supervision.

I would like to understand the reasoning behind these design choices and the insights behind them. Thank you for your time and kindness!

lithiumice commented 1 year ago

3: MediaPipe can often improve face mesh reconstruction quality, but its results have significant errors when the video resolution is low.
1, 2: The silhouette loss and PyMAF-X initialization are time-consuming; they were enabled for the paper results, and not explaining this in the released config was an omission.
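
For context on why the silhouette term is costly, here is a minimal sketch of how such a term typically enters a SMPL-X fitting objective. This is not the SHOW implementation; the function names and the assumption that a differentiable renderer produces `rendered_sil` are mine.

```python
import torch

def silhouette_loss(rendered_sil: torch.Tensor, person_mask: torch.Tensor) -> torch.Tensor:
    """Squared difference between a differentiably rendered body silhouette
    and a person segmentation mask, both (B, H, W) with values in [0, 1]."""
    return ((rendered_sil - person_mask) ** 2).mean()

def total_loss(terms: dict, weights: dict) -> torch.Tensor:
    """Weighted sum of energy terms; a zero weight effectively disables a term,
    which is what a default config can do with the silhouette loss."""
    return sum(weights[name] * value for name, value in terms.items())
```

The silhouette term needs the body mesh to be rendered differentiably at every optimization step, which is the main reason enabling it slows the fitting down.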

LuckyOne09 commented 1 year ago

Thank you for your helpful reply! So, if I understand correctly, to achieve better results, such as more accurate hand poses, I should enable the silhouette loss and PyMAF-X initialization, right?

lithiumice commented 1 year ago

Yes. I will fix the config in the code implementation later.
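
Until the config fix lands, a rough sketch of the kind of overrides to look for is below. The key names are hypothetical and used only for illustration; the actual option names live in the repository's config files and may differ.

```python
# Hypothetical flag names for illustration only; check SHOW's config files
# for the real keys.
config_overrides = {
    "use_silhouette_loss": True,   # enable the silhouette term used for the paper results
    "hand_pose_init": "pymafx",    # initialize hand pose with PyMAF-X instead of PIXIE
    "use_mediapipe_face": False,   # MediaPipe face keypoints; better left off for low-res video
}
```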

LuckyOne09 commented 1 year ago

OK, thank you for your reply!