Hi Zhang! First of all, thank you very much for such an exciting paper. I have a question regarding architecture and experiments. How will the model perform if I train GeometryPredictor and FaceReenactor on different datasets?
If the answer is 'poorly', what is needed to make it look better? Something like the "Unified Landmark Converter" from your other paper on FReeNet?
(The specific use case I'm trying to implement is one where a person speaks in a language other than English, and I pass English speech plus the pose/blink of the original video to the model so that the person appears to speak English.)
You can train both nets on other datasets as long as they have audio information. Note that the landmark, pose, and blink information must be obtained in advance (I use the Face++ API: https://www.faceplusplus.com.cn/), and the background should not change significantly.
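For reference, a minimal sketch of that preprocessing step using the Face++ Detect API is below. The endpoint URL, parameter names (return_landmark, return_attributes), and key/secret values are assumptions on my part taken from the public Detect API docs, not from the paper's code, so please verify them before use.

```python
# Minimal sketch: fetch per-frame landmarks, head pose, and eye (blink) status from Face++.
# Endpoint and parameter names are assumed from the public Detect API docs -- verify them.
import requests

DETECT_URL = "https://api-cn.faceplusplus.com/facepp/v3/detect"
API_KEY = "your_api_key"        # placeholder
API_SECRET = "your_api_secret"  # placeholder

def annotate_frame(image_path):
    """Return landmarks, head pose, and eye status for a single video frame."""
    with open(image_path, "rb") as f:
        resp = requests.post(
            DETECT_URL,
            data={
                "api_key": API_KEY,
                "api_secret": API_SECRET,
                "return_landmark": 2,                      # 106-point landmarks
                "return_attributes": "headpose,eyestatus", # pose + blink cues
            },
            files={"image_file": f},
        )
    resp.raise_for_status()
    faces = resp.json().get("faces", [])
    if not faces:
        return None  # no face detected in this frame
    face = faces[0]
    return {
        "landmark": face["landmark"],
        "headpose": face["attributes"]["headpose"],
        "eyestatus": face["attributes"]["eyestatus"],
    }
```

Running something like this over every extracted frame gives you the precomputed landmark/pose/blink annotations; the API is rate-limited, so cache the JSON responses.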
"Unified Landmark Converter" is designed for performing geometric transformations between multiple people using one model that has better practicality.
APB2Face should meet your needs because we model the raw audio signal rather than any specific language. You can construct your own datasets and try APB2Face.
Good luck~