First of all, thank you for the excellent research and for sharing the code.
I followed the instructions in the README and successfully loaded the pretrained model.
To test the model, I downloaded images of other people, such as Will Smith, and renamed each image file to face2.png, but the model seems to keep producing the same female voice.
Is there any additional work required to perform TTS conditioned on human face images?