Questions about the network inputs and output

Hi Chris,

Thanks for reaching out! I wouldn't call the input a sequence, because the 5 input images do not have any temporal relationship between them. Instead, they are 5 images with different degrees of mouth openness, sorted from least open (basically a neutral expression) to most open. The landmark input is the difference between the target landmark and each input image's landmark, formatted as a 2xWxH matrix, where 2 stands for the x and y coordinates. And yes, the output is the desired (target) face image of that person.
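To make the input layout concrete, here is a rough sketch in plain Python of how the images and landmark differences could be assembled. It is only an illustration, not the code from the repository: the function names `landmark_offset_map` and `build_inputs`, the scalar `openness` score used for sorting, and in particular the choice to write each (dx, dy) offset at the pixel of the corresponding input landmark (zeros elsewhere) are all assumptions. The description above only fixes the 2xWxH shape and that it holds target-minus-input landmark differences.

```python
def landmark_offset_map(src_landmarks, tgt_landmarks, width, height):
    """Build a 2 x W x H nested list of landmark differences.

    ASSUMPTION: each offset (target - input) is stored at the rounded
    pixel position of the input landmark; all other entries stay 0.
    Indexing is out[c][x][y], with c = 0 for dx and c = 1 for dy.
    """
    if len(src_landmarks) != len(tgt_landmarks):
        raise ValueError("input and target need the same number of landmarks")
    out = [[[0.0] * height for _ in range(width)] for _ in range(2)]
    for (sx, sy), (tx, ty) in zip(src_landmarks, tgt_landmarks):
        # Clamp to the image so off-image landmarks do not raise IndexError.
        x = min(max(int(round(sx)), 0), width - 1)
        y = min(max(int(round(sy)), 0), height - 1)
        out[0][x][y] = float(tx - sx)
        out[1][x][y] = float(ty - sy)
    return out


def build_inputs(images, landmarks, openness, tgt_landmarks, width, height):
    """Sort input images from least to most open mouth and pair each one
    with its landmark-difference map relative to the target landmarks.

    `openness` is a hypothetical per-image score (larger = more open).
    """
    order = sorted(range(len(images)), key=lambda i: openness[i])
    sorted_images = [images[i] for i in order]
    offset_maps = [
        landmark_offset_map(landmarks[i], tgt_landmarks, width, height)
        for i in order
    ]
    return sorted_images, offset_maps
```

With 5 input images, `build_inputs` returns the 5 images in openness order together with 5 offset maps, one per image, all computed against the same target landmarks.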

Regards, Kevin

wangfanChris notifications@github.com wrote on Wed, Oct 21, 2020 at 2:39 AM:

Thank you very much for the outstanding contribution of this paper, but I still have a small question that I hope you can answer:

According to your paper, the input at test time is a sequence of video frames of person A plus the landmark t of a specific expression T of person A, and the output is the image I_o of person A showing that expression. Is that correct?

kgu3 / FLNet_AAAI2020

Questions about the network inputs and output #3