tobran / DF-GAN

[CVPR2022 oral] A Simple and Effective Baseline for Text-to-Image Synthesis

How did you pretrain the text encoder? #9

Closed sumin1125 closed 2 years ago

sumin1125 commented 3 years ago

What did the text encoder predict during pretraining, and what loss did you use?

mertbozkir commented 3 years ago

I am also looking for the answer to this question :D

tobran commented 2 years ago

The text encoder is a BiLSTM that was jointly trained with an image encoder through the DAMSM loss [1], as in many other GAN-based models. You can also try other text encoders (BERT, CLIP).
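For anyone wondering what that pretraining looks like in practice, below is a minimal PyTorch sketch of the sentence-level part of a DAMSM-style matching loss: a BiLSTM encodes each caption, and a symmetric cross-entropy over in-batch cosine similarities pulls each caption toward its own image embedding. This is an illustration, not the DF-GAN/AttnGAN training code; the names `TextEncoder` and `sent_matching_loss` are made up here, the image features are placeholders, and the word-level attention term of the full DAMSM loss is omitted.

```python
# Minimal sketch of DAMSM-style sentence-level matching pretraining.
# Illustrative only: the real AttnGAN DAMSM loss also has a word-level term,
# and the image embedding comes from a CNN (e.g. Inception-v3 features).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextEncoder(nn.Module):
    """BiLSTM text encoder; the sentence embedding is the concatenation
    of the final hidden states of both directions."""
    def __init__(self, vocab_size, emb_dim=300, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim,
                            batch_first=True, bidirectional=True)

    def forward(self, tokens):
        _, (h_n, _) = self.lstm(self.embed(tokens))
        return torch.cat([h_n[0], h_n[1]], dim=-1)  # (B, 2*hidden_dim)

def sent_matching_loss(sent_emb, img_emb, gamma3=10.0):
    """Symmetric cross-entropy over scaled cosine similarities:
    each caption in the batch should match its own image, and vice versa.
    gamma3 is the DAMSM smoothing factor from the AttnGAN paper."""
    sent = F.normalize(sent_emb, dim=-1)
    img = F.normalize(img_emb, dim=-1)
    scores = gamma3 * sent @ img.t()              # (B, B) similarity matrix
    labels = torch.arange(scores.size(0))          # diagonal = matching pairs
    return F.cross_entropy(scores, labels) + F.cross_entropy(scores.t(), labels)

# Usage with placeholder data; both encoders are trained jointly in practice.
tokens = torch.randint(0, 5000, (8, 18))           # batch of tokenized captions
img_emb = torch.randn(8, 256)                      # stand-in for CNN image features
loss = sent_matching_loss(TextEncoder(5000)(tokens), img_emb)
loss.backward()
```

After this joint pretraining, the text encoder is frozen and its sentence embeddings condition the GAN; that is why swapping in BERT or CLIP embeddings works without changing the generator.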

[1] Xu, T., Zhang, P., Huang, Q., et al. AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 1316-1324.