Closed sabeehalam closed 2 years ago
The text encoder is a BiLSTM that has been jointly trained through DAMSM loss [1] as many GAN-based models. You can also try other text encoders (Bert, CLIP).
[1] Xu T, Zhang P, Huang Q, et al. Attngan: Fine-grained text to image generation with attentional generative adversarial networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 1316-1324.
How did you form the text encoder. Also what are the contents of meta data files?
Hey did u try solving with different encoders?
How did you form the text encoder. Also what are the contents of meta data files?