question about the paper

taoxugit / AttnGAN

MIT License

Hi, Professor. I am very excited about the results of your paper, and the idea has inspired me a lot. I think it is awesome work. But I still have a question about the paper. First, we pretrain the DAMSM to get the text encoder. I think this step pulls the word features from the text encoder close to the sub-region features of the image from the image encoder. But I am confused. At the beginning, without training, the word features from the text encoder are random, so how can we make each word feature match the right sub-region? For example, if the feature of the word 'bird' happens to be close to the feature of the sub-region 'tree' at the first step, before any training, then the word 'bird' will be matched to the sub-region 'tree' more and more strongly as DAMSM pretraining proceeds. That does not seem correct. But the results are amazing. I don't know if I understand it correctly. I would be grateful if you could answer this question. Thanks.

I was wondering the same thing. There is nothing about training the RNN in the code, if I'm not mistaken. Have you found an answer to this?
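For anyone following this thread, here is a minimal NumPy sketch of the word-level DAMSM matching loss as I read it from the paper. It is not the repository's code. The function names (`pair_score`, `damsm_word_loss`), the gamma values, and the toy shapes are my own assumptions for illustration. What it tries to show is that the loss only asks whether a whole caption matches its own image better than the other images in the batch. No word is ever told which sub-region it belongs to, so a random initial 'bird'-to-'tree' alignment is not what the loss rewards. Only alignments that help tell matched pairs from mismatched ones reduce it.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pair_score(words, regions, gamma1=5.0, gamma2=5.0):
    """Attention-driven matching score between one caption and one image.

    words:   (T, D) word features (hypothetical text-encoder output)
    regions: (N, D) sub-region features (hypothetical image-encoder output)
    gamma1, gamma2: smoothing factors; the values here are assumptions.
    """
    s = words @ regions.T                     # (T, N) word-region similarities
    s_bar = softmax(s, axis=0)                # normalize over words
    alpha = softmax(gamma1 * s_bar, axis=1)   # per-word attention over regions
    context = alpha @ regions                 # (T, D) region-context vector per word
    cos = (context * words).sum(axis=1) / (
        np.linalg.norm(context, axis=1) * np.linalg.norm(words, axis=1) + 1e-8
    )
    # Smooth maximum over words of the word-to-context relevance.
    return np.log(np.exp(gamma2 * cos).sum()) / gamma2

def damsm_word_loss(batch_words, batch_regions, gamma3=10.0):
    """Contrastive loss over a batch of (caption, image) pairs.

    The only supervision is that caption i belongs with image i.
    """
    b = len(batch_words)
    scores = np.array([[pair_score(batch_words[i], batch_regions[j])
                        for j in range(b)] for i in range(b)])
    log_p_img = np.log(softmax(gamma3 * scores, axis=1))  # caption i vs. all images
    log_p_cap = np.log(softmax(gamma3 * scores, axis=0))  # image j vs. all captions
    return -(np.trace(log_p_img) + np.trace(log_p_cap))

# Toy usage with random features (all shapes are made up).
rng = np.random.default_rng(0)
toy_words = [rng.normal(size=(6, 8)) for _ in range(4)]     # 4 captions, 6 words each
toy_regions = [rng.normal(size=(10, 8)) for _ in range(4)]  # 4 images, 10 regions each
toy_loss = damsm_word_loss(toy_words, toy_regions)
```

In a real setup both encoders would receive gradients from this loss, so the word-to-region attention is learned as a side effect of the pair-level objective rather than being fixed by the random start.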

question about the paper #2