xuewyang / Fashion_Captioning

ECCV2020 paper: Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards. Code and Data.

Problems about the proposed approach in the paper. #4

Closed by LONGRYUU 3 years ago

LONGRYUU commented 3 years ago

I have a few questions after reading your paper.

1. How is the attribute vector z obtained? More specifically, how are the image features transformed into z? The paper says z comes from a feed-forward layer; which PyTorch modules did you use to build that layer: linear layers or convolutional layers? There are several possible strategies for compressing the 3-dimensional image features into a vector.

2. In Equation 8, the 1/n is placed outside the brackets. Is that a typo? Does it mean β P(1) √(P(2)) or β √(P(1) P(2))?

xuewyang commented 3 years ago
  1. I used average pooling, followed by a linear layer and then a sigmoid layer. See Table 1 of this paper.
  2. That is not a typo. I used the latter one, β √(P(1) P(2)).
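For reference, the two readings of Equation 8 raised in the question can be written out side by side. This is only a restatement of the notation used in this thread, assuming n = 2 (as the square roots in the question imply); β, P(1), and P(2) are as defined in the paper.

```latex
\begin{align*}
\text{Reading A (1/n applies to the last factor only):}\quad
  & \beta \, P(1) \, P(2)^{1/2} = \beta \, P(1) \sqrt{P(2)} \\
\text{Reading B (1/n applies to the whole product):}\quad
  & \beta \, \bigl( P(1) \, P(2) \bigr)^{1/2} = \beta \sqrt{P(1) \, P(2)}
\end{align*}
```

Reading B is the one the author confirms; it is β times the geometric mean of P(1) and P(2).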
LONGRYUU commented 3 years ago

So the output of the sigmoid layer is a probability distribution over the 990 attributes, right? And z is the feature vector before the sigmoid layer, with size 990?

xuewyang commented 3 years ago

Yes.
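The pipeline described in this exchange (average pooling, a linear layer, then a sigmoid, with z taken before the sigmoid) might look roughly like the sketch below. It is not the repository's actual code: the class name `AttributeHead`, the 2048-channel 7x7 feature map, and the batch size are assumptions chosen for illustration; only the 990 attributes and the order of operations come from the thread.

```python
import torch
import torch.nn as nn


class AttributeHead(nn.Module):
    """Hypothetical attribute head: avg pool -> linear -> sigmoid."""

    def __init__(self, feat_channels=2048, num_attributes=990):
        super().__init__()
        # Collapse the spatial grid (H x W) to a single value per channel.
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Map the pooled feature vector to one score per attribute.
        self.fc = nn.Linear(feat_channels, num_attributes)

    def forward(self, feats):
        # feats: (batch, channels, height, width) from an image encoder.
        pooled = self.pool(feats).flatten(1)   # (batch, channels)
        z = self.fc(pooled)                    # (batch, 990), pre-sigmoid
        probs = torch.sigmoid(z)               # (batch, 990), each in [0, 1]
        return z, probs


torch.manual_seed(0)
head = AttributeHead()
feats = torch.randn(4, 2048, 7, 7)  # assumed encoder output shape
z, probs = head(feats)
print(z.shape, probs.shape)
```

Note that a sigmoid scores each attribute independently, so the 990 outputs are per-attribute probabilities and a row does not generally sum to 1.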

LONGRYUU commented 3 years ago

Are you planning to update your paper on arXiv? I'd like to try some ideas, but I lack the corrected scores for your approach. Specifically, the results in Table 2 have not been updated, so I cannot make a fair comparison against your approach and the other baselines you employed.

xuewyang commented 3 years ago

Yes, please wait 1 or 2 weeks.