ruotianluo / ImageCaptioning.pytorch

I decided to sync up this repo and self-critical.pytorch. (The old master is kept in the old master branch for archive.)
MIT License

Extract FC feature in resnet_utils.py #63

Open jamiechoi1995 opened 6 years ago

jamiechoi1995 commented 6 years ago

Hi,

I'm curious about how you extract the fc feature from ResNet.

Why did you use https://github.com/ruotianluo/ImageCaptioning.pytorch/blob/622b6a5ffe9ee599911306b464dfa1ed2a19fa37/misc/resnet_utils.py#L24

instead of

x = self.resnet.avgpool(x)
fc = x.view(x.size(0), -1)

as defined in https://github.com/ruotianluo/ImageCaptioning.pytorch/blob/622b6a5ffe9ee599911306b464dfa1ed2a19fa37/misc/resnet.py#L149

ruotianluo commented 6 years ago

Because the output feature map may not be 7x7.
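A minimal sketch of why the spatial size matters, assuming `avgpool` is a fixed 7x7 average pool (as this reply implies) and that the linked line averages over all spatial locations (as described later in the thread). The helper names and the random tensors standing in for the last conv feature map are hypothetical, not code from the repo.

```python
import torch
import torch.nn as nn


def fc_fixed_pool(x):
    """Fixed 7x7 average pool followed by flatten (assumed avgpool behaviour)."""
    pooled = nn.AvgPool2d(kernel_size=7, stride=1)(x)
    return pooled.view(pooled.size(0), -1)


def fc_spatial_mean(x):
    """Average over every spatial location, whatever H and W are."""
    return x.mean(3).mean(2)


# Stand-ins for the final conv feature map, shaped (batch, channels, H, W).
feat_7x7 = torch.randn(1, 2048, 7, 7)      # what a 224x224 input typically gives
feat_14x14 = torch.randn(1, 2048, 14, 14)  # what a larger input might give

for name, feat in [("7x7", feat_7x7), ("14x14", feat_14x14)]:
    print(name,
          "fixed pool:", tuple(fc_fixed_pool(feat).shape),
          "spatial mean:", tuple(fc_spatial_mean(feat).shape))
```

On a 7x7 map the two approaches give the same 2048-dim vector. On a 14x14 map the fixed pool leaves an 8x8 grid, so the flattened vector is no longer 2048-dim, while the spatial mean stays at 2048.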

jamiechoi1995 commented 6 years ago

@ruotianluo

I think you mean the att features, but I am asking about the fc features.

It seems that you use the average of the conv features over all locations as the fc feature (similar to "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering", I think).

What I thought before was that the fc feature is the output of the fully connected layer.

I also see that adaptive_avg_pool2d https://github.com/ruotianluo/ImageCaptioning.pytorch/blob/622b6a5ffe9ee599911306b464dfa1ed2a19fa37/misc/resnet_utils.py#L25 not only lets you specify the att size but also allows the model to accept images of arbitrary size. Good implementation.
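To illustrate the point about arbitrary input sizes, here is a rough sketch of extracting both features from a conv feature map of any spatial size. The function name `extract_features`, the `att_size` default, and the 64-channel random inputs are assumptions for illustration, not the code in `resnet_utils.py`.

```python
import torch
import torch.nn.functional as F


def extract_features(conv_map, att_size=14):
    """Return (fc, att) from a conv feature map of shape (1, C, H, W).

    fc  : (C,) vector, the mean over all spatial locations.
    att : (att_size, att_size, C) grid from adaptive average pooling.
    """
    fc = conv_map.mean(3).mean(2).squeeze(0)
    att = F.adaptive_avg_pool2d(conv_map, [att_size, att_size])
    att = att.squeeze(0).permute(1, 2, 0)
    return fc, att


# Feature maps with different spatial sizes, as different image sizes would give.
for h, w in [(7, 7), (10, 15), (20, 12)]:
    fc, att = extract_features(torch.randn(1, 64, h, w))
    print((h, w), tuple(fc.shape), tuple(att.shape))
```

Whatever H and W are, `fc` keeps one value per channel and `att` always comes out as an `att_size` x `att_size` grid, which is what lets downstream code assume fixed shapes.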