microsoft / Oscar

Oscar and VinVL
MIT License
About the image features dimensions #23

vivid-k closed this issue 4 years ago

vivid-k commented 4 years ago

Hello, thank you for your great work! When I extract image features, each region feature has 2048 dimensions, but your model expects 2054. Where do the extra 6 dimensions come from?

liuhl-source commented 4 years ago

Hi, I have the same question. Do you have any idea?

tjdevWorks commented 4 years ago

The region feature vector obtained from the Faster R-CNN model is 2048-dimensional. In addition, a 4- or 6-dimensional region position vector is used. The 4-dimensional version holds the top-left and bottom-right corners of the detected object's box; the 6-dimensional version also includes the box height and width. The position vector is concatenated with the 2048-dimensional feature vector, giving a 2052- or 2054-dimensional vector.

For reference, see pages 4-5 of the paper (https://arxiv.org/pdf/2004.06165.pdf).
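
As a rough illustration of the arithmetic above, here is a minimal sketch in plain Python. The function name `concat_region_input` and the placeholder values are hypothetical and not taken from the Oscar code base; real features would come from the detector and would normally be stored as arrays.

```python
from typing import List, Sequence

FEATURE_DIM = 2048  # size of one Faster R-CNN region feature, as stated above


def concat_region_input(feature: Sequence[float],
                        position: Sequence[float]) -> List[float]:
    """Append a 4- or 6-value position vector to a 2048-value region feature.

    Hypothetical helper for illustration only.
    """
    if len(feature) != FEATURE_DIM:
        raise ValueError(f"expected {FEATURE_DIM} feature values, got {len(feature)}")
    if len(position) not in (4, 6):
        raise ValueError("position vector must have 4 or 6 values")
    return list(feature) + list(position)


# Placeholder values only; they stand in for real detector output.
feature = [0.0] * FEATURE_DIM
position_4 = [0.1, 0.2, 0.5, 0.8]        # two box corners
position_6 = position_4 + [0.4, 0.6]     # corners plus two box-size values

print(len(concat_region_input(feature, position_4)))
print(len(concat_region_input(feature, position_6)))
```

The two printed lengths correspond to the 2052- and 2054-dimensional cases described in the comment.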

vinson2233 commented 3 years ago

Hi, I use the code from https://github.com/airsplay/py-bottom-up-attention/tree/master/demo, and its position vector is 4-dimensional, so concatenating it with the region feature vector results in a 2052-dimensional vector. Can I calculate height = bottom - top and width = right - left to fill in the 2 missing dimensions?

zdxdsw commented 3 years ago

The last 6 dimensions all have values in (0,1). I suppose they are normalized by the actual width and height of the image, right?
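
If that reading is right, the 6 position values could be derived from a 4-value box roughly as sketched below. This is only a sketch under assumptions the thread does not confirm: boxes are given in pixels as (x1, y1, x2, y2) with the origin at the top-left of the image, the output order is corners first and then width and height, and every value is divided by the image width or height. The function name `box_position_features` is hypothetical.

```python
from typing import List, Tuple


def box_position_features(box: Tuple[float, float, float, float],
                          image_width: float,
                          image_height: float) -> List[float]:
    """Build a 6-value position vector from a pixel box (x1, y1, x2, y2).

    Assumed layout (not confirmed in this thread):
    [x1/W, y1/H, x2/W, y2/H, box_width/W, box_height/H]
    """
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image size must be positive")
    x1, y1, x2, y2 = box
    box_width = x2 - x1    # right minus left
    box_height = y2 - y1   # bottom minus top (y grows downward in image coordinates)
    return [
        x1 / image_width,
        y1 / image_height,
        x2 / image_width,
        y2 / image_height,
        box_width / image_width,
        box_height / image_height,
    ]


# Example with made-up numbers: a box inside a 640x480 image.
position = box_position_features((64.0, 48.0, 320.0, 240.0), 640.0, 480.0)
print(position)
```

For a box that lies inside the image, every value produced this way falls between 0 and 1, which matches the observation above about the last 6 dimensions.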