hengyuan-hu / bottom-up-attention-vqa

An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.
GNU General Public License v3.0

How is L2-normalization over features implemented? #22

Closed ZhuFengdaaa closed 6 years ago

ZhuFengdaaa commented 6 years ago

This paper states that L2 normalization of the image features is crucial for good performance. However, you just use the pool5 data, which is average-pooled into a 2048-dimensional vector in generate_tsv.py.

However, neither your repository bottom-up-attention-vqa nor the feature extractor repository bottom-up-attention implements the L2 normalization. I implemented it at the very beginning of the forward procedure with v = v / torch.norm(v, 2), but the validation score decreased by 0.5.
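
For reference, here is a minimal sketch of what v = v / torch.norm(v, 2) actually does, contrasted with per-region normalization via F.normalize (assuming v has shape [batch, num_objs, 2048] as in the forward pass here; the per-region variant is just an alternative, not something from the repo):

```python
import torch
import torch.nn.functional as F

# Toy batch of features: [batch, num_objs, feat_dim] (assumed shape)
v = torch.randn(4, 36, 2048)

# What I used: torch.norm(v, 2) with no dim argument returns a single
# scalar (the 2-norm of the whole flattened tensor), so every region in
# every image is divided by the same number.
v_global = v / torch.norm(v, 2)

# Per-region L2 normalization: each 2048-d region vector is scaled to
# unit length independently.
v_per_region = F.normalize(v, p=2, dim=-1)

print(v_global.norm(dim=-1)[0, :3])      # small values, not 1.0
print(v_per_region.norm(dim=-1)[0, :3])  # ~1.0 for every region
```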

Can anybody explain this? Thanks~

hengyuan-hu commented 6 years ago

I vaguely remember that we tried normalizing the features, but it decreased performance. I cannot remember the configuration of those experiments, such as whether we used weight norm and which optimizer we used at the time. Personally, I think that when you normalize a vector you may lose some information (i.e. the magnitude). Unlike cases where features have vastly different magnitudes, these features are trained for detection and work fine for that task, so I don't really see the necessity of normalization. That being said, as with many problems in deep learning, the answer is not clear.
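
As a toy illustration of the magnitude point: two features pointing in the same direction but with very different magnitudes become indistinguishable after per-vector L2 normalization.

```python
import torch
import torch.nn.functional as F

# Two detections with the same direction but a 10x difference in magnitude
a = torch.tensor([3.0, 4.0])
b = torch.tensor([30.0, 40.0])

# After L2 normalization they are identical, so the magnitude
# information is discarded.
print(F.normalize(a, dim=0))  # tensor([0.6000, 0.8000])
print(F.normalize(b, dim=0))  # tensor([0.6000, 0.8000])
```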

ZhuFengdaaa commented 6 years ago

Thanks. I'll close it, even though I think it's worth further discussion.