Closed · ZhuFengdaaa closed this issue 6 years ago
I vaguely remember that we tried normalizing the features, but it decreased performance. I cannot remember the configuration of those experiments, such as whether we used weight norm or which optimizer we used at the time. Personally, I think that when you normalize a vector you may lose some information (namely, the magnitude). Unlike cases where features have vastly different magnitudes, these features were trained for detection and work fine for that task, so I don't really see the necessity for normalization. That being said, like many problems in deep learning, the answer is not clear.
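To illustrate the magnitude point with a quick sketch: two features that point in the same direction but differ in scale become indistinguishable after L2 normalization.

```python
import torch
import torch.nn.functional as F

a = torch.tensor([3.0, 4.0])  # norm 5.0
b = torch.tensor([0.3, 0.4])  # norm 0.5, same direction

# Both map to the same unit vector, so the magnitude is gone.
print(F.normalize(a, p=2, dim=0))  # tensor([0.6000, 0.8000])
print(F.normalize(b, p=2, dim=0))  # tensor([0.6000, 0.8000])
```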
Thanks. I'll close this, even though I think it's worth further discussion.
This paper states that L2 normalization of the image features is crucial for good performance. However, generate_tsv.py just uses the pool5 data, which is average-pooled into a 2048-dimensional vector. Neither this repository (bottom-up-attention-vqa) nor the feature extractor repository (bottom-up-attention) implements the L2 normalization. I implemented it at the very beginning of the forward procedure with `v = v / torch.norm(v, 2)`, but the validation score decreased by 0.5. Can anybody explain it? Thanks~
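Edit: one thing worth checking — `torch.norm(v, 2)` with no `dim` argument flattens the tensor and returns a single scalar norm, so the snippet above only rescales the whole batch rather than normalizing each feature vector. A per-vector L2 normalization would look roughly like this (a minimal sketch, assuming features shaped `(batch, num_boxes, 2048)` as produced by generate_tsv.py):

```python
import torch
import torch.nn.functional as F

# Dummy batch of image features: 32 images, 36 boxes, 2048-d each.
v = torch.randn(32, 36, 2048)

# Per-vector L2 normalization along the feature dimension:
v_normed = F.normalize(v, p=2, dim=-1)

# Equivalent by hand (clamp avoids division by zero):
v_normed2 = v / v.norm(p=2, dim=-1, keepdim=True).clamp_min(1e-12)
```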