mihaidusmanu / d2-net

D2-Net: A Trainable CNN for Joint Description and Detection of Local Features

Question about calculating the local max score. #13

Closed XuyangBai closed 4 years ago

XuyangBai commented 4 years ago

Hi, thanks for sharing your work. In your paper, the soft local-max score is defined as

[Image: Screenshot 2019-07-31 3:30:36 PM, showing the equation from the paper]

But in your implementation I found

    def forward(self, batch):
        b = batch.size(0)

        batch = F.relu(batch)

        max_per_sample = torch.max(batch.view(b, -1), dim=1)[0]
        exp = torch.exp(batch / max_per_sample.view(b, 1, 1, 1))
        sum_exp = (
            self.soft_local_max_size ** 2 *
            F.avg_pool2d(
                F.pad(exp, [self.pad] * 4, mode='constant', value=1.),
                self.soft_local_max_size, stride=1
            )
        )
        local_max_score = exp / sum_exp

which means that you first normalize the feature vector by dividing by the maximum over the whole image. It is not clear to me why this normalization is needed, or why you do not apply the same normalization when calculating the channel selection score.
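To make the steps in the quoted snippet easier to follow, here is a minimal pure-Python sketch of the same computation on a single 2D channel. It is not the repository's code: the function name `soft_local_max_2d`, the default window of 3, and the choice `pad = window // 2` are assumptions for illustration, and the maximum is taken over one channel rather than over the whole sample as in the snippet. It also assumes at least one activation is positive, so the division by the maximum is defined.

```python
import math


def soft_local_max_2d(feature_map, window=3):
    """Soft local-max score for one 2D channel (illustrative sketch)."""
    height, width = len(feature_map), len(feature_map[0])
    pad = window // 2  # assumed relation between window size and padding

    # ReLU, then divide by the largest activation before exponentiating.
    relu = [[max(v, 0.0) for v in row] for row in feature_map]
    peak = max(max(row) for row in relu)
    exp = [[math.exp(v / peak) for v in row] for row in relu]

    scores = []
    for i in range(height):
        row_scores = []
        for j in range(width):
            # Sum of exp over the window; this is what
            # window ** 2 * average pooling computes in the snippet.
            total = 0.0
            for di in range(-pad, pad + 1):
                for dj in range(-pad, pad + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < height and 0 <= jj < width:
                        total += exp[ii][jj]
                    else:
                        # Out-of-bounds cells are padded with 1.0 = exp(0).
                        total += 1.0
            row_scores.append(exp[i][j] / total)
        scores.append(row_scores)
    return scores


# A single peak in the middle of a 3x3 map.
scores = soft_local_max_2d([[0.0, 0.0, 0.0],
                            [0.0, 2.0, 0.0],
                            [0.0, 0.0, 0.0]])
print(scores[1][1], scores[0][0])
```

The padding value of 1 corresponds to treating positions outside the image as zero activations, since exp(0) = 1.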

mihaidusmanu commented 4 years ago

The normalization is only there to avoid numerical issues (overflow) when computing the exponential. With an architecture that uses BatchNorm, I don't think the normalization would be necessary. However, in this version of VGG, the activations can be quite large (I have seen values up to a few thousand).
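The overflow concern can be reproduced with a small standard-library sketch. The activation values below are hypothetical, chosen only to match the "few thousand" magnitude mentioned above, and the helper names `naive_exp` and `normalized_exp` are made up for this example. Python floats are 64-bit, so the exponential overflows above roughly 709; in 32-bit floats the limit is lower, at roughly 88.

```python
import math

# Hypothetical activations after ReLU; the largest is in the thousands.
activations = [0.0, 12.5, 3000.0]


def naive_exp(values):
    """Exponentiate the raw activations directly."""
    return [math.exp(v) for v in values]


def normalized_exp(values):
    """Divide by the largest activation first, so every exponent is in [0, 1]."""
    peak = max(values)
    return [math.exp(v / peak) for v in values]


try:
    naive_exp(activations)
    overflowed = False
except OverflowError:
    # math.exp raises OverflowError when the result does not fit in a float.
    overflowed = True

safe = normalized_exp(activations)
print(overflowed, safe)
```

After the division, every exponential lies between 1 and e, so the sum over a small window cannot overflow. Note that dividing by the maximum rescales the exponent, so the resulting scores are generally not identical to those obtained from the unnormalized formula.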

The normalization is not important when computing the channel selection score because it wouldn't make any difference ((A / MAX) / (B / MAX) = A / B).
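The cancellation can be checked numerically. This sketch assumes the channel selection score is a ratio of each channel's activation to the largest activation across channels at the same pixel, which is what the A / B form above suggests. The name `ratio_to_max` and the sample values are hypothetical.

```python
import math


def ratio_to_max(descriptor):
    """Divide each channel by the largest channel value at this pixel."""
    peak = max(descriptor)
    return [v / peak for v in descriptor]


# Hypothetical activations across four channels at one pixel.
descriptor = [4.0, 1.0, 8.0, 2.0]
image_max = 3000.0  # hypothetical global maximum used for normalization

raw = ratio_to_max(descriptor)
scaled = ratio_to_max([v / image_max for v in descriptor])

# The common factor 1 / image_max cancels in the ratio.
same = all(math.isclose(a, b) for a, b in zip(raw, scaled))
print(raw, scaled, same)
```

Because the score is a pure ratio and involves no exponential, there is also no overflow risk to guard against in this step.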

XuyangBai commented 4 years ago

Thank you, that makes sense to me :)