mihaidusmanu / d2-net

D2-Net: A Trainable CNN for Joint Description and Detection of Local Features

Changing model architecture at test time (inference) #82

Closed: ComputerMath closed this issue 2 years ago

ComputerMath commented 3 years ago

Hello

I read the paper and the part of the code that handles inference. As the paper and the code say, the inference model changes the average pooling from stride 2 to stride 1 and uses dilated convolutions, unlike the model used for training.

I'm curious about two things. 1) How did you arrive at this idea? I haven't read any paper or seen any model in which the architecture used at inference differs from the one used in training.

2) Is there a reason the model wasn't fixed to the test-time version during training from the start? Does using a different architecture for training and testing give better results?

Thanks in advance :)

mihaidusmanu commented 3 years ago

Hey. We were initially looking for ways to obtain better local features with our approach from pre-trained (ImageNet) architectures, which is why we switched to a 2x2 average pool with stride 1 plus dilated convolutions. This way, we keep the same receptive field while increasing the feature map resolution.
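To illustrate the receptive-field argument, here is a minimal sketch in plain Python. The layer list is an assumed VGG16-style stack chosen for illustration, not copied from the repository, and `receptive_field_and_stride` is a hypothetical helper. It applies the standard recurrence: each layer grows the receptive field by `(effective_kernel - 1) * jump`, where `jump` is the product of the strides so far.

```python
# Each layer is (name, kernel_size, stride, dilation).
# The layout is an ASSUMED VGG16-style stack, for illustration only.
def conv(name, dilation=1):
    return (name, 3, 1, dilation)

SHARED = [
    conv("conv1_1"), conv("conv1_2"), ("pool1", 2, 2, 1),
    conv("conv2_1"), conv("conv2_2"), ("pool2", 2, 2, 1),
    conv("conv3_1"), conv("conv3_2"), conv("conv3_3"),
]

# Training-style tail: stride-2 pooling, ordinary convolutions.
TRAIN_LAYERS = SHARED + [("pool3", 2, 2, 1)] + [
    conv(f"conv4_{i}") for i in (1, 2, 3)
]

# Test-style tail: stride-1 pooling, convolutions with dilation 2.
TEST_LAYERS = SHARED + [("pool3", 2, 1, 1)] + [
    conv(f"conv4_{i}", dilation=2) for i in (1, 2, 3)
]


def receptive_field_and_stride(layers):
    """Return (receptive field in input pixels, output stride)."""
    rf, jump = 1, 1
    for _name, kernel, stride, dilation in layers:
        # A dilated kernel spans dilation * (kernel - 1) + 1 positions.
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf, jump


print(receptive_field_and_stride(TRAIN_LAYERS))  # (92, 8)
print(receptive_field_and_stride(TEST_LAYERS))   # (92, 4)
```

Under these assumptions both variants see the same 92-pixel receptive field, while the output stride drops from 8 to 4, so the feature map has twice the resolution along each axis.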

Training with dilated convolutions is 4x more memory expensive, and the final results in our initial experiments were quite similar, so we decided to keep the base architecture for faster and more memory-efficient training.
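One plausible reading of the 4x figure is that halving the output stride doubles the feature map size along each axis, so the affected layers store four times as many activations. The sketch below is only a back-of-the-envelope check: the input size, the channel count, and the output strides of 8 and 4 are assumptions, and `activation_count` is a hypothetical helper that ignores border effects.

```python
def activation_count(height, width, output_stride, channels):
    """Number of values in one feature map, ignoring border effects."""
    return (height // output_stride) * (width // output_stride) * channels


# Assumed values: 256x256 input, 512 channels, output stride 8 vs. 4.
train_map = activation_count(256, 256, output_stride=8, channels=512)
test_map = activation_count(256, 256, output_stride=4, channels=512)

print(train_map)             # 524288
print(test_map)              # 2097152
print(test_map / train_map)  # 4.0
```

Activations have to be kept for the backward pass during training, which is why this ratio matters more at training time than at inference.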

07Agarg commented 2 years ago

Hi @mihaidusmanu, thanks for the paper and code! Won't there be an issue when loading the weights of a model trained with an architecture different from the one used at inference time?

Looking forward to your response. Thanks again.
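A general note on the weight-loading question, not specific to this repository: in frameworks such as PyTorch, a 2D convolution's weight tensor has shape `(out_channels, in_channels, kernel_h, kernel_w)`, which does not depend on stride or dilation, and pooling layers have no learnable parameters. Assuming the two models differ only in pooling stride and convolution dilation, their parameter shapes should match. The plain-Python sketch below checks this for a hypothetical two-layer tail; the layer names, channel sizes, and helper functions are illustrative assumptions.

```python
def conv_weight_shape(in_channels, out_channels, kernel_size):
    # Stride and dilation are deliberately not arguments: they change
    # where the kernel samples the input, not how many weights it has.
    return (out_channels, in_channels, kernel_size, kernel_size)


def parameter_shapes(layers):
    """Map parameter names to shapes for a list of layer specs."""
    shapes = {}
    for layer in layers:
        if layer["type"] == "conv":
            shapes[layer["name"] + ".weight"] = conv_weight_shape(
                layer["in"], layer["out"], layer["kernel"]
            )
            shapes[layer["name"] + ".bias"] = (layer["out"],)
        # Pooling layers have no learnable parameters, so they add nothing.
    return shapes


# Hypothetical tails that differ only in pooling stride and dilation.
train_tail = [
    {"type": "pool", "name": "pool3", "kernel": 2, "stride": 2},
    {"type": "conv", "name": "conv4_1", "in": 256, "out": 512,
     "kernel": 3, "dilation": 1},
]
test_tail = [
    {"type": "pool", "name": "pool3", "kernel": 2, "stride": 1},
    {"type": "conv", "name": "conv4_1", "in": 256, "out": 512,
     "kernel": 3, "dilation": 2},
]

print(parameter_shapes(train_tail) == parameter_shapes(test_tail))  # True
```

If the parameter names also match between the two model definitions, a checkpoint from the training architecture should load into the test architecture without shape errors.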