naver / r2d2

Doubts about reliability performance in custom image #18

Closed ShrutheeshIR closed 4 years ago

ShrutheeshIR commented 4 years ago

Hello. Thanks for releasing the code. It works amazingly well on completely new data as well!

When I tested it on multiple difficult views of a building, this was the output I got after matching features:

Screenshot from 2020-06-22 12-41-36

Focusing purely on the building (mostly on the right-hand side of each image), the features have, surprisingly, been matched extremely well. However, since the building's patterns are repetitive, shouldn't the feature extraction stage, according to Figure 1 of your paper and the reliability score, avoid picking keypoints on such repetitive patterns?

While these extremely good matches do suit my purpose, I was wondering whether this in fact goes against what is stated in the paper, or whether this behaviour is to be expected. If it is expected, could you elaborate a little more on the reliability of keypoints from the perspective I have put forth?

Thanks

Note: I am not overly worried about the other 'wrong' matches in the image as of now, since I understand that they've not been trained on anything remotely similar to such views/images, but the building reliability is my primary query.

humenbergerm commented 4 years ago

Hi!

Thanks for your feedback. We try to learn which areas are well suited for matching. Thus, we want the network to decide which areas are too repeatable and which contain enough (local) information to match them reliably. For human beings, it is quite difficult to see all the variations (e.g. fine texture on a wall) a feature extraction method can cope with. In your case, it just means that the network found some areas of the house facade reliable. And to my understanding, it was correct in most of the cases.

In this picture, you can see what happens if you put a "perfect" repetitive pattern in images.

image

Best, Martin
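To make the idea of "areas the network finds reliable" concrete, here is a minimal sketch of how keypoints could be filtered using two per-pixel score maps. It is not the repository's code: the function name `select_keypoints`, the 3x3 peak test, and the threshold values are all assumptions made for illustration. It only assumes that a reliability map and a repeatability map of the same size are available.

```python
import numpy as np

def select_keypoints(reliability, repeatability, rel_thr=0.7, rep_thr=0.7):
    """Keep pixels that are 3x3 local maxima of repeatability and pass both thresholds.

    Hypothetical helper for illustration; thresholds are arbitrary example values.
    """
    h, w = repeatability.shape
    # Pad with -inf so border pixels can still be compared against a full 3x3 window.
    padded = np.pad(repeatability, 1, mode="constant", constant_values=-np.inf)
    # Stack the nine shifted views of the map and take the neighbourhood maximum.
    neigh = np.stack([padded[dy:dy + h, dx:dx + w]
                      for dy in range(3) for dx in range(3)])
    is_peak = repeatability >= neigh.max(axis=0)
    keep = is_peak & (repeatability >= rep_thr) & (reliability >= rel_thr)
    return np.argwhere(keep)  # array of (row, col) positions

# Toy maps: two repeatable peaks, but only one lies in a reliable area.
repeatability = np.zeros((5, 5))
reliability = np.zeros((5, 5))
repeatability[1, 1], reliability[1, 1] = 0.90, 0.9   # e.g. textured facade patch
repeatability[3, 3], reliability[3, 3] = 0.95, 0.2   # e.g. perfectly repetitive patch

keypoints = select_keypoints(reliability, repeatability)
print(keypoints.tolist())
```

In this toy case the second peak is dropped even though it is easy to re-detect, because its reliability score is low. A facade region that the network scores as reliable would survive the same filter.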

ShrutheeshIR commented 4 years ago

Hello @humenbergerm, thank you for the illustration.

Please correct me if I am wrong: from what you explain, the building patterns seem non-discriminative to the naked eye, but at a finer pixel/sub-pixel level they are discriminative, which is why the feature extractor picked them up.
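A small synthetic sketch of this point, using raw pixel patches as a stand-in for learned descriptors (an assumption for illustration, not how R2D2 computes descriptors): in a perfectly periodic stripe pattern, two patches one period apart are identical, while a faint amount of fine texture is enough to tell them apart. The pattern, noise level, and `patch_distance` helper are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
period = 8
x = np.arange(64)

# Perfect stripes: 4 dark columns, 4 bright columns, repeated.
perfect = np.tile(((x // 4) % 2).astype(float), (16, 1))
# Same stripes with faint fine texture added (barely visible to the eye).
textured = perfect + 0.05 * rng.standard_normal(perfect.shape)

def patch_distance(img, shift, width=16):
    """Euclidean distance between a patch and the patch `shift` columns to its right."""
    a = img[:, :width]
    b = img[:, shift:shift + width]
    return float(np.linalg.norm(a - b))

d_perfect = patch_distance(perfect, period)    # patches one period apart
d_textured = patch_distance(textured, period)

print(d_perfect, d_textured)
```

With the perfect pattern the distance is exactly zero, so the two locations cannot be told apart. With the textured version the distance is non-zero, which is the kind of fine-scale difference a feature extractor can exploit.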

humenbergerm commented 4 years ago

Yes, that's what I meant.

Best, Martin

ShrutheeshIR commented 4 years ago

Thank you for the clarifications.