mihaidusmanu / d2-net

D2-Net: A Trainable CNN for Joint Description and Detection of Local Features

Is depth map really necessary? #45

Closed ChunhuanLin closed 4 years ago

ChunhuanLin commented 4 years ago

Hi, when you preprocess MegaDepth for training, why don't you just use the 2D points projected from the matched 3D points as keypoints? Using the depth maps, as you did in the paper, generates more keypoints, but it also includes less reliable ones, such as those on tree foliage. What's your opinion? Thanks a lot.

mihaidusmanu commented 4 years ago

The protocol you described (sparse 3D model only) will yield only SIFT correspondences, since the sparse model is built from SIFT features. By exploiting the MVS depth maps, we get dense correspondences between images. Even though these correspondences might be less accurate, they allow the network to learn to detect points other than SIFT keypoints. Moreover, the VGG backbone downsizes the image to 1/8th of its resolution during training, so coarse correspondences suffice as supervision.
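
For readers who want to see how a depth map turns into dense correspondences, here is a minimal NumPy sketch, not the repository's actual code. It back-projects pixels from image 1 using their depth and re-projects them into image 2. The function name `warp_pixels`, its arguments, and the pose convention `X2 = R @ X1 + t` are assumptions made for illustration.

```python
import numpy as np

def warp_pixels(uv1, depth1, K1, K2, R, t):
    """Map pixels of image 1 into image 2 using per-pixel depth.

    uv1:    (N, 2) pixel coordinates (x, y) in image 1
    depth1: (N,) depth of each pixel in camera 1 (<= 0 means missing)
    K1, K2: (3, 3) camera intrinsics
    R, t:   assumed convention X2 = R @ X1 + t (camera 1 to camera 2)
    Returns (uv2, valid): (N, 2) coordinates in image 2 and an (N,) mask.
    """
    uv1 = np.asarray(uv1, dtype=np.float64)
    depth1 = np.asarray(depth1, dtype=np.float64)

    # Homogeneous pixels -> rays with z = 1 in camera 1.
    pix_h = np.hstack([uv1, np.ones((uv1.shape[0], 1))])
    rays = pix_h @ np.linalg.inv(K1).T

    # Scale rays by depth to get 3D points, then move them to camera 2.
    X1 = rays * depth1[:, None]
    X2 = X1 @ np.asarray(R).T + np.asarray(t)

    # Keep only points with known depth that lie in front of camera 2.
    valid = (depth1 > 0) & (X2[:, 2] > 0)

    # Perspective projection into image 2 (NaN where invalid).
    proj = X2 @ np.asarray(K2).T
    uv2 = np.full_like(uv1, np.nan)
    uv2[valid] = proj[valid, :2] / proj[valid, 2:3]
    return uv2, valid


# Toy example: focal length 100, principal point (50, 40),
# camera 2 is displaced so that points shift by +1 along x.
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 40.0],
              [0.0, 0.0, 1.0]])
uv2, valid = warp_pixels(
    uv1=[[50.0, 40.0], [10.0, 10.0]],
    depth1=[2.0, 0.0],          # second pixel has no depth
    K1=K, K2=K, R=np.eye(3), t=np.array([1.0, 0.0, 0.0]),
)
```

Any pixel with valid depth can be warped this way, not just SIFT locations, which is why the supervision becomes dense. A full pipeline would usually also discard points that fall outside image 2 or that are occluded there (for example by comparing against image 2's own depth map); those checks are left out of this sketch.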

ChunhuanLin commented 4 years ago

Thanks for your reply!