mihaidusmanu / d2-net

D2-Net: A Trainable CNN for Joint Description and Detection of Local Features

Training on custom dataset is detecting wrong correspondences #68

Closed UditSinghParihar closed 3 years ago

UditSinghParihar commented 3 years ago

Hello Sir,

  1. I am trying to train D2-Net on my custom dataset based on the issue. But even after reaching a loss of 0.0003, the correspondences still seem to be wrong. So I would like to clear up some doubts regarding the training data, since its exact format is not described in the paper. Are the following assumptions about the training data format correct:
    1. A left-hand coordinate system (Z forward, Y down, X right) is used for poses.
    2. pose1 is the pose of the world w.r.t. camera 1 (i.e., the world-to-camera-1 transformation).
    3. The depth files store values in meters.
    4. Is there any way to check whether my input data format is correct, e.g., by visualizing the warping function or the ground-truth correspondences? (A sketch of such a check is given after this list.)
  2. The loss goes down to 0.0003 after 10 epochs of training starting from d2_tf.pth (I had to add an epsilon of 1e-5 in 5 places to avoid NaN loss values), but the obtained correspondences are incorrect.
  3. Our dataset contains image pairs taken from opposite (180-degree) viewpoints, and we would like to find correspondences on a textured floor. Is it possible to train for such opposite-viewpoint correspondences with data augmentation, as mentioned in the issue?
  4. A sample image pair from the dataset can be seen here.
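
For reference on point 1.4, here is a minimal sketch of the check I have in mind (the function warp_pixels and all variable names are placeholders of my own; it assumes depth maps in meters, 3x3 intrinsics K1/K2, and 4x4 world-to-camera poses pose1/pose2):

```python
# Minimal sketch (not repository code): sanity-check the data format by
# warping pixels from image 1 into image 2 using depth and poses.
import numpy as np

def warp_pixels(uv1, depth1, K1, K2, pose1, pose2):
    """Warp pixel coordinates uv1 of shape (N, 2) from camera 1 into camera 2."""
    # Back-project pixels of image 1 to 3D points in the camera-1 frame.
    z = depth1[uv1[:, 1].astype(int), uv1[:, 0].astype(int)]
    ones = np.ones((uv1.shape[0], 1))
    rays = (np.linalg.inv(K1) @ np.hstack([uv1, ones]).T).T
    pts_cam1 = rays * z[:, None]

    # Camera 1 -> world -> camera 2 (poses are assumed world-to-camera).
    pts_h = np.hstack([pts_cam1, ones])
    pts_cam2 = (pose2 @ np.linalg.inv(pose1) @ pts_h.T).T[:, :3]

    # Project into image 2.
    proj = (K2 @ pts_cam2.T).T
    uv2 = proj[:, :2] / proj[:, 2:3]
    return uv2, pts_cam2[:, 2]  # expected pixel locations and depths in camera 2
```

If the warped pixels overlaid on image 2 do not land on the same physical structures, or the returned depths disagree with depth2 by more than a small tolerance, then the pose convention or the depth scale is likely wrong.
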
mihaidusmanu commented 3 years ago

Hello. The format that you are describing seems correct to me. You can check if the ground-truth correspondences are correct by passing the argument --plot to the training script - this will plot the images with dense correspondences as well as the detection scores.

UditSinghParihar commented 3 years ago
  1. Thanks for mentioning the --plot debugging tool. It seems my training data format is correct, as I can see the diagonal-type correspondences that are expected for opposite-view images. I also added a small piece of code to visualize the correspondences in your --plot code block, and the output for the training data looks like this (a sketch of this visualization is shown after this list).
  2. Regarding D2-Net not being able to learn correspondences after training on image pairs with 180-degree opposite viewpoints, it seems my ground-truth poses are not precise and contain some translation error, as can be seen in the image I shared above. I will first try to train D2-Net on a synthetic dataset for opposite-viewpoint correspondences to validate this hypothesis, since the ground-truth poses there will be exact.
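
For reference, here is a minimal sketch of the visualization I added (plot_correspondences is my own naming; it assumes two RGB images of the same dtype and two (N, 2) arrays of matching (x, y) pixel coordinates):

```python
# Minimal visualization sketch (assumed helper, not the repository's --plot
# code): draw sparse correspondence lines between an image pair.
import numpy as np
import matplotlib.pyplot as plt

def plot_correspondences(img1, img2, kps1, kps2, step=50):
    # Place the two images side by side on a single canvas.
    h = max(img1.shape[0], img2.shape[0])
    canvas = np.zeros((h, img1.shape[1] + img2.shape[1], 3), dtype=img1.dtype)
    canvas[:img1.shape[0], :img1.shape[1]] = img1
    canvas[:img2.shape[0], img1.shape[1]:] = img2

    # Draw every `step`-th correspondence as a line across the canvas.
    plt.imshow(canvas)
    offset = img1.shape[1]
    for (x1, y1), (x2, y2) in zip(kps1[::step], kps2[::step]):
        plt.plot([x1, x2 + offset], [y1, y2], linewidth=0.5)
    plt.axis('off')
    plt.show()
```
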
UditSinghParihar commented 3 years ago

Hi Sir,

  1. I have trained D2-Net with correct ground-truth correspondences and the loss has gone down to 0.0005, which might imply that learning is taking place, but during inference on the training dataset itself I am getting wrong correspondences. The dense ground-truth correspondences, the inference result on a training image, and the loss log can be seen here.
  2. I have trained from the d2_ots.pth weights with a learning rate of 0.0003 for 10 epochs.
  3. During testing, I am only using skimage.feature.match_descriptors without RANSAC; with RANSAC I get very few correspondences (7-8). (A sketch of this matching step is shown after this list.)
  4. My dataset contains image pairs from opposite (180-degree) robot viewpoints and was collected in a Gazebo simulation environment.
  5. Could you suggest a reason for the discrepancy between the low training loss and the incorrect matches during inference, or a way to debug this?
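
For reference, a sketch of the matching and RANSAC-filtering step I am using (the file names and thresholds are placeholders; feats1/feats2 are assumed to be extracted feature files containing 'keypoints' and 'descriptors' arrays):

```python
# Sketch of the descriptor matching and geometric verification I am using
# (file names and thresholds are placeholders).
import numpy as np
from skimage.feature import match_descriptors
from skimage.measure import ransac
from skimage.transform import FundamentalMatrixTransform

feats1 = np.load('image1.png.d2-net')  # assumed output of feature extraction
feats2 = np.load('image2.png.d2-net')

# Mutual nearest-neighbour matching of descriptors.
matches = match_descriptors(feats1['descriptors'], feats2['descriptors'],
                            cross_check=True)
kps1 = feats1['keypoints'][matches[:, 0], :2]
kps2 = feats2['keypoints'][matches[:, 1], :2]

# Geometric verification with RANSAC on the fundamental matrix.
model, inliers = ransac((kps1, kps2), FundamentalMatrixTransform,
                        min_samples=8, residual_threshold=1.0, max_trials=5000)
print(f'{inliers.sum()} inliers out of {len(matches)} putative matches')
```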

Thanks

mihaidusmanu commented 3 years ago

The loss value is way too low (the least I was able to get on real-world datasets was 0.3-0.4 if I recall correctly). I suspect that the training data is too hard and the network converges to a trivial solution (such as not detecting anything or having the same descriptor across the board). Moreover, I don't think that convolutions are adequate for opposite viewpoints due to the lack of strong rotation invariance.
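
A possible test-time workaround, to sketch the idea (my own suggestion under the assumption that the dominant nuisance is a 180-degree in-plane rotation; this is not implemented in the repository), is to also extract features from a rotated copy of one image and map the keypoints back to the original coordinates:

```python
# Hedged sketch: handle ~180-degree in-plane rotation at test time by
# extracting features from a rotated copy of the image and rotating the
# keypoints back. `extract(image)` is a placeholder for the feature extractor.
import numpy as np

def extract_with_180_rotation(image, extract):
    h, w = image.shape[:2]
    rotated = np.rot90(image, 2).copy()        # rotate the image by 180 degrees
    keypoints, descriptors = extract(rotated)  # keypoints: (N, 2) as (x, y)
    # Map keypoints from the rotated image back to the original image frame.
    keypoints = np.stack([w - 1 - keypoints[:, 0],
                          h - 1 - keypoints[:, 1]], axis=1)
    return keypoints, descriptors
```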