rpautrat / SuperPoint

Efficient neural feature detector and descriptor
MIT License

Trying to replicate sp_v6 and descriptor loss #288

Closed: martinarroyo closed this issue 1 year ago

martinarroyo commented 1 year ago

[This is somewhat related to #287, but I'm opening a separate issue so as not to pollute that thread.]

Hi @rpautrat, thanks for this work and for providing support for the repo! As a first step towards making some changes to the model, I am trying to reproduce the HPatches results that you report in the README. To save some time, I labeled the COCO dataset using the pretrained model listed in the README (MagicPoint (COCO)) and launched training with the superpoint_coco.yaml config in its current state at HEAD. I had to make minor modifications to the codebase to get it to work on my infrastructure (mostly I/O), but there should be no changes that affect training. I noticed that the positive and negative distances reported in TensorBoard oscillate within a very small range of values (roughly 1e-7 to 1e-5), which got me worried: it seems inconsistent with the values reported in https://github.com/rpautrat/SuperPoint/issues/277#issuecomment-1301836238. For reference, here is how it looks in my current training (my machine restarted, so the graphs look a bit odd, apologies for that):

[image: TensorBoard plots of the positive and negative descriptor distances for my training run]

Precision and recall are also much lower than those in the sp_v6 log: there, recall goes up to 0.6, whereas I can only get it to ~0.37.

I looked into the yaml file in the sp_v6 tarfile and noticed that the loss weights seemed to be adapted for the 'unnormalized' descriptors, so I reverted the changes introduced in 95d1cfd. This helps with the distances (the values are ~0.03 and 0.02 for positive and negative):

[image: TensorBoard plots of the positive and negative distances after reverting 95d1cfd]

But recall is still quite low (~0.38).
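
For context, here is a minimal sketch (purely illustrative, not the repository's loss code) of why the margins and loss weights have to change together with the descriptor normalization: the dot products of raw descriptors live on a completely different scale than those of L2-normalized ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two sets of hypothetical 256-D descriptors (random values, for illustration only).
desc_a = rng.normal(size=(1000, 256))
desc_b = rng.normal(size=(1000, 256))

# Raw (unnormalized) dot products: typical magnitudes around sqrt(256) = 16 here.
raw_dot = np.sum(desc_a * desc_b, axis=1)

# After L2 normalization the dot product is a cosine similarity in [-1, 1],
# with typical magnitudes around 1/sqrt(256) for unrelated descriptors.
a_n = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
b_n = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
norm_dot = np.sum(a_n * b_n, axis=1)

print("mean |raw dot|:       ", np.abs(raw_dot).mean())
print("mean |normalized dot|:", np.abs(norm_dot).mean())
```

Margins and weights tuned for one of these scales cannot be expected to work for the other, which is consistent with the sp_v6 config differing from the one at HEAD.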

I also evaluated the model with normalization on HPatches and the results look reasonable. For comparison, I also loaded the sp_v6 checkpoint and ran the same evaluation:

|                      | Mine  | sp_v6 ckpt | Claimed |
|----------------------|-------|------------|---------|
| Viewpoint changes    | 0.613 | 0.645      | 0.674   |
| Illumination changes | 0.630 | 0.655      | 0.662   |
|     | Illumination (Mine / sp_v6 ckpt / Claimed) | Viewpoint (Mine / sp_v6 ckpt / Claimed) | All (Mine / sp_v6 ckpt / Claimed) |
|-----|--------------------------------------------|-----------------------------------------|-----------------------------------|
| e=1 |                                            |                                         | 0.460 / 0.477 / 0.483             |
| e=3 | 0.940 / 0.936 / 0.965                      | 0.650 / 0.654 / 0.712                   | 0.793 / 0.793 / 0.836             |
| e=5 |                                            |                                         | 0.894 / 0.881 / 0.91              |
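
(Here e is the pixel threshold on the homography estimation error. As a reference for how such a correctness-at-threshold number is typically computed, below is a minimal sketch assuming the usual mean-corner-error definition; it is not the exact evaluation code of this repository.)

```python
import numpy as np

def homography_correct(H_est, H_gt, img_shape, eps):
    """Return True if the estimated homography H_est is within `eps` pixels of
    the ground truth H_gt, measured as the mean reprojection error of the four
    image corners (illustrative sketch only)."""
    h, w = img_shape
    corners = np.array([[0, 0, 1], [w - 1, 0, 1],
                        [0, h - 1, 1], [w - 1, h - 1, 1]], dtype=np.float64)

    def warp(H, pts):
        p = (H @ pts.T).T
        return p[:, :2] / p[:, 2:3]

    mean_err = np.linalg.norm(warp(H_est, corners) - warp(H_gt, corners), axis=1).mean()
    return mean_err <= eps

# The numbers at e=1, 3, 5 above are then the fraction of HPatches pairs for
# which homography_correct(...) holds at each threshold.
```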

The change in quality is not too bad, but I am still concerned that the numbers are always slightly below the ones reported in the README of this repository, so I would like to ask the following:

Thanks a lot in advance for your help, much appreciated!

rpautrat commented 1 year ago

Hi, I can try to help you replicate the original results, but only up to a point. This work is more than 4 years old, and I no longer remember all the details of the experiments. But regarding your three points:

I hope this can be of some help to you.

martinarroyo commented 1 year ago

Hi, thanks for the quick reply. I mixed up the links in my message and meant indeed the one pointing to mp_synth-v11_ha1_trained, sorry for the confusion.

I compared the logs for the distances and the rest of the metrics, and they look similar. However, I had not noticed before that the detector loss is actually much higher than for sp_v6 (green is my experiment, orange is sp_v6). Could this mean that I need to tune $\lambda$?

[image: TensorBoard detector loss curves for the two runs]

I'll try to also evaluate the Magic Leap SP implementation to see if the discrepancy is similar.

I think this more or less answers my questions. If you could comment on the discrepancy in the detector loss, that would be great. I will close the issue once I have run the Magic Leap model.

rpautrat commented 1 year ago

Indeed, you may want to tune $\lambda$ to better balance the descriptor and detector losses. The latter seems a bit too high in comparison.
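
To make that knob concrete, here is a rough sketch of the SuperPoint-style joint loss (names such as lambda_loss are my shorthand here, not necessarily the exact variables used in this repo):

```python
def total_loss(detector_loss, warped_detector_loss, descriptor_loss, lambda_loss):
    """Joint loss as in the SuperPoint paper: detector terms for the original
    and warped images plus a weighted descriptor term. If the detector part
    dominates (as in the plots above), adjusting lambda_loss rebalances the
    two objectives."""
    return detector_loss + warped_detector_loss + lambda_loss * descriptor_loss
```

In practice you would sweep a few values of the weight and check in TensorBoard that the detector and descriptor terms settle at comparable magnitudes.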

ericzzj1989 commented 1 year ago

Since this issue is somewhat related to mine (https://github.com/rpautrat/SuperPoint/issues/287#issue-1534493298), would you mind, @martinarroyo, explaining what changes you made in the code to obtain your results?

martinarroyo commented 1 year ago

Apologies for the belated response. My changes were minimal: I only modified the I/O logic so that it would work in my infrastructure, and fixed some imports that were not working in my setup. The training logic was unaltered.

shreyasr-upenn commented 6 months ago

Hi @rpautrat, I have been getting a negative distance of zero at every step. Is this normal? What might have gone wrong? I am using the same hyperparameter values as you.

rpautrat commented 5 months ago

Hi, having a zero negative loss is not impossible, but surprising. If you look at its definition here: https://github.com/rpautrat/SuperPoint/blob/361799f4b6f0252b9968470a70b529f2d8b27911/superpoint/models/utils.py#L123, it is obtained as a hinge loss `max(0, desc_distance - m)`. So if `desc_distance` is lower than the margin `m` (set to 0.2 by default), the negative loss becomes 0, which means that the model was perfectly able to distinguish different descriptors.

However, getting 0 at every step seems a bit fishy and too good to be true: I would expect it to be positive for at least a few samples. Maybe you can try plotting a few values at the line linked above to understand what is happening. Checking the positive loss would also be interesting.
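
To make the hinge behaviour concrete, here is a minimal sketch (not the repository's TensorFlow code; the similarity values are made up):

```python
import torch

def negative_hinge(desc_dot, negative_margin=0.2):
    """Hinge on the similarity of non-corresponding descriptor pairs:
    it is exactly zero whenever the dot product is below the margin."""
    return torch.clamp(desc_dot - negative_margin, min=0.0)

# If every non-corresponding pair already has similarity < 0.2, all terms are 0:
print(negative_hinge(torch.tensor([0.05, 0.15, 0.30])))  # tensor([0.0000, 0.0000, 0.1000])
```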

shreyasr-upenn commented 5 months ago

[images]

I have implemented the equivalent of this code block in PyTorch: https://github.com/rpautrat/SuperPoint/blob/361799f4b6f0252b9968470a70b529f2d8b27911/superpoint/models/utils.py#L140

```python
# Normalize the weighted positive and negative descriptor terms by the number
# of valid correspondences.
positive_sum = torch.sum(valid_mask * lambda_d * s * positive_dist) / valid_mask_norm
negative_sum = torch.sum(valid_mask * (1 - s) * negative_dist) / valid_mask_norm
```

rpautrat commented 5 months ago

I would suggest printing a few values to debug your code and understand why the negative loss becomes zero in your case. This sounds too good to be true.
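
For instance, something along these lines (a hedged sketch reusing the tensor names from the PyTorch snippet above, and assuming s and valid_mask are float 0/1 tensors; adjust to your actual shapes):

```python
import torch

def descriptor_loss_stats(positive_dist, negative_dist, s, valid_mask):
    """Print a few diagnostics: how many negative hinge terms are exactly zero,
    and the mean positive/negative terms over the valid correspondences."""
    pos_mask = (s * valid_mask).bool()
    neg_mask = ((1 - s) * valid_mask).bool()
    zero_neg = (negative_dist[neg_mask] == 0).float().mean().item()
    print(f"fraction of zero negative terms: {zero_neg:.3f}")
    print(f"mean positive term: {positive_dist[pos_mask].mean().item():.4f}")
    print(f"mean negative term: {negative_dist[neg_mask].mean().item():.4f}")
```

If the fraction of zero negative terms is 1.0 from the very first steps, the masks or the descriptor similarity computation are worth double-checking.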