rpautrat / SuperPoint

Efficient neural feature detector and descriptor
MIT License

Trying to replicate sp_v6 and descriptor loss #288

Closed: martinarroyo closed this issue 1 year ago

martinarroyo commented 1 year ago

[This is somewhat related to #287, but I'm opening a separate issue so as not to pollute that thread.]

Hi @rpautrat, thanks for this work and for providing support for the repo! As a first step towards making some changes to the model, I am trying to reproduce the HPatches results that you report in the README. To save some time, I labeled the COCO dataset using the pretrained model listed in the README (MagicPoint (COCO)) and launched training with the superpoint_coco.yaml config in its current state at HEAD. I had to make minor modifications to the codebase to get it to work on my infrastructure (mostly I/O), but there should be no changes that affect training. I noticed that the positive and negative distances reported in TensorBoard oscillate within a very small range of values (roughly 1e-7 to 1e-5), which got me worried: it seems inconsistent with the values reported in https://github.com/rpautrat/SuperPoint/issues/277#issuecomment-1301836238. For reference, here is how it looks in my current training (my machine restarted, so the graphs look a bit odd, apologies for that):

[image: TensorBoard plots of the positive and negative descriptor distances for my training run]

Precision and recall are also much lower than those in the sp_v6 log: there, recall goes up to 0.6, whereas I can only get it to ~0.37.

I looked into the yaml file in the sp_v6 tarfile and noticed that the loss weights seemed to be adapted for the 'unnormalized' descriptors, so I reverted the changes introduced in 95d1cfd. This helps with the distances (the values are ~0.03 and 0.02 for positive and negative):

[image: TensorBoard plots of the positive and negative distances after reverting 95d1cfd]

But recall is still quite low (~0.38).
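
For context, here is a minimal sketch (purely illustrative, not the repository's loss code) of why the margins and loss weights have to change together with the descriptor normalization: the dot products of raw descriptors live on a completely different scale than those of L2-normalized ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two sets of hypothetical 256-D descriptors (random values, for illustration only).
desc_a = rng.normal(size=(1000, 256))
desc_b = rng.normal(size=(1000, 256))

# Raw (unnormalized) dot products: typical magnitudes around sqrt(256) = 16 here.
raw_dot = np.sum(desc_a * desc_b, axis=1)

# After L2 normalization the dot product is a cosine similarity in [-1, 1],
# with typical magnitudes around 1/sqrt(256) for unrelated descriptors.
a_n = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
b_n = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
norm_dot = np.sum(a_n * b_n, axis=1)

print("mean |raw dot|:       ", np.abs(raw_dot).mean())
print("mean |normalized dot|:", np.abs(norm_dot).mean())
```

Margins and weights tuned for one of these scales cannot be expected to work for the other, which is consistent with the sp_v6 config differing from the one at HEAD.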

I also evaluated the model with normalization on HPatches and the results look reasonable. For comparison, I also loaded the sp_v6 checkpoint and ran the same evaluation:

|                      | Mine  | sp_v6 ckpt | Claimed |
|----------------------|-------|------------|---------|
| Viewpoint changes    | 0.613 | 0.645      | 0.674   |
| Illumination changes | 0.630 | 0.655      | 0.662   |
|     | Illumination (Mine / sp_v6 ckpt / Claimed) | Viewpoint (Mine / sp_v6 ckpt / Claimed) | All (Mine / sp_v6 ckpt / Claimed) |
|-----|--------------------------------------------|-----------------------------------------|-----------------------------------|
| e=1 |                                            |                                         | 0.460 / 0.477 / 0.483             |
| e=3 | 0.940 / 0.936 / 0.965                      | 0.650 / 0.654 / 0.712                   | 0.793 / 0.793 / 0.836             |
| e=5 |                                            |                                         | 0.894 / 0.881 / 0.91              |
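
(Here e is the pixel threshold on the homography estimation error. As a reference for how such a correctness-at-threshold number is typically computed, below is a minimal sketch assuming the usual mean-corner-error definition; it is not the exact evaluation code of this repository.)

```python
import numpy as np

def homography_correct(H_est, H_gt, img_shape, eps):
    """Return True if the estimated homography H_est is within `eps` pixels of
    the ground truth H_gt, measured as the mean reprojection error of the four
    image corners (illustrative sketch only)."""
    h, w = img_shape
    corners = np.array([[0, 0, 1], [w - 1, 0, 1],
                        [0, h - 1, 1], [w - 1, h - 1, 1]], dtype=np.float64)

    def warp(H, pts):
        p = (H @ pts.T).T
        return p[:, :2] / p[:, 2:3]

    mean_err = np.linalg.norm(warp(H_est, corners) - warp(H_gt, corners), axis=1).mean()
    return mean_err <= eps

# The numbers at e=1, 3, 5 above are then the fraction of HPatches pairs for
# which homography_correct(...) holds at each threshold.
```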

The change in quality is not too bad, but I am still concerned that the numbers are always slightly below the ones reported in the README of this repository, so I would like to ask the following:

Thanks a lot in advance for your help, much appreciated!

rpautrat commented 1 year ago

Hi, I can try to help you replicate the original results, but only up to a point. This work is more than 4 years old, and I no longer remember all the details of the experiments. But regarding your three points:

I hope this can be of some help to you.

martinarroyo commented 1 year ago

Hi, thanks for the quick reply. I mixed up the links in my message and meant indeed the one pointing to mp_synth-v11_ha1_trained, sorry for the confusion.

I compared the logs for the distances and the rest of the metrics, and they look similar. However, I had not noticed before that the detector loss is actually much higher than for sp_v6 (green is my experiment, orange is sp_v6). Could this mean that I need to tune $\lambda$?

[image: TensorBoard detector loss curves for the two runs]

I'll try to also evaluate the Magic Leap SP implementation to see if the discrepancy is similar.

I think this more or less answers my questions. If you could comment on the discrepancy in the detector loss, that would be great. I will close the issue once I have run the Magic Leap model.

rpautrat commented 1 year ago

Indeed, you may want to tune $\lambda$ to better balance the descriptor and detector losses. The latter seems a bit too high in comparison.
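
To make that knob concrete, here is a rough sketch of the SuperPoint-style joint loss (names such as lambda_loss are my shorthand here, not necessarily the exact variables used in this repo):

```python
def total_loss(detector_loss, warped_detector_loss, descriptor_loss, lambda_loss):
    """Joint loss as in the SuperPoint paper: detector terms for the original
    and warped images plus a weighted descriptor term. If the detector part
    dominates (as in the plots above), adjusting lambda_loss rebalances the
    two objectives."""
    return detector_loss + warped_detector_loss + lambda_loss * descriptor_loss
```

In practice you would sweep a few values of the weight and check in TensorBoard that the detector and descriptor terms settle at comparable magnitudes.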

ericzzj1989 commented 1 year ago

Since this issue is somewhat related to mine (https://github.com/rpautrat/SuperPoint/issues/287#issue-1534493298), would you mind, @martinarroyo, explaining what changes you made in the code to obtain your results?

martinarroyo commented 1 year ago

Apologies for the belated response. My changes were minimal: I only modified the I/O logic so that it would work in my infrastructure, and fixed some imports that were not working in my setup. The training logic was unaltered.

shreyasr-upenn commented 6 months ago

Hi @rpautrat, I have been getting a negative distance of zero at every step. Is this normal? What might have gone wrong? I am using the same hyperparameter values as you.

rpautrat commented 5 months ago

Hi, having a zero negative loss is not impossible, but surprising. If you look at its definition here: https://github.com/rpautrat/SuperPoint/blob/361799f4b6f0252b9968470a70b529f2d8b27911/superpoint/models/utils.py#L123, it is obtained as a hinge loss `max(0, desc_distance - m)`. So if `desc_distance` is lower than the margin `m` (set to 0.2 by default), the negative loss becomes 0, which means that the model was perfectly able to distinguish different descriptors.

However, getting 0 at every step seems a bit fishy and too good to be true: I would expect it to be positive for at least a few samples. Maybe you can try plotting a few values at the line linked above to understand what is happening. Checking the positive loss would also be interesting.
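
To make the hinge behaviour concrete, here is a minimal sketch (not the repository's TensorFlow code; the similarity values are made up):

```python
import torch

def negative_hinge(desc_dot, negative_margin=0.2):
    """Hinge on the similarity of non-corresponding descriptor pairs:
    it is exactly zero whenever the dot product is below the margin."""
    return torch.clamp(desc_dot - negative_margin, min=0.0)

# If every non-corresponding pair already has similarity < 0.2, all terms are 0:
print(negative_hinge(torch.tensor([0.05, 0.15, 0.30])))  # tensor([0.0000, 0.0000, 0.1000])
```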

shreyasr-upenn commented 5 months ago

[images]

I have implemented the equivalent of this code block in PyTorch: https://github.com/rpautrat/SuperPoint/blob/361799f4b6f0252b9968470a70b529f2d8b27911/superpoint/models/utils.py#L140

```python
# Normalize the weighted positive and negative descriptor terms by the number
# of valid correspondences.
positive_sum = torch.sum(valid_mask * lambda_d * s * positive_dist) / valid_mask_norm
negative_sum = torch.sum(valid_mask * (1 - s) * negative_dist) / valid_mask_norm
```

rpautrat commented 5 months ago

I would suggest printing a few values to debug your code and understand why the negative loss becomes zero in your case. This sounds too good to be true.
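
For instance, something along these lines (a hedged sketch reusing the tensor names from the PyTorch snippet above, and assuming s and valid_mask are float 0/1 tensors; adjust to your actual shapes):

```python
import torch

def descriptor_loss_stats(positive_dist, negative_dist, s, valid_mask):
    """Print a few diagnostics: how many negative hinge terms are exactly zero,
    and the mean positive/negative terms over the valid correspondences."""
    pos_mask = (s * valid_mask).bool()
    neg_mask = ((1 - s) * valid_mask).bool()
    zero_neg = (negative_dist[neg_mask] == 0).float().mean().item()
    print(f"fraction of zero negative terms: {zero_neg:.3f}")
    print(f"mean positive term: {positive_dist[pos_mask].mean().item():.4f}")
    print(f"mean negative term: {negative_dist[neg_mask].mean().item():.4f}")
```

If the fraction of zero negative terms is 1.0 from the very first steps, the masks or the descriptor similarity computation are worth double-checking.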