rpautrat / SuperPoint

Efficient neural feature detector and descriptor

Improve training results #99

Closed: git-ry closed this issue 4 years ago

git-ry commented 5 years ago

Hi @rpautrat, thanks for the great implementation!

I had a few questions regarding training and how to improve it. FWIW, my results are already pretty good.

My training procedure follows your recommendations in the README and in #74. My results are the following:

- Repeatability - COCO: [image]
- Repeatability - HPatches Viewpoint: [image]
- Repeatability - HPatches Illumination: [image]
- Descriptor evaluation:
  - superpoint_hpatches-v: 0.22033898305084745
  - superpoint_hpatches-i: 0.9087719298245615

Questions:

  1. When evaluating on COCO, the average number of points for superpoint is smaller than magicpoint. However, when evaluating on HPatches, the average number of points for superpoint is higher than magicpoint. Any thoughts on why?
  2. Any ideas on how to improve the HPatches viewpoint repeatability score? When exporting, decreasing the detection_threshold (currently set to 0.015) increases the number of detections, but that might produce multiple detections within a single 8x8 patch. I understand that 0.015 is chosen because it is close to 1/65 (the response is spread over 65 bins, so the detection threshold should be 1/65 or less; see the sketch after this list). Will reducing detection_threshold have a detrimental effect on training?
  3. Any recommendations for finetuning on my own dataset? I plan to train MagicPoint a third time on my dataset prior to training SuperPoint.
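
To make sure I understand the 1/65 reasoning, here is a rough NumPy sketch of how I picture the detector post-processing (my own names and shapes, not your code):

```python
import numpy as np

def detector_postprocess(logits, detection_threshold=0.015):
    """Sketch of the detector post-processing (shapes and names assumed).
    logits: (H/8, W/8, 65) raw detector output, 64 pixel bins + 1 'no keypoint' dustbin.
    Returns an (H, W) boolean keypoint mask."""
    # Softmax over the 65 bins of each cell; a flat response gives ~1/65 ~= 0.0154 per bin,
    # which is why a threshold around 0.015 only keeps bins that clearly stand out.
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    prob = e / e.sum(axis=-1, keepdims=True)
    prob = prob[..., :-1]                                   # drop the dustbin
    hc, wc, _ = prob.shape
    # Depth-to-space: scatter the 64 bins back to their 8x8 pixel positions.
    heatmap = prob.reshape(hc, wc, 8, 8).transpose(0, 2, 1, 3).reshape(hc * 8, wc * 8)
    return heatmap > detection_threshold

# Example on random logits: prints how many pixels pass each threshold.
logits = np.random.randn(30, 40, 65)
print(detector_postprocess(logits, 0.015).sum(), detector_postprocess(logits, 0.001).sum())
```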
rpautrat commented 5 years ago

Hi!

  1. From what I see in your screenshots, the average number of points is always smaller for SuperPoint than for MagicPoint, whether it is COCO or HPatches. I don't have an explanation for this though. But it means that one should probably decrease the detection_threshold when exporting the pseudo ground truth, at least for the training of SuperPoint. Note that if you want to compare MagicPoint and SuperPoint repeatabilities, you need to make sure that they have roughly the same number of points on average.

  2. I don't think that decreasing the detection_threshold will have a negative impact on training. Even if you get multiple ground-truth points per 8x8 patch, the current state of the code selects only one of them as ground truth. Currently it selects the point with an argmax, so it is not clear how it is chosen, but I will soon push a new version where the point is chosen randomly among all candidates (see the sketch after this list). To improve the repeatability on viewpoint changes, I also tried to increase the amount of variation in the warped pair during training. But it seems that beyond a certain point the results get worse, probably because we are training on viewpoint changes that are too difficult.

  3. Yes, you should definitely train MagicPoint at least once on your own dataset before training SuperPoint. You can try a second round of homography adaptation: if the repeatability improves on your dataset, it means that several rounds of training on your dataset are worth it. When you evaluate SuperPoint on your own dataset, don't forget to adapt the NMS parameter to the size of your images (currently 4 for 240x320 images).
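
To illustrate the random selection mentioned in point 2, here is a minimal sketch of the idea (my own function names, not the actual code of the repo):

```python
import numpy as np

def select_one_gt_per_cell(keypoint_map, seed=0):
    """Sketch: keep at most one ground-truth keypoint per 8x8 cell, chosen at
    random among the candidates instead of with an argmax.
    keypoint_map: (H, W) binary map of exported pseudo ground truth."""
    rng = np.random.default_rng(seed)
    out = np.zeros_like(keypoint_map)
    H, W = keypoint_map.shape
    for i in range(0, H, 8):
        for j in range(0, W, 8):
            ys, xs = np.nonzero(keypoint_map[i:i + 8, j:j + 8])
            if len(ys):                    # several candidates may fall in one cell
                k = rng.integers(len(ys))  # random pick removes the argmax bias
                out[i + ys[k], j + xs[k]] = 1
    return out
```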

git-ry commented 5 years ago

OK, I ran a few more tests, and unfortunately the results are mixed.

Here are the results for various detection_threshold values:

| detection_threshold | 0.015 | 0.01 | 0.001 |
| --- | --- | --- | --- |
| COCO repeatability | 0.713 | 0.717 | 0.576 |
| HPatches repeatability (illumination) | 0.673 | 0.685 | 0.704 |
| HPatches repeatability (viewpoint) | 0.336 | 0.352 | 0.339 |
| HPatches descriptor (viewpoint) | 0.220 | 0.132 | 0.047 |

Decreasing the detection_threshold improves the repeatability overall (as expected), since the average number of keypoints in both images is ~300. The descriptor score, however, becomes worse very quickly. The only explanation I can think of is that the descriptor pairs used for training were too similar, and at test time the network doesn't find an obvious point within an 8x8 patch to select as a feature. In other words, the response within each 8x8 patch is blurred or flat: no obvious feature stands out (although one is still selected, since it goes through an argmax).

Any thoughts on how to improve the descriptor viewpoint score? There seems to be a tradeoff between repeatability and descriptor scores. Note that I have not tried increasing the viewpoint variation as you suggested.
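
To check the "flat response" hypothesis, I could use a rough diagnostic like this (just a sketch with my own names) that measures how peaked the detector response is inside each 8x8 cell:

```python
import numpy as np

def cell_peakiness(heatmap):
    """Ratio of the max to the mean response inside each 8x8 cell.
    Values near 1 mean the cell is flat (no clear keypoint); large values
    mean one location clearly stands out.
    heatmap: (H, W) detector response after softmax and depth-to-space."""
    H, W = heatmap.shape
    cells = heatmap[:H - H % 8, :W - W % 8].reshape(H // 8, 8, W // 8, 8)
    cells = cells.transpose(0, 2, 1, 3).reshape(-1, 64)
    return cells.max(axis=1) / (cells.mean(axis=1) + 1e-8)
```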

yanhuanhuanyy commented 5 years ago

Hi @git-ry, I have a small question for you: how do you evaluate the repeatability on COCO (both the superpoint-coco and magic-point-coco repeatability)? Could you give me some help and advice? Thanks very much.

rpautrat commented 5 years ago

Interesting! Indeed, I think the decreasing descriptor score comes from the fact that with a lower detection_threshold, more keypoints are detected and we can get several detections in the same 8x8 patch. The NMS can then keep one of these keypoints in image 1 while keeping a different keypoint from the same patch in image 2, and the two cannot be matched together.

Another explanation could be that the descriptors of nearby keypoints are too similar and the matcher cannot really distinguish them. This would be due to the interpolation of the descriptors when going from the coarse descriptor map (of dimension H/8 x W/8) to the full-resolution map (H x W). It is currently a bicubic interpolation, but other kinds of interpolation might improve the descriptor score.
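
For reference, a minimal sketch of that interpolation step (using scipy here rather than the repo's TensorFlow code), where the interpolation order is the thing one could experiment with:

```python
import numpy as np
from scipy.ndimage import zoom

def upsample_descriptors(coarse_desc, order=3):
    """coarse_desc: (H/8, W/8, D) semi-dense descriptors from the network.
    order=3 is cubic spline interpolation (similar in spirit to the bicubic
    used in the repo); order=1 is (bi)linear. Returns (H, W, D) unit-norm
    descriptors."""
    dense = zoom(coarse_desc, (8, 8, 1), order=order)        # upsample by 8 in both spatial dims
    norm = np.linalg.norm(dense, axis=-1, keepdims=True)     # re-normalize to unit length
    return dense / np.maximum(norm, 1e-8)
```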

You can also have a look at this article: https://arxiv.org/abs/1907.04011. They try to improve SuperPoint by incorporating the NMS within the network (the network can predict only one point per 8x8 patch). This might also be an option to solve the problem.

rpautrat commented 5 years ago

@yanhuanhuanyy, you just have to modify configs/magic-point_repeatability.yaml: replace the dataset entry with 'coco', set 'resize' to False, and change the model name to switch between MagicPoint and SuperPoint ('magic_point' or 'super_point').

You can then use export_detections_repeatability.py as in step 4 to compute the repeatability.
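
In code, the changes look roughly like this (just a sketch: the exact key paths below are assumptions, so double-check them against the actual config file):

```python
import yaml

# Load the repeatability config and apply the edits described above.
with open('configs/magic-point_repeatability.yaml') as f:
    config = yaml.safe_load(f)

config['data']['name'] = 'coco'                      # dataset entry: 'coco' instead of HPatches
config['data']['preprocessing']['resize'] = False    # assumed location of the 'resize' flag
config['model']['name'] = 'super_point'              # or 'magic_point'

# Hypothetical output name; pass it to export_detections_repeatability.py afterwards.
with open('configs/coco_repeatability.yaml', 'w') as f:
    yaml.safe_dump(config, f)
```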

yanhuanhuanyy commented 5 years ago

Hi @rpautrat, many thanks for your reply. I tried it and it works, but I still have some questions.

Q1: Should data/alteration in magic-point_repeatability.yaml be 'all' when evaluating the repeatability on COCO?

Q2: Following your steps, I end up with three experiment names (magic-point_synth, magic-point_coco, superpoint_coco). 'magic-point_synth' is the 'magic_point' model, 'magic-point_coco' is the SuperPoint detector model, and 'superpoint_coco' is the 'super_point' model. Is that right?

Q3: When I evaluate the repeatability of the 'magic_point' or 'super_point' model on COCO or HPatches, I usually use the command: python export_detections_repeatability.py configs/magic-point_repeatability.yaml magic-point_coco --export_name=magic-point_hpatches-repeatability-v. In addition to changing the model name ('magic_point' or 'super_point') in the configuration file, does the experiment name need to be changed? For example, for the 'magic_point' model the experiment name is 'magic-point_synth'; for the 'super_point' model, is the experiment name 'magic-point_coco' or 'superpoint_coco'? And when comparing classical detectors (fast, harris, shi), what should the experiment name be?

rpautrat commented 5 years ago

Q1: Yes, use 'all' to evaluate the repeatability on COCO.

Q2: Yes, that's correct. But note that in 'magic-point_coco' the detector was trained alone (without the descriptor part).

Q3: Yes, you need to change the experiment name in the command line as well. For the 'super_point' model, use the 'superpoint_coco' experiment. For the classical baselines (fast, harris, shi), you can put anything as the experiment name; it won't be used anyway.

yanhuanhuanyy commented 5 years ago

OK, @rpautrat. Thanks very much.