rpautrat / SuperPoint

Efficient neural feature detector and descriptor

About detector evaluation. #166

Closed · xlong0513 closed this issue 3 years ago

xlong0513 commented 4 years ago

Hi, sorry to disturb. I want to run detector_evaluation_magic_point.ipynb, and I have two questions.

  1. Does the parameter data/add_augmentation_to_test_set need to be changed for training, or does it only affect the test set?
  2. Why is the parameter data/suffix different for MagicPoint and the classical detectors? In magic-point_shapes.yaml it is v6, while in classical-detectors_shapes.yaml it is v5. I think they should be the same, since MagicPoint and the classical detectors should be compared on the same data.
rpautrat commented 4 years ago

Hi,

  1. The parameter data/add_augmentation_to_test_set only adds changes to the test set, so you can ignore it during training.
  2. Yes, that's correct: you should use the same data in the comparison. I just forgot to update the config classical-detectors_shapes.yaml after switching to v6. You will need to use your own suffix anyway when running the baselines.
xlong0513 commented 4 years ago

@rpautrat Thanks for the quick answer. I used the default v5 for the classical detectors and ran detector_evaluation_magic_point. However, I got errors. My code is here: [screenshot]

The error is here: [screenshot]

I tested and found that both `shi_synth-v5-noise` and `fast_synth-v5-noise` fail. Have you seen these errors before?

rpautrat commented 4 years ago

Hi, the issue comes from the fact that `np.where(prob > t)` is empty. This is because you didn't update the `confidence_thresholds` list: it should match the experiments. Since you commented out two experiments, the right thresholds should be: `confidence_thresholds = [0.1]*2 + [90000]*2 + [0.06]*2 + [40]*2`

If you still encounter the error, consider reducing the confidence threshold until `np.where(prob > t)` is no longer empty.
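For reference, here is a minimal sketch of what that selection step looks like (the helper name `select_keypoints` and the `top_k` parameter are illustrative, not the exact notebook code):

```python
import numpy as np

def select_keypoints(prob, t, top_k=300):
    """Keep the top_k highest-scoring detections above threshold t.
    Illustrative sketch, not the exact evaluation-notebook code."""
    ys, xs = np.where(prob > t)  # empty when t is too high -> the error above
    if len(ys) == 0:
        return np.empty((0, 2), dtype=int)
    scores = prob[ys, xs]
    order = np.argsort(scores)[::-1][:top_k]  # highest scores first
    return np.stack([ys[order], xs[order]], axis=1)  # one (row, col) per keypoint
```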

xlong0513 commented 4 years ago

Oh, thanks for the reply. It works now. Another question: I found that the mAP with noise of my MagicPoint model is lower than yours, about 0.78. My model was trained with augmentation enabled.

rpautrat commented 4 years ago

Then you should maybe increase the level of augmentation during training, for example by adding more aggressive noise. This would make the network more robust to noise afterwards.
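As an illustration, one simple way to make the noise more aggressive is to enlarge the range of the additive Gaussian noise applied to the training images; a minimal sketch (the function and the stddev range are illustrative, the repo itself configures augmentation through its YAML files):

```python
import numpy as np

def add_aggressive_noise(image, stddev_range=(5.0, 50.0)):
    """Additive Gaussian noise with a randomly drawn, fairly large stddev.
    Illustrative sketch; the repo exposes such settings in its YAML configs."""
    stddev = np.random.uniform(*stddev_range)
    noise = np.random.normal(0.0, stddev, size=image.shape)
    return np.clip(image.astype(np.float32) + noise, 0, 255).astype(np.uint8)
```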

xlong0513 commented 4 years ago

I see. Thanks for the answer.

xlong0513 commented 4 years ago

Hi @rpautrat, I want to ask another question about training MagicPoint and SuperPoint.

  1. If Step 3 and Step 6 use the same detections from Step 2, then they are actually the same step, right?
  2. Supposing 1 is right, if I want to train MagicPoint and SuperPoint respectively, I should export the detections on MS-COCO with Homographic Adaptation disabled for MagicPoint and enabled for SuperPoint. Am I right?
rpautrat commented 4 years ago

Hi,

  1. If you use several rounds of steps 2+3, then after a second round the export will be different. I suggest doing 2 rounds: export (step 2), train MagicPoint (step 3), export again (step 2), then train SuperPoint (step 6). In general, steps 3 and 6 are different, as the networks trained in them are different (with or without descriptors).

  2. Homographic adaptation should always be used when exporting the detections, whether it is for MagicPoint or SuperPoint. It can only improve the quality of your pseudo labels (see the sketch below).
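For clarity, a rough sketch of what homographic adaptation does when exporting detections (all names here are placeholders, not the repo's actual implementation; `detector` is assumed to return a probability map of the same size as the image):

```python
import numpy as np
import cv2

def sample_homography(h, w, scale=0.15):
    """Random homography from small corner perturbations (illustrative)."""
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    offsets = np.random.uniform(-scale, scale, (4, 2)) * np.array([w, h])
    dst = (src + offsets).astype(np.float32)
    return cv2.getPerspectiveTransform(src, dst)

def homographic_adaptation(image, detector, num_homographies=100):
    """Average detections over random homographies to build pseudo labels.
    Rough sketch of the idea, not the repo's actual implementation."""
    h, w = image.shape[:2]
    agg = detector(image).astype(np.float32)  # prob map on the original image
    count = np.ones((h, w), np.float32)
    ones = np.ones((h, w), np.float32)
    for _ in range(num_homographies):
        H = sample_homography(h, w)
        warped = cv2.warpPerspective(image, H, (w, h))
        prob = detector(warped)               # detect on the warped image
        H_inv = np.linalg.inv(H)
        agg += cv2.warpPerspective(prob, H_inv, (w, h))    # back to original frame
        count += cv2.warpPerspective(ones, H_inv, (w, h))  # visibility weights
    return agg / count  # averaged detection probability map
```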

xlong0513 commented 4 years ago

Hi, I got it. You mean the difference is with or without descriptors. But why should homographic adaptation be used for MagicPoint, since in the original paper it is used for SuperPoint?

  1. And I have another question, by the way. In detector_repeatability_coco.ipynb, I found that the number of keypoints of MagicPoint is less than 300, and the numbers for the original image and the patch are not equal, as follows: [screenshot]

How can I get a stable number of detected keypoints, as with the classical methods?

rpautrat commented 4 years ago

Hi,

  1. MagicPoint is the detector trained on synthetic data only (so without homography adaptation). Then we use homography adaptation to create the pseudo ground truth on COCO and train the detector+descriptor on it to get SuperPoint. The transition between the two (the detector only, trained on COCO images with homography adaptation) is also still called SuperPoint in the original paper, I think.

  2. The number of points detected by MagicPoint depends on the detection threshold that is used. If it is too high, fewer than 300 points will be detected. To ensure that 300 points are always used, you can lower the threshold to 0: `confidence_thresholds = [0, 0]` (illustrated below).
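In terms of the hypothetical `select_keypoints` sketch from earlier in the thread, this amounts to:

```python
# t = 0 keeps every pixel with a nonzero score, so the top-300 selection
# always returns 300 points (as long as the probability map has at least
# 300 nonzero entries).
keypoints = select_keypoints(prob, t=0.0, top_k=300)
```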

xlong0513 commented 4 years ago

Thanks for the reply, I got it. Please keep this issue open for now, and I will test it.

xlong0513 commented 4 years ago

Hi, it works by lowering the threshold to 0. However, the repeatability results are weird, as follows: [screenshot] The repeatability of MagicPoint is very high, while that of the classical methods is far too low. I have checked the plot_imgs results, and they are very similar. My code is here: [screenshot]

The difference should not be so huge. Did you ever have this problem? Thanks.

rpautrat commented 4 years ago

Instead of using a threshold of 0, you could maybe use a very small one. There could be situations with low-texture images where it doesn't make sense to extract 300 points, and in that case a threshold of 0 will still keep very bad keypoints. But even with that, I am surprised that you get such a huge difference...

xlong0513 commented 4 years ago

Yeah, I have tried a few values. However, the results never change: always the four values in the picture. Maybe something is wrong.

rpautrat commented 4 years ago

Can you show the images with keypoint detections of the classical methods that you get?

xlong0513 commented 4 years ago

Sure. From top to bottom: FAST, Harris, Shi-Tomasi, MagicPoint. [four screenshots of keypoint detections]

rpautrat commented 4 years ago

Thanks. The detections look fine for all of them, so I have no clue why you would suddenly have such low scores for the classical methods... And when you keep the original detection thresholds, is the repeatability much better? If that's the case, I would just keep the 0 threshold for MagicPoint and the original thresholds for the others, as those thresholds were providing 300 points most of the time anyway.

xlong0513 commented 4 years ago

Thanks for the advice. I think the more important question is why modifying the thresholds has no influence on the repeatability results: no matter how I change the thresholds, the repeatability values remain the same. Does the computation use all the detection results? Besides, I found that the number of COCO repeatability results for MagicPoint is 900, while for the classical methods it is 600. Is this normal?

rpautrat commented 4 years ago

I think that if you really use a high threshold, you will see different repeatability values, because you will then evaluate with fewer than 300 points. But otherwise the repeatability values shouldn't change.

There is basically a double filtering: first, all the keypoints with a score lower than the threshold are removed; then we keep the 300 best remaining keypoints. So it makes sense that you get the same numbers when varying the threshold at low values: the first filtering still keeps more than 300 points, and the top 300 points remain the same.
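This is easy to check with the hypothetical `select_keypoints` sketch from earlier in the thread: two different low thresholds select exactly the same top-300 keypoints, because both leave more than 300 candidates after the first filtering step.

```python
import numpy as np

np.random.seed(0)
prob = np.random.rand(240, 320)  # dummy dense probability map

kp_a = select_keypoints(prob, t=0.0)   # first filter keeps every pixel
kp_b = select_keypoints(prob, t=0.01)  # still leaves far more than 300 points
print(np.array_equal(kp_a, kp_b))      # True: the top-300 set is unchanged
```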

If by number of repeatability results you mean the number of images on which it is evaluated, then no, it is not normal. You should use the same number for all experiments. This number is specified in the config file of your repeatability experiments (the 'eval_iter' parameter). That might explain the difference in performance you observed: the last 300 images might be much harder than the first 600...

xlong0513 commented 4 years ago

Ooh... maybe I have done something wrong. I will try again. Thanks a lot.

rpautrat commented 3 years ago

I am closing this now, feel free to reopen it if you still have questions.