rpautrat / SuperPoint

Efficient neural feature detector and descriptor
MIT License

Training with very low precision and recall #15

Closed xz1cv closed 4 years ago

xz1cv commented 6 years ago

Hello, I ran the training code as follows:

`python experiment.py train configs/magic-point_shapes.yaml magic-point_synth`

The network starts to train and the loss decreases, but the precision and recall do not improve:

```
2018-09-03 16:02:46.232410: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-09-03 16:02:46.232513: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 292 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
[09/03/2018 16:02:48 INFO] Start training
[09/03/2018 16:03:00 INFO] Iter 0: loss 4.7797, precision 0.0006, recall 0.0615
[09/03/2018 16:05:05 INFO] Iter 1000: loss 1.5162, precision 0.0006, recall 0.0579
[09/03/2018 16:07:25 INFO] Iter 2000: loss 0.5852, precision 0.0005, recall 0.0541
[09/03/2018 16:09:44 INFO] Iter 3000: loss 0.2996, precision 0.0005, recall 0.0531
[09/03/2018 16:12:01 INFO] Iter 4000: loss 0.2295, precision 0.0006, recall 0.0445
[09/03/2018 16:14:14 INFO] Iter 5000: loss 0.2241, precision 0.0013, recall 0.0180
[09/03/2018 16:16:26 INFO] Iter 6000: loss 0.2034, precision 0.0016, recall 0.0063
```

How can I get the correct results? Thank you!

rpautrat commented 6 years ago

Hi,

Have you kept the parameters of configs/magic-point_shapes.yaml as they are now on master, or have you modified anything?

I just ran a small training and got the following values:

```
[09/05/2018 10:15:45 INFO] Start training
[09/05/2018 10:16:00 INFO] Iter 0: loss 4.7933, precision 0.0006, recall 0.0577
[09/05/2018 10:18:58 INFO] Iter 1000: loss 1.3800, precision 0.0025, recall 0.2453
[09/05/2018 10:21:15 INFO] Iter 2000: loss 0.5055, precision 0.0032, recall 0.3090
[09/05/2018 10:23:32 INFO] Iter 3000: loss 0.2442, precision 0.0035, recall 0.3290
[09/05/2018 10:25:44 INFO] Iter 4000: loss 0.1531, precision 0.0112, recall 0.3983
[09/05/2018 10:28:00 INFO] Iter 5000: loss 0.1201, precision 0.1229, recall 0.4290
[09/05/2018 10:30:17 INFO] Iter 6000: loss 0.1003, precision 0.1437, recall 0.4055
[09/05/2018 10:32:35 INFO] Iter 7000: loss 0.0996, precision 0.1901, recall 0.4022
[09/05/2018 10:34:52 INFO] Iter 8000: loss 0.0943, precision 0.2205, recall 0.4388
```

xz1cv commented 6 years ago

No, I haven't changed the config file.

rpautrat commented 6 years ago

Hmm, this is strange then.

Can you run the following notebooks and see if you get results very different from those that are visible on Git?

I don't see why you should get results different from those on Git, but it's worth checking.

xz1cv commented 6 years ago

Sorry, I can't find the plot_imgs function. I checked visualize_random_homography.ipynb and the results are correct.

Another question: has the network actually been trained? Since the loss is decreasing, the training data seems fine. Maybe the problem comes from the evaluation process?

rpautrat commented 6 years ago

plot_imgs is located in notebooks/utils.py.
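From a notebook run inside the notebooks/ folder it can be imported directly (a minimal sketch, assuming the notebook's working directory is notebooks/):

```python
# Assumes the notebook is executed with notebooks/ as the working directory,
# so that utils.py is importable as a local module.
from utils import plot_imgs
```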

Indeed, given your observations, the problem could lie in the evaluation process. But that part is very simple, so I don't see what could be wrong (you can check it at the end of superpoint/models/magic_point.py), and I don't see why you experience this problem whereas I don't.

What you can do is run a full training while ignoring the recall and precision values, and then evaluate the trained model on repeatability to see whether your detector actually works.

ucmmesa commented 6 years ago

This is part of my log file:

```
2018-09-06 23:38:18.903401: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7295 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
[09/06/2018 23:38:19 INFO] Scale of 0 disables regularizer.
(the line above is repeated ~30 times)
2018-09-06 23:38:20.009177: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-09-06 23:38:20.009337: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5681 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
[09/06/2018 23:38:22 INFO] Start training
[09/06/2018 23:38:36 INFO] Iter 0: loss 4.8073, precision 0.0005, recall 0.0471
[09/06/2018 23:41:40 INFO] Iter 1000: loss 1.4008, precision 0.0005, recall 0.0519
[09/06/2018 23:45:05 INFO] Iter 2000: loss 0.5078, precision 0.0005, recall 0.0522
[09/06/2018 23:48:27 INFO] Iter 3000: loss 0.3057, precision 0.0005, recall 0.0505
[09/06/2018 23:51:48 INFO] Iter 4000: loss 0.2066, precision 0.0005, recall 0.0392
[09/06/2018 23:55:08 INFO] Iter 5000: loss 0.2029, precision 0.0009, recall 0.0220
[09/06/2018 23:58:34 INFO] Iter 6000: loss 0.3004, precision 0.0006, recall 0.0596
[09/07/2018 00:01:52 INFO] Iter 7000: loss 0.2269, precision 0.0002, recall 0.0034
[09/07/2018 00:02:02 INFO] Got Keyboard Interrupt, saving model and closing.
```

I wonder why the message "Scale of 0 disables regularizer" appears. Does the training depend on the GPU type or Python version? I use a single GTX 1070 and Python 3.6.3. Also, when I run notebooks/visualize_synthetic-shapes_augmentation.ipynb, I get the following (screenshot omitted). Is this right?

rpautrat commented 6 years ago

The "Scale of 0 disables regularizer" comes from the l2 regularizer (tf.contrib.layers.l2_regularizer) which set to 0 by default (no regularization). You can choose to add a bit of regularization by changing the parameter 'kernel_reg' in the config file, but it should already work fine without regularization.

I don't think your GPU and Python version are a problem; they should be fine.

Your visualization seems ok, except that some homographies are a bit too strong (like the 3rd one in the first row). You can reduce the effect of the homographic warps by decreasing the parameters 'perspective_amplitude_x' and 'perspective_amplitude_y' in the config file (for example 0.15 for both). But I don't think it will change your results much.
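For example, the relevant excerpt of the config (nesting matching the dump shown later in this thread):

```yaml
data:
  augmentation:
    homographic:
      params:
        perspective_amplitude_x: 0.15
        perspective_amplitude_y: 0.15
```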

Overall, you shouldn't pay too much attention to the recall and precision values. They are just a quick way to check whether you are very close to the labels; the real metrics to watch are repeatability, average precision and localization error (the current framework doesn't compute these metrics during training, but there are notebooks to compute them after the model has been trained, such as notebooks/detector_evaluation_magic_point.ipynb, notebooks/detector_repeatability_coco.ipynb and notebooks/detector_repeatability_hpatches.ipynb).

The main problem with precision and recall is that they are very dependent on the threshold used to distinguish real interest points from uninteresting ones (the 'detection_threshold' parameter). And they only count exact predictions (i.e. a predicted point that misses a ground-truth point by one pixel is still counted as wrong), so they are really a very rough way to evaluate a feature detector.
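As a rough illustration (an assumed sketch, not the repository's exact implementation) of what such a pixel-exact precision/recall computation looks like:

```python
import numpy as np

def exact_precision_recall(prob_map, keypoint_map, detection_threshold):
    """Pixel-exact metrics: a prediction only counts as correct if it lands
    on exactly the same pixel as a ground-truth point."""
    pred = prob_map >= detection_threshold   # thresholded detection map
    labels = keypoint_map > 0                # binary ground-truth map
    tp = np.sum(pred & labels)               # exact-pixel true positives
    precision = tp / max(np.sum(pred), 1)
    recall = tp / max(np.sum(labels), 1)
    return precision, recall
```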

My advice is to train it (ignoring the precision and recall) until the loss has stopped decreasing (15000 - 20000 iterations should be enough) and then try to evaluate the repeatability of the model with a notebook. You should see an already good repeatability score, even though the precision and recall are very low.

ucmmesa commented 6 years ago

Thanks for the detailed explanation. However, the result is not satisfactory. I kept the parameters of configs/magic-point_shapes.yaml as they are now on master and did the training:

(1) `python3 experiment.py train configs/magic-point_shapes.yaml magic-point_synth_no_aug` to get the model. Here is part of the log file:

```
[09/07/2018 20:44:23 INFO] Start training
[09/07/2018 20:44:36 INFO] Iter 0: loss 4.8103, precision 0.0006, recall 0.0529
[09/07/2018 20:47:41 INFO] Iter 1000: loss 1.2680, precision 0.0006, recall 0.0579
[09/07/2018 20:51:07 INFO] Iter 2000: loss 1.2574, precision 0.0006, recall 0.0587
[09/07/2018 20:54:32 INFO] Iter 3000: loss 0.5118, precision 0.0006, recall 0.0583
[09/07/2018 20:57:52 INFO] Iter 4000: loss 0.2922, precision 0.0006, recall 0.0535
[09/07/2018 21:01:14 INFO] Iter 5000: loss 0.2365, precision 0.0006, recall 0.0554
[09/07/2018 21:04:32 INFO] Iter 6000: loss 0.2189, precision 0.0004, recall 0.0081
/home/dl-xhu/superpoint_tf/SuperPoint/superpoint/models/base_model.py:389: RuntimeWarning: Mean of empty slice
  metrics = {m: np.nanmean(metrics[m], axis=0) for m in metrics}
[09/07/2018 21:07:50 INFO] Iter 7000: loss 0.2311, precision nan, recall 0.0000
[09/07/2018 21:11:09 INFO] Iter 8000: loss 0.2141, precision nan, recall 0.0000
[09/07/2018 21:14:27 INFO] Iter 9000: loss 0.2271, precision nan, recall 0.0000
[09/07/2018 21:17:44 INFO] Iter 10000: loss 0.2022, precision nan, recall 0.0000
[09/07/2018 21:21:01 INFO] Iter 11000: loss 0.1988, precision nan, recall 0.0000
[09/07/2018 21:24:19 INFO] Iter 12000: loss 0.2251, precision nan, recall 0.0000
[09/07/2018 21:27:37 INFO] Iter 13000: loss 0.2316, precision nan, recall 0.0000
[09/07/2018 21:30:56 INFO] Iter 14000: loss 0.2280, precision nan, recall 0.0000
[09/07/2018 21:34:13 INFO] Iter 15000: loss 0.2503, precision nan, recall 0.0000
[09/07/2018 21:37:30 INFO] Iter 16000: loss 0.2030, precision nan, recall 0.0000
[09/07/2018 21:37:46 INFO] Got Keyboard Interrupt, saving model and closing.
```

I trained the network twice: once with add_augmentation_to_test_set set to false and once with it set to true. Then I ran the detection:

(2) `python3 export_detections.py configs/magic-point_shapes.yaml mp_synth-v6_photo-aug_synth`

(3) ran notebooks/detector_evaluation_magic_point.ipynb, with the result:

(screenshots of the predictions omitted)

Obviously, the detection result is not reasonable and the trained model is not acceptable.

rpautrat commented 6 years ago

Actually, you only need to train the network once (without the parameter 'add_augmentation_to_test_set'); it is only when you run export_detections.py that add_augmentation_to_test_set = False or True makes a difference. But it won't change anything about your current results anyway.
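For reference, that flag sits directly under the 'data' section of the config (placement taken from the dump later in this thread) and only matters for export_detections.py:

```yaml
data:
  add_augmentation_to_test_set: true   # set to false to export on clean test images
```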

When I look at your predictions, I see a major problem: even your ground truth is wrong (left column of the predictions)! So I think the problem lies somewhere in the generation of the synthetic dataset (and consequently the training is meaningless if the labels are wrong).

Can you run the notebooks visualize_synthetic-shapes.ipynb and visualize_synthetic-shapes_augmentation.ipynb to check whether the labels are correct there?

ucmmesa commented 6 years ago

The result from running visualize_synthetic-shapes.ipynb: (screenshots omitted)

And from visualize_synthetic-shapes_augmentation.ipynb: (screenshots omitted). Another question: where do the images after line 61 come from, and how can I get them? (screenshot omitted)

rpautrat commented 6 years ago

Everything seems correct in your outputs. The images after line 61 can be obtained by running the notebook again, but with augmentation->homographic->enable set to False in the config file.
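Concretely, that corresponds to an excerpt like this in the config:

```yaml
data:
  augmentation:
    homographic:
      enable: false
```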

So your generator of synthetic shapes seems ok, but in the export of the predictions (and maybe during the training) the ground truth becomes wrong somehow. I really don't see why you experience such a thing...

What version of Tensorflow are you using? It should be at least 1.6.

ucmmesa commented 6 years ago

My TensorFlow is 1.6, numpy is 1.15, OpenCV is 3.4.2 and Python is 3.6.6.

rpautrat commented 6 years ago

That should be enough. I really don't see why your ground truth is wrong then...

One last try: what do you get if you export the predictions with augmentation->homographic->enable->False in the config file and then run the notebook detector_evaluation_magic-point.ipynb? Is the ground truth still wrong?

ucmmesa commented 6 years ago

This is the config file used for training and detection (excerpt, truncated):

```yaml
data:
  add_augmentation_to_test_set: false
  augmentation:
    homographic:
      enable: false
      params:
        allow_artifacts: true
        max_angle: 1.57
        patch_ratio: 0.8
        perspective: true
        perspective_amplitude_x: 0.2
        perspective_amplitude_y: 0.2
        rotation: true
        scaling: true
        scaling_amplitude: 0.2
        translation: true
        translation_overflow: 0.05
      valid_border_margin: 2
    photometric:
      enable: true
      params:
        additive_gaussian_noise:
          stddev_range:
```

When running the notebook detector_evaluation_magic-point.ipynb, an error occurred (screenshot omitted).

And the ground truth from the detection is still wrong (screenshot omitted).

xz1cv commented 6 years ago

@rpautrat Hi, I think I have found the problem. In synthetic_shapes.py, line 138:

```python
for s in splits:
    for obj in ['images', 'points']:
        e = [str(p) for p in Path(path, obj, s).iterdir()]
        splits[s][obj].extend(e[:int(truncate * len(e))])
```

the images and points are processed separately. However, I find that iterdir() returns different orders in the two folders, so the images and points are not matched, which makes the ground truth wrong. I simply changed the code to:

```python
for s in splits:
    e = [str(p) for p in Path(path, 'images', s).iterdir()]
    f = [p.replace('images', 'points') for p in e]
    g = [p.replace('.png', '.npy') for p in f]
    splits[s]['images'].extend(e[:int(truncate * len(e))])
    splits[s]['points'].extend(g[:int(truncate * len(g))])
```

and then the precision and recall change correctly:

```
[10/16/2018 14:28:11 INFO] Start training
[10/16/2018 14:28:25 INFO] Iter 0: loss 4.7800, precision 0.0006, recall 0.0503
[10/16/2018 14:30:30 INFO] Iter 1000: loss 1.4516, precision 0.0021, recall 0.2050
[10/16/2018 14:32:50 INFO] Iter 2000: loss 0.4867, precision 0.0036, recall 0.3481
[10/16/2018 14:35:09 INFO] Iter 3000: loss 0.2476, precision 0.0034, recall 0.3351
[10/16/2018 14:37:25 INFO] Iter 4000: loss 0.1485, precision 0.0226, recall 0.3821
[10/16/2018 14:39:41 INFO] Iter 5000: loss 0.1191, precision 0.1412, recall 0.4161
[10/16/2018 14:41:58 INFO] Iter 6000: loss 0.1204, precision 0.1483, recall 0.3651
[10/16/2018 14:44:14 INFO] Iter 7000: loss 0.0935, precision 0.2103, recall 0.4281
[10/16/2018 14:46:30 INFO] Iter 8000: loss 0.0879, precision 0.1935, recall 0.4343
[10/16/2018 14:48:46 INFO] Iter 9000: loss 0.1060, precision 0.1772, recall 0.4206
[10/16/2018 14:51:00 INFO] Iter 10000: loss 0.0729, precision 0.2552, recall 0.4678
[10/16/2018 14:53:15 INFO] Iter 11000: loss 0.0886, precision 0.2510, recall 0.4640
[10/16/2018 14:55:30 INFO] Iter 12000: loss 0.0896, precision 0.3162, recall 0.4627
[10/16/2018 14:57:46 INFO] Iter 13000: loss 0.0994, precision 0.2631, recall 0.4484
[10/16/2018 15:00:02 INFO] Iter 14000: loss 0.0960, precision 0.2435, recall 0.4348
[10/16/2018 15:02:18 INFO] Iter 15000: loss 0.0938, precision 0.2225, recall 0.4879
[10/16/2018 15:04:35 INFO] Iter 16000: loss 0.0927, precision 0.3063, recall 0.5120
[10/16/2018 15:06:50 INFO] Iter 17000: loss 0.0872, precision 0.2550, recall 0.4783
[10/16/2018 15:09:05 INFO] Iter 18000: loss 0.0636, precision 0.3119, recall 0.4804
[10/16/2018 15:09:30 INFO] Got Keyboard Interrupt, saving model and closing.
[10/16/2018 15:09:31 INFO] Saving checkpoint for iteration #18185
```

I ran export_detections.py on the COCO dataset and got the following results (images omitted).

Most keypoints are detected correctly, but strangely some keypoints lie on the edges of the images. Any idea what causes this?

rpautrat commented 6 years ago

Hi,

Indeed, good point: the wrong ground truth must have come from the different ordering of iterdir(). It is strange, though, that I get the same ordering on my computer and you don't. Maybe it comes from a different Python version?
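As a side note, iterdir() gives no ordering guarantee, so another minimal fix (just a sketch, reusing the surrounding variables of synthetic_shapes.py) would be to sort both listings so they stay aligned:

```python
from pathlib import Path

# Sketch only: same loop as in synthetic_shapes.py, with sorted() added so
# that 'images' and 'points' are listed in the same deterministic order.
for s in splits:
    for obj in ['images', 'points']:
        e = sorted(str(p) for p in Path(path, obj, s).iterdir())
        splits[s][obj].extend(e[:int(truncate * len(e))])
```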

Anyway, it is good that you finally get good results, even though the detections on the edges are indeed weird. But if you look at them carefully, most of them would be correct if you considered the edge of the image as a real line in the image (for example, the 4 corners are detected, as are the intersections between the shadow of the people and the edge of the image).

This is as if the network had learned to recognize these "fake intersections" as real feature points. The only way for the network to learn this would be if you had the same phenomenon in your ground truth (which shouldn't be the case).

Can you print again the synthetic images and the ground truth that you generated? If you see, for example, a ground-truth point where a line of the checkerboard crosses the edge of the image, then the ground truth is not correct (that shouldn't be considered a feature point), and it would explain what you observe.

xz1cv commented 6 years ago

This is the result of visualize_synthetic-shapes_augmentation (screenshot omitted), and more results on COCO (images omitted). It is good that almost all keypoints are found. Maybe the detection_threshold and top_k in the config file lead to the strange detections on the edges. I set detection_threshold=0.015 and top_k=100, but the first image certainly does not have 100 keypoints.

rpautrat commented 6 years ago

Ok your ground truth looks good, so my first guess was not the right explanation.

It could be linked to the fact that you allow artifacts on the border of the image. When a homography is applied to the image (through data augmentation or homographic adaptation), you can choose whether or not to allow bordering artifacts. Allowing them offers more diversity in the homographies, and this is what is used in the current config. But in the code a mask is applied in the network to ignore detections outside the borders of the image, so these artifacts shouldn't be a problem. You can still try to train without artifacts ('allow_artifacts': false in the config) and, if you use homographic adaptation, avoid using them there too. But it is only a guess and might not be the cause of the problem.
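That is, in the config:

```yaml
data:
  augmentation:
    homographic:
      params:
        allow_artifacts: false
```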

You can also indeed keep fewer keypoints by increasing detection_threshold and decreasing top_k (with a bit of luck the weakest predictions are the ones on the edges). But in my experience, even with a low threshold (detection_threshold=0.001) and a high top_k (600), I never observed such weird detections on the edges...
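For reference, these two parameters are assumed to sit in the model section of the config, e.g.:

```yaml
model:
  detection_threshold: 0.015   # raise to keep only the most confident points
  top_k: 100                   # lower to keep fewer points overall
```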

zpfriedel commented 5 years ago

So, looking quickly at the code for homographic_adaptation and homography_adaptation, it looks like when we allow artifacts during training, the mask is created in homographic_adaptation and applied in the loss; but when we export detections and allow artifacts, I don't see a mask being applied anywhere. Could this explain the detections on the border? When I set allow_artifacts to False during export, I don't get the detections on the border anymore and actually get better detections within the image.

Please let me know if I'm assuming this incorrectly.

rpautrat commented 5 years ago

Yes, it's absolutely correct!

There is indeed no mask when doing the homography adaptation, but the 'count' tensor makes sure that only the detections inside the image (and not those in the artifacts) are actually used. However, if artifacts are allowed, some detections can happen on the boundary between the image and the black artifacts; they are then aggregated onto the original image and we get these detections on the borders.

I will fix it by incorporating a safety margin in the 'count' (as is currently done for the valid mask) in the homography adaptation, to ignore detections too close to the border. Thanks a lot for your observation!
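Something along these lines (a rough numpy illustration of the idea, not the actual fix):

```python
import numpy as np

def add_border_margin_to_count(count, margin=3):
    """Zero out the aggregation 'count' map near the image border so that
    detections accumulated there are ignored (illustration only)."""
    masked = count.copy()
    masked[:margin, :] = 0
    masked[-margin:, :] = 0
    masked[:, :margin] = 0
    masked[:, -margin:] = 0
    return masked
```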

xlong0513 commented 4 years ago

@ucmmesa Hi, how did you visualize the detection results like these (image omitted)? Or how can I get my detection results on Synthetic Shapes after step 1? Thanks!