tensorflow / models

Models and examples built with TensorFlow

[Object Detection] RetinaNet Model mAP Problem #6278

Closed rootkitchao closed 5 years ago

rootkitchao commented 5 years ago

System information

  • What is the top-level directory of the model you are using: models/research/object_detection/
  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): MS Windows 10 x64 1809 (10.0.17763.316)
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): 1.12
  • Bazel version (if compiling from source): 0.18
  • CUDA/cuDNN version: 10.0/7.4
  • GPU model and memory: Nvidia GeForce RTX 2080 Ti, 11GB
  • Exact command to reproduce:
    python legacy/eval.py --logtostderr --checkpoint_dir=D:\tf_models\ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03 --eval_dir=D:\tf_project\ssd_mobilenet_v1_fpn\model\eval --pipeline_config_path=D:\tf_models\ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03\pipeline.config
    python legacy/eval.py --logtostderr --checkpoint_dir=D:\tf_models\ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03 --eval_dir=D:\tf_project\ssd_resnet50_v1_fpn\model\eval --pipeline_config_path=D:\tf_models\ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03\pipeline.config
Describe the problem

I downloaded and unzipped the models from the model zoo, then modified each pipeline.config to point at my validation set and set the number of examples to 5000. The validation set is mscoco17_val which, if I understand correctly, should be equivalent to mscoco14_minival. Finally I ran the legacy/eval.py tool to evaluate, but I got very strange results. For the SSD_MobilenetV1_FPN model, the mAP is 35, which is much higher than the 32 given in the model zoo, and also higher than the 29.7 given in the configuration file (I don't know which one is correct). For the SSD_Resnet50V1_FPN model, the mAP is 31, which is lower than the 35 given in the model zoo. I don't understand where the problem is. Maybe another issue (https://github.com/tensorflow/models/issues/6261) is related to this.
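For reference, the config edits were along these lines — a minimal sketch with placeholder paths and shard pattern, not my exact values:

eval_config {
  num_examples: 5000
  metrics_set: "coco_detection_metrics"
}
eval_input_reader {
  tf_record_input_reader {
    input_path: "D:/dataset/mscoco17/tfrecord/coco_val.record-?????-of-00010"
  }
  label_map_path: "D:/dataset/mscoco17/mscoco_label_map.pbtxt"
  shuffle: false
}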

Source code / logs

SSD_MobilenetV1_FPN:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.350
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.515
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.390
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.095
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.329
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.500
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.284
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.415
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.426
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.125
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.420
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.586
INFO:tensorflow:Writing metrics to tf summary.
INFO:tensorflow:DetectionBoxes_Precision/mAP: 0.350050
INFO:tensorflow:DetectionBoxes_Precision/mAP (large): 0.499888
INFO:tensorflow:DetectionBoxes_Precision/mAP (medium): 0.328882
INFO:tensorflow:DetectionBoxes_Precision/mAP (small): 0.094604
INFO:tensorflow:DetectionBoxes_Precision/mAP@.50IOU: 0.514705
INFO:tensorflow:DetectionBoxes_Precision/mAP@.75IOU: 0.389985
INFO:tensorflow:DetectionBoxes_Recall/AR@1: 0.284125
INFO:tensorflow:DetectionBoxes_Recall/AR@10: 0.415416
INFO:tensorflow:DetectionBoxes_Recall/AR@100: 0.425720
INFO:tensorflow:DetectionBoxes_Recall/AR@100 (large): 0.585944
INFO:tensorflow:DetectionBoxes_Recall/AR@100 (medium): 0.420352
INFO:tensorflow:DetectionBoxes_Recall/AR@100 (small): 0.124570
INFO:tensorflow:Losses/Loss/classification_loss: 0.271880
INFO:tensorflow:Losses/Loss/localization_loss: 0.156589

SSD_Resnet50V1_FPN:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.310
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.452
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.343
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.077
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.275
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.463
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.264
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.370
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.378
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.102
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.351
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.547
INFO:tensorflow:Writing metrics to tf summary.
INFO:tensorflow:DetectionBoxes_Precision/mAP: 0.309752
INFO:tensorflow:DetectionBoxes_Precision/mAP (large): 0.463037
INFO:tensorflow:DetectionBoxes_Precision/mAP (medium): 0.274869
INFO:tensorflow:DetectionBoxes_Precision/mAP (small): 0.077343
INFO:tensorflow:DetectionBoxes_Precision/mAP@.50IOU: 0.451967
INFO:tensorflow:DetectionBoxes_Precision/mAP@.75IOU: 0.343374
INFO:tensorflow:DetectionBoxes_Recall/AR@1: 0.263974
INFO:tensorflow:DetectionBoxes_Recall/AR@10: 0.370283
INFO:tensorflow:DetectionBoxes_Recall/AR@100: 0.377995
INFO:tensorflow:DetectionBoxes_Recall/AR@100 (large): 0.546755
INFO:tensorflow:DetectionBoxes_Recall/AR@100 (medium): 0.350794
INFO:tensorflow:DetectionBoxes_Recall/AR@100 (small): 0.102220
INFO:tensorflow:Losses/Loss/classification_loss: 0.301654
INFO:tensorflow:Losses/Loss/localization_loss: 0.175666

rootkitchao commented 5 years ago

Update: I downloaded the coco2014 dataset from cocodataset.org, then found the annotations file for coco2014 minival via https://github.com/rbgirshick/py-faster-rcnn/blob/master/data/README.md. Finally, I generated tfrecords with create_coco_tf_record. The command used is as follows:

python create_coco_tf_record.py --logtostderr --train_image_dir=D:\dataset\mscoco14\raw --val_image_dir=D:\dataset\mscoco14\raw --test_image_dir=D:\dataset\mscoco14\raw --train_annotations_file=D:\dataset\mscoco14\annotations\instances_valminusminival2014.json --val_annotations_file=D:\dataset\mscoco14\annotations\instances_minival2014.json --testdev_annotations_file=D:\dataset\mscoco14\annotations\image_info_test2014.json --output_dir=D:\dataset\mscoco14\tfrecord

After getting the coco2014 minival tfrecord file, I re-ran legacy/eval.py. The mAP of SSD_Resnet50V1_FPN has risen to 34.5:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.345
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.484
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.380
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.078
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.338
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.514
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.281
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.390
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.395
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.092
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.387
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.568

However, the mAP of the SSD_MobilenetV1_FPN model has risen to 37.2, which is still not normal:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.372
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.539
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.414
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.104
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.380
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.542
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.290
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.427
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.436
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.118
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.450
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.609
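As a quick sanity check on how many images actually landed in the eval record, counting the serialized examples works — a TF 1.x sketch; the path pattern is a placeholder for wherever create_coco_tf_record wrote its output:

import glob
import tensorflow as tf

# Placeholder path; adjust to where create_coco_tf_record wrote its output.
shards = glob.glob(r'D:\dataset\mscoco14\tfrecord\coco_val.record*')
count = sum(1 for shard in shards
            for _ in tf.python_io.tf_record_iterator(shard))
print('images in eval record:', count)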

pkulzc commented 5 years ago

Our COCO 14 mini is different from COCO 17 val. See the image ids here.

To answer your question - since we use a different split, some of your val images are in our training set, which makes the mAP meaningless (typically leads to a higher mAP).

lbingbing commented 5 years ago

> Our COCO 14 mini is different from COCO 17 val. See the image ids here.
>
> To answer your question - since we use a different split, some of your val images are in our training set, which makes the mAP meaningless (typically leads to a higher mAP).

Thanks. Maybe we could note this dataset mismatch issue in https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md

I will try to use the image ids you mentioned to verify the numbers.
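Something along these lines should rebuild a matching minival annotation file from the repo's id list, so the eval runs on the same split — a sketch; the file locations are assumptions, and I'm assuming the ids are drawn from val2014:

import json

# Assumed locations: the id list that ships with the Object Detection API
# and the standard val2014 annotations from cocodataset.org.
with open('object_detection/data/mscoco_minival_ids.txt') as f:
    minival_ids = {int(line) for line in f if line.strip()}

with open('annotations/instances_val2014.json') as f:
    coco = json.load(f)

# Keep only the images (and their annotations) in the repo's minival split.
coco['images'] = [im for im in coco['images'] if im['id'] in minival_ids]
coco['annotations'] = [a for a in coco['annotations']
                       if a['image_id'] in minival_ids]

with open('annotations/instances_minival2014_tf.json', 'w') as f:
    json.dump(coco, f)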

rootkitchao commented 5 years ago

> Our COCO 14 mini is different from COCO 17 val. See the image ids here. To answer your question - since we use a different split, some of your val images are in our training set, which makes the mAP meaningless (typically leads to a higher mAP).

Thanks.

lbingbing commented 5 years ago


When I was using legacy/eval.py to evaluate, I got:

Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "models/research/object_detection/legacy/eval.py", line 142, in <module>
    tf.app.run()
  File "lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 306, in new_func
    return func(*args, **kwargs)
  File "models/research/object_detection/legacy/eval.py", line 138, in main
    graph_hook_fn=graph_rewriter_fn)
  File "models/research/object_detection/legacy/evaluator.py", line 194, in evaluate
    model = create_model_fn()
  File "models/research/object_detection/builders/model_builder.py", line 119, in build
    return _build_ssd_model(model_config.ssd, is_training, add_summaries)
  File "models/research/object_detection/builders/model_builder.py", line 236, in _build_ssd_model
    is_training=is_training)
  File "models/research/object_detection/builders/model_builder.py", line 212, in _build_ssd_feature_extractor
    return feature_extractor_class(**kwargs)
  File "models/research/object_detection/models/ssd_inception_v2_feature_extractor.py", line 75, in __init__
    raise ValueError('SSD Inception V2 feature extractor always uses'
ValueError: SSD Inception V2 feature extractor always usesscope returned by `conv_hyperparams_fn` for both the base feature extractor and the additional layers added since there is no arg_scope defined for the base feature extractor.

It looks like the script expects an override_base_feature_extractor_hyperparams setting in the config to override the default hyperparameters, but it isn't there, so the build fails.

I'm using Python 3.5.2; my command is: python -m object_detection.legacy.eval --logtostderr --checkpoint_dir=ssd_inception_v2/ssd_inception_v2_coco_2018_01_28 --eval_dir=./eval_result --pipeline_config_path=ssd_inception_v2/ssd_inception_v2_coco_2018_01_28/pipeline.config

Do you know what might be the problem? Thanks.
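Update: it looks like setting override_base_feature_extractor_hyperparams in the pipeline.config may be what's missing. A sketch of the feature_extractor block with the flag set (other fields elided; this is an untested assumption on my part):

model {
  ssd {
    feature_extractor {
      type: "ssd_inception_v2"
      # ... existing conv_hyperparams stay as-is ...
      override_base_feature_extractor_hyperparams: true
    }
  }
}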

giulio-zhou commented 5 years ago

> Our COCO 14 mini is different from COCO 17 val. See the image ids here.
>
> To answer your question - since we use a different split, some of your val images are in our training set, which makes the mAP meaningless (typically leads to a higher mAP).

Is the "some of your val images are in our training set" part actually the case? According to slide 5 of 24 in the COCO 2017 detection presentation, they seem to indicate that all of Train2014 is in Train2017, and hence not in Val2017.

[Screenshot: slide from the COCO 2017 detection presentation showing that Train2014 is contained in Train2017]

pkulzc commented 5 years ago

Our minival2014 = 8K images (ids here).

Our training dataset is Train2014 + Val2014 - our_minival2014

giulio-zhou commented 5 years ago

Ah, that makes total sense. As a result, you'd have some of the val2017 images in the training set.
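To make that concrete, the overlap can be computed directly from the id lists (COCO image ids are stable across the 2014 and 2017 releases) — a sketch with assumed file locations:

import json

# Assumed paths: the repo's minival id list and the val2017 annotations.
with open('object_detection/data/mscoco_minival_ids.txt') as f:
    minival2014_ids = {int(line) for line in f if line.strip()}

with open('annotations/instances_val2017.json') as f:
    val2017_ids = {im['id'] for im in json.load(f)['images']}

# The TF training split is (Train2014 + Val2014) - minival2014, and
# Train2014 + Val2014 covers the same images as Train2017 + Val2017,
# so any val2017 image outside minival2014 was seen during training.
leaked = val2017_ids - minival2014_ids
print('val2017 images in the TF training split:', len(leaked))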