mlcommons / inference_results_v0.5

This repository contains the results and code for the MLPerf™ Inference v0.5 benchmark.
https://mlcommons.org/en/inference-datacenter-05/
Apache License 2.0

Intel: SSD-MobileNet accuracy mAP=15.280% #27

Open psyhtest opened 4 years ago

psyhtest commented 4 years ago

We've meticulously reconstructed all components of Intel's MLPerf Inference v0.5 submission.

Unfortunately, the achieved accuracy (15.280%) is much lower than expected (22.627%):

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.153
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.236
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.167
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.012
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.101
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.361
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.150
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.178
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.178
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.014
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.114
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.423
mAP=15.280%

To reproduce, follow the instructions for the prebuilt Docker image.

You can also reproduce the same result in a native Debian environment by following the Collective Knowledge (CK) steps in the source Dockerfile.

psyhtest commented 4 years ago

If I remove the --reverse_input_channels flag from the model conversion, I get a much higher accuracy (20.798%):

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.208
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.320
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.226
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.016
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.145
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.472
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.192
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.237
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.238
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.021
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.167
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.536
mAP=20.798%
psyhtest commented 4 years ago

Using the other (asymmetrically quantized) model without --reverse_input_channels gives slightly worse accuracy (20.150%):

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.201
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.312
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.219
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.016
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.138
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.463                            
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.187
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.231                                                
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.232                                   
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.021
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.159
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.530
mAP=20.150% 
psyhtest commented 4 years ago

Using the other (asymmetrically quantized) model with --reverse_input_channels gives even worse accuracy (14.827%):

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.148
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.228
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.161
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.010
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.095
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.343
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.146
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.174
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.174
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.012
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.109
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.407
mAP=14.827%
psyhtest commented 4 years ago
Reverse input channels?   Quantized model   Accuracy (mAP)
yes                       asymmetric        14.827%
yes                       symmetric         15.280%
no                        asymmetric        20.150%
no                        symmetric         20.798%
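
For context, --reverse_input_channels tells the Model Optimizer to fold a BGR-to-RGB channel swap into the converted model, so the correct setting depends on whether the benchmark's preprocessing feeds BGR images (OpenCV's default) or RGB. A sketch of the two conversions with paths shortened (the full arguments otherwise match the conversion log below):

$ # With the channel swap folded in (~15% mAP here):
$ python3 mo.py --input_model ssd_mobilenet_v1_quant_ft_no_zero_point_frozen_inference_graph.pb \
  --input_shape [1,300,300,3] --reverse_input_channels \
  --tensorflow_object_detection_api_pipeline_config pipeline.config \
  --tensorflow_use_custom_operations_config extensions/front/tf/ssd_v2_support.json
$ # Without it (~20.8% mAP here); identical except for the flag:
$ python3 mo.py --input_model ssd_mobilenet_v1_quant_ft_no_zero_point_frozen_inference_graph.pb \
  --input_shape [1,300,300,3] \
  --tensorflow_object_detection_api_pipeline_config pipeline.config \
  --tensorflow_use_custom_operations_config extensions/front/tf/ssd_v2_support.json
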
psyhtest commented 4 years ago

Here's the full conversion log:

/usr/bin/python3.6 /home/anton/CK_TOOLS/lib-openvino-gcc-8.4.0-2019_R3.1-linux-64/dldt/model-optimizer/mo.py \
  --model_name converted_model \
  --input_model /home/anton/CK_TOOLS/model-tf-mlperf-ssd-mobilenet-quantized-finetuned-for.openvino/ssd_mobilenet_v1_quant_ft_no_zero_point_frozen_inference_graph.pb \
  --input_shape [1,300,300,3] \
  --tensorflow_object_detection_api_pipeline_config /home/anton/CK_TOOLS/model-tf-mlperf-ssd-mobilenet-quantized-finetuned-for.openvino/pipeline.config \
  --tensorflow_use_custom_operations_config /home/anton/CK_TOOLS/lib-openvino-gcc-8.4.0-2019_R3.1-linux-64/dldt/model-optimizer/extensions/front/tf/ssd_v2_support.json
Model Optimizer arguments:                     
Common parameters:
        - Path to the Input Model:      /home/anton/CK_TOOLS/model-tf-mlperf-ssd-mobilenet-quantized-finetuned-for.openvino/ssd_mobilenet_v1_quant_ft_no_zero_point_frozen_inference_graph.pb
        - Path for generated IR:        /home/anton/CK_TOOLS/model-openvino-converted-from-tf-ssd-mobilenet/.
        - IR output name:       converted_model            
        - Log level:    ERROR
        - Batch:        Not specified, inherited from the model     
        - Input layers:         Not specified, inherited from the model
        - Output layers:        Not specified, inherited from the model
        - Input shapes:         [1,300,300,3]
        - Mean values:  Not specified                              
        - Scale values:         Not specified
        - Scale factor:         Not specified                    
        - Precision of IR:      FP32
        - Enable fusing:        True                             
        - Enable grouped convolutions fusing:   True
        - Move mean values to preprocess section:       False                
        - Reverse input channels:       False
TensorFlow specific parameters:                                    
        - Input model in text protobuf format:  False
        - Path to model dump for TensorBoard:   None                 
        - List of shared libraries with TensorFlow custom layers implementation:        None
        - Update the configuration file with input/output node names:   None
        - Use configuration file used to generate the model with Object Detection API:  /home/anton/CK_TOOLS/model-tf-mlperf-ssd-mobilenet-quantized-finetuned-for.openvino/pipeline.config
        - Operations to offload:        None
        - Patterns to offload:  None       
        - Use the config file:  /home/anton/CK_TOOLS/lib-openvino-gcc-8.4.0-2019_R3.1-linux-64/dldt/model-optimizer/extensions/front/tf/ssd_v2_support.json
Model Optimizer version:        unknown version                                                                      
The Preprocessor block has been removed. Only nodes performing mean value subtraction and scaling (if applicable) are kept.

[ SUCCESS ] Generated IR model.
[ SUCCESS ] XML file: /home/anton/CK_TOOLS/model-openvino-converted-from-tf-ssd-mobilenet/./converted_model.xml
[ SUCCESS ] BIN file: /home/anton/CK_TOOLS/model-openvino-converted-from-tf-ssd-mobilenet/./converted_model.bin
[ SUCCESS ] Total execution time: 38.25 seconds.
psyhtest commented 4 years ago

After my blunder with #29, where I didn't think of running the ImageNet accuracy script with a different data type parameter, I decided to check the COCO accuracy script on the data from the v0.5 submission.

I discovered that for the clx_9282-2s_openvino-linux system, the reported accuracy is mAP=22.627%, while the script computes mAP=25.484%:

anton@velociti:/data/anton/inference_results_v0.5/closed/Intel/results$ for accuracy_txt in \
  ./clx_9282-2s_openvino-linux/ssd-small/Server/accuracy/accuracy.txt \
  ./clx_9282-2s_openvino-linux/ssd-small/Offline/accuracy/accuracy.txt \
  ./clx_9282-2s_openvino-linux/ssd-small/SingleStream/accuracy/accuracy.txt \
; do \
  echo "$accuracy_txt"; \
  tail -1 $accuracy_txt; \
  echo "" \
; done
./clx_9282-2s_openvino-linux/ssd-small/Server/accuracy/accuracy.txt
mAP=22.627%

./clx_9282-2s_openvino-linux/ssd-small/Offline/accuracy/accuracy.txt
mAP=22.627%

./clx_9282-2s_openvino-linux/ssd-small/SingleStream/accuracy/accuracy.txt
mAP=22.627%

anton@velociti:/data/anton/inference_results_v0.5/closed/Intel/results$ for mlperf_log_accuracy_json in \
  ./clx_9282-2s_openvino-linux/ssd-small/Server/accuracy/mlperf_log_accuracy.json \
  ./clx_9282-2s_openvino-linux/ssd-small/Offline/accuracy/mlperf_log_accuracy.json \
  ./clx_9282-2s_openvino-linux/ssd-small/SingleStream/accuracy/mlperf_log_accuracy.json \
; do \
  echo "$mlperf_log_accuracy_json"; \
  wc -l "$mlperf_log_accuracy_json"; \
  /usr/bin/python3.6 \
  /home/anton/CK_TOOLS/mlperf-inference-dividiti.v0.5-intel/inference/v0.5/classification_and_detection/tools/accuracy-coco.py \
  --coco-dir /datasets/dataset-coco-2017-val/ \
  --mlperf-accuracy-file $mlperf_log_accuracy_json; \
  echo "" \
; done

./clx_9282-2s_openvino-linux/ssd-small/Server/accuracy/mlperf_log_accuracy.json
31754 ./clx_9282-2s_openvino-linux/ssd-small/Server/accuracy/mlperf_log_accuracy.json
loading annotations into memory...
Done (t=0.54s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.01s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.68s).
Accumulating evaluation results...
DONE (t=0.47s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.255
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.369
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.286
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.010
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.186
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.580
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.218
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.272
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.272
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.012
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.194
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.616
mAP=25.484%

./clx_9282-2s_openvino-linux/ssd-small/Offline/accuracy/mlperf_log_accuracy.json
66074 ./clx_9282-2s_openvino-linux/ssd-small/Offline/accuracy/mlperf_log_accuracy.json
loading annotations into memory...
Done (t=0.55s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.01s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.60s).
Accumulating evaluation results...
DONE (t=0.45s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.255
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.369
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.286
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.010
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.186
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.580
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.218
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.272
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.272
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.012
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.194
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.616
mAP=25.484%

./clx_9282-2s_openvino-linux/ssd-small/SingleStream/accuracy/mlperf_log_accuracy.json
4255 ./clx_9282-2s_openvino-linux/ssd-small/SingleStream/accuracy/mlperf_log_accuracy.json
loading annotations into memory...
Done (t=0.46s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.01s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.61s).
Accumulating evaluation results...
DONE (t=0.46s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.255
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.369
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.286
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.010
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.186
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.580
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.218
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.272
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.272
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.012
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.194
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.616
mAP=25.484%

Note that the JSON logs for the Server, Offline and SingleStream scenarios have 31754, 66074 and 4255 lines, respectively, not 5002 as expected.
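
(The expected 5002 lines presumably correspond to the 5000 COCO validation entries plus the opening and closing brackets of the JSON array.) A quick way to check whether entries are duplicated, as a sketch assuming the standard LoadGen accuracy log where each result line carries a qsl_idx field:

$ grep -o '"qsl_idx" *: *[0-9]*' mlperf_log_accuracy.json | wc -l            # total result entries
$ grep -o '"qsl_idx" *: *[0-9]*' mlperf_log_accuracy.json | sort -u | wc -l  # unique sample indices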

psyhtest commented 4 years ago

For the ICL-I3-1005G1_OpenVINO-Windows system, I get the expected accuracy and number of lines, but the SingleStream log also contains 21248 error lines like ERROR: loadgen(800) and payload(2583339204608) disagree on image_idx:

anton@velociti:/data/anton/inference_results_v0.5/closed/Intel/results$ for mlperf_log_accuracy_json in \
  ./ICL-I3-1005G1_OpenVINO-Windows/ssd-small/SingleStream/accuracy/mlperf_log_accuracy.json \
  ./ICL-I3-1005G1_OpenVINO-Windows/ssd-small/Offline/accuracy/mlperf_log_accuracy.json \
; do \
  echo "$mlperf_log_accuracy_json"; \
  wc -l "$mlperf_log_accuracy_json"; \
  /usr/bin/python3.6 \
  /home/anton/CK_TOOLS/mlperf-inference-dividiti.v0.5-intel/inference/v0.5/classification_and_detection/tools/accuracy-coco.py \
  --coco-dir /datasets/dataset-coco-2017-val/ \
  --mlperf-accuracy-file $mlperf_log_accuracy_json; \
  echo "" \
; done
...
ERROR: loadgen(1871) and payload(2583339204608) disagree on image_idx
ERROR: loadgen(1871) and payload(2583339204608) disagree on image_idx
ERROR: loadgen(800) and payload(2583339204608) disagree on image_idx
Loading and preparing results...
DONE (t=0.13s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=12.38s).
Accumulating evaluation results...
DONE (t=2.13s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.226
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.347
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.246
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.018
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.158
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.521
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.205
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.257
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.258
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.023
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.182
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.594
mAP=22.627%

./ICL-I3-1005G1_OpenVINO-Windows/ssd-small/Offline/accuracy/mlperf_log_accuracy.json
5002 ./ICL-I3-1005G1_OpenVINO-Windows/ssd-small/Offline/accuracy/mlperf_log_accuracy.json
loading annotations into memory...
Done (t=0.51s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.10s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=12.58s).
Accumulating evaluation results...
DONE (t=2.11s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.226
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.347
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.246
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.018
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.158
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.521
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.205
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.257
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.258
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.023
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.182
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.594
mAP=22.627%
psyhtest commented 4 years ago

No problems with DellEMC's results (including OpenVINO and TensorRT):

anton@velociti:/data/anton/inference_results_v0.5/closed/DellEMC/results$ for mlperf_log_accuracy_json in \
  ./DELLEMC_R740xd6248_openvino-linux/ssd-small/Offline/accuracy/mlperf_log_accuracy.json \
  ./DELLEMC_R740xd6248_openvino-linux/ssd-small/SingleStream/accuracy/mlperf_log_accuracy.json \
  ./DELLEMC_R740xd8276_openvino-linux/ssd-small/Offline/accuracy/mlperf_log_accuracy.json \
  ./DELLEMC_R740xd8276_openvino-linux/ssd-small/SingleStream/accuracy/mlperf_log_accuracy.json \
  ./R740_T4x4_tensorrt/ssd-small/Server/accuracy/mlperf_log_accuracy.json \
  ./R740_T4x4_tensorrt/ssd-small/Offline/accuracy/mlperf_log_accuracy.json \
; do \
  echo "$mlperf_log_accuracy_json"; \
  wc -l "$mlperf_log_accuracy_json"; \
  /usr/bin/python3.6 \
  /home/anton/CK_TOOLS/mlperf-inference-dividiti.v0.5-intel/inference/v0.5/classification_and_detection/tools/accuracy-coco.py \
  --coco-dir /datasets/dataset-coco-2017-val/ \
  --mlperf-accuracy-file $mlperf_log_accuracy_json | tail -1; \
  echo "" \
; done
./DELLEMC_R740xd6248_openvino-linux/ssd-small/Offline/accuracy/mlperf_log_accuracy.json
5002 ./DELLEMC_R740xd6248_openvino-linux/ssd-small/Offline/accuracy/mlperf_log_accuracy.json
mAP=22.627%

./DELLEMC_R740xd6248_openvino-linux/ssd-small/SingleStream/accuracy/mlperf_log_accuracy.json
5002 ./DELLEMC_R740xd6248_openvino-linux/ssd-small/SingleStream/accuracy/mlperf_log_accuracy.json
mAP=22.627%

./DELLEMC_R740xd8276_openvino-linux/ssd-small/Offline/accuracy/mlperf_log_accuracy.json
5002 ./DELLEMC_R740xd8276_openvino-linux/ssd-small/Offline/accuracy/mlperf_log_accuracy.json
mAP=22.627%

./DELLEMC_R740xd8276_openvino-linux/ssd-small/SingleStream/accuracy/mlperf_log_accuracy.json
5002 ./DELLEMC_R740xd8276_openvino-linux/ssd-small/SingleStream/accuracy/mlperf_log_accuracy.json
mAP=22.627%

./R740_T4x4_tensorrt/ssd-small/Server/accuracy/mlperf_log_accuracy.json
5002 ./R740_T4x4_tensorrt/ssd-small/Server/accuracy/mlperf_log_accuracy.json
mAP=22.911%

./R740_T4x4_tensorrt/ssd-small/Offline/accuracy/mlperf_log_accuracy.json
5002 ./R740_T4x4_tensorrt/ssd-small/Offline/accuracy/mlperf_log_accuracy.json
mAP=22.912%
psyhtest commented 4 years ago

So the resolution seems to have two aspects: converting the model without --reverse_input_channels, and using a pre-release build of OpenVINO.

This can be confirmed via a new Docker image (tagged mlperf_inference_results_v0.5_issue_27_resolved on Docker Hub).
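
For example (a hypothetical invocation; substitute the actual Docker Hub repository the image was pushed under):

$ docker run --rm -it <dockerhub-org>/<image>:mlperf_inference_results_v0.5_issue_27_resolved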

Alternatively, you can run this natively:

$ ck install package --tags=lib,openvino,pre-release
$ ck install package --tags=model,openvino,ssd-mobilenet
$ export NPROCS=`grep -c processor /proc/cpuinfo`
$ ck benchmark program:mlperf-inference-v0.5 --cmd_key=object-detection --repetitions=1 --skip_print_timers \
--env.CK_LOADGEN_MODE=Accuracy --env.CK_LOADGEN_SCENARIO=Offline \
--env.CK_OPENVINO_NIREQ=$NPROCS --env.CK_OPENVINO_NTHREADS=$NPROCS --env.CK_OPENVINO_NSTREAMS=$NPROCS \
--dep_add_tags.openvino=pre-release
...
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.226
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.347
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.246
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.018
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.158
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.520
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.205
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.257
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.258
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.023
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.183
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.594
mAP=22.617%
psyhtest commented 4 years ago

The 2019_R3 release is no good either:

$ ck install package --tags=lib,openvino,2019_R3
$ export NPROCS=`grep -c processor /proc/cpuinfo`
$ ck benchmark program:mlperf-inference-v0.5 --cmd_key=object-detection --repetitions=1 --skip_print_timers \
--env.CK_LOADGEN_MODE=Accuracy --env.CK_LOADGEN_SCENARIO=Offline \
--env.CK_OPENVINO_NIREQ=$NPROCS --env.CK_OPENVINO_NTHREADS=$NPROCS --env.CK_OPENVINO_NSTREAMS=$NPROCS \
--dep_add_tags.openvino=2019_R3 --dep_add_tags.loadgen=for.openvino --dep_add_tags.mlperf-inference-src=dividiti.v0.5-intel \
--dep_add_tags.compiler=v8 --dep_add_tags.cmake=v3.14 --dep_add_tags.opencv=v3.4.3
...
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.208
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.320
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.226
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.016
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.145
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.472
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.192
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.237
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.238
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.021
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.167
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.536
mAP=20.798%
psyhtest commented 4 years ago

... while the benchmark code doesn't build with 2020.1:

$ ck install package --tags=lib,openvino,2020.1
$ export NPROCS=`grep -c processor /proc/cpuinfo`
$ ck benchmark program:mlperf-inference-v0.5 --cmd_key=object-detection --repetitions=1 --skip_print_timers \
--env.CK_LOADGEN_MODE=Accuracy --env.CK_LOADGEN_SCENARIO=Offline \
--env.CK_OPENVINO_NIREQ=$NPROCS --env.CK_OPENVINO_NTHREADS=$NPROCS --env.CK_OPENVINO_NSTREAMS=$NPROCS \
--dep_add_tags.openvino=2020.1 --dep_add_tags.loadgen=for.openvino --dep_add_tags.mlperf-inference-src=dividiti.v0.5-intel \
--dep_add_tags.compiler=v8 --dep_add_tags.cmake=v3.14 --dep_add_tags.opencv=v3.4.3
...
[ 50%] Building CXX object CMakeFiles/ov_mlperf.dir/main_ov.cc.o               
/usr/bin/g++-8 \
  -I/home/anton/CK_TOOLS/lib-boost-1.67.0-gcc-8.4.0-compiler.python-3.6.10-linux-64/install/include \
  -I/home/anton/CK_TOOLS/lib-mlperf-loadgen-static-gcc-8.4.0-compiler.python-3.6.10-for.openvino-linux-64/include \
  -I/home/anton/CK_TOOLS/lib-openvino-gcc-8.4.0-2020.1-linux-64/include \
  -I/home/anton/CK_TOOLS/lib-openvino-gcc-8.4.0-2020.1-linux-64/dldt/inference-engine/src/extension \
  -isystem /home/anton/CK_TOOLS/lib-opencv-3.4.3-gcc-8.4.0-linux-64/install/include \
  -isystem /home/anton/CK_TOOLS/lib-opencv-3.4.3-gcc-8.4.0-linux-64/install/include/opencv \
  -fPIE -fstack-protector-strong -Wno-error -fPIC -fno-operator-names -Wformat -Wformat-security -Wall \
  -O2 -std=c++14 -pthread -USE_OPENCV -DBOOST_ERROR_CODE_HEADER_ONLY -O3 -DNDEBUG \
  -o CMakeFiles/ov_mlperf.dir/main_ov.cc.o -c /home/anton/CK_REPOS/ck-openvino/program/mlperf-inference-v0.5/ov_mlperf_cpu/main_ov.cc
In file included from /home/anton/CK_REPOS/ck-openvino/program/mlperf-inference-v0.5/ov_mlperf_cpu/sut_ov.h:12,
                 from /home/anton/CK_REPOS/ck-openvino/program/mlperf-inference-v0.5/ov_mlperf_cpu/main_ov.cc:13:
/home/anton/CK_REPOS/ck-openvino/program/mlperf-inference-v0.5/ov_mlperf_cpu/backend_ov.h:5:10: fatal error: ext_list.hpp: No such file or directory
 #include <ext_list.hpp>                                                       
          ^~~~~~~~~~~~~~
compilation terminated.                         
CMakeFiles/ov_mlperf.dir/build.make:62: recipe for target 'CMakeFiles/ov_mlperf.dir/main_ov.cc.o' failed
make[2]: *** [CMakeFiles/ov_mlperf.dir/main_ov.cc.o] Error 1
make[2]: Leaving directory '/home/anton/CK_REPOS/ck-openvino/program/mlperf-inference-v0.5/tmp'
CMakeFiles/Makefile2:72: recipe for target 'CMakeFiles/ov_mlperf.dir/all' failed
make[1]: *** [CMakeFiles/ov_mlperf.dir/all] Error 2             
make[1]: Leaving directory '/home/anton/CK_REPOS/ck-openvino/program/mlperf-inference-v0.5/tmp'
Makefile:83: recipe for target 'all' failed                     
make: *** [all] Error 2
psyhtest commented 4 years ago

I'm guessing new include paths need to be set for 2020.1:

diff --git a/program/mlperf-inference-v0.5/ov_mlperf_cpu/CMakeLists.txt b/program/mlperf-inference-v0.5/ov_mlperf_cpu/CMakeLists.txt
index ab50451..005fd67 100644
--- a/program/mlperf-inference-v0.5/ov_mlperf_cpu/CMakeLists.txt
+++ b/program/mlperf-inference-v0.5/ov_mlperf_cpu/CMakeLists.txt
@@ -42,6 +42,9 @@ set(OPENVINO_LIBRARY               "${OPENVINO_LIB_DIR}/libinference_engine.so")
 set(OPENVINO_CPU_EXTENSION_LIBRARY "${OPENVINO_LIB_DIR}/libcpu_extension.so")
 set(OPENVINO_INCLUDE_DIR           "${OPENVINO_DIR}/include")
 set(OPENVINO_EXTENSION_DIR         "${OPENVINO_DIR}/openvino/inference-engine/src/extension")
+set(OPENVINO_MKLDNN_NODES_DIR      "${OPENVINO_DIR}/openvino/inference-engine/src/mkldnn_plugin/nodes")
+set(OPENVINO_THIRDPARTY_MKLDNN_SRC_CPU "${OPENVINO_DIR}/openvino/inference-engine/thirdparty/mkl-dnn/src/cpu")
+set(OPENVINO_THIRDPARTY_MKLDNN_SRC_COMMON "${OPENVINO_DIR}/openvino/inference-engine/thirdparty/mkl-dnn/src/common")

 MESSAGE(STATUS "OPENVINO_DIR=${OPENVINO_DIR}")
 MESSAGE(STATUS "OPENVINO_LIB_DIR=${OPENVINO_LIB_DIR}")
@@ -83,6 +86,9 @@ include_directories(
     ${LOADGEN_DIR}
     ${OPENVINO_INCLUDE_DIR}
     ${OPENVINO_EXTENSION_DIR}
+    ${OPENVINO_MKLDNN_NODES_DIR}
+    ${OPENVINO_THIRDPARTY_MKLDNN_SRC_CPU}
+    ${OPENVINO_THIRDPARTY_MKLDNN_SRC_COMMON}
 )

 set(SOURCE_FILES backend_ov.h dataset_ov.h sut_ov.h infer_request_wrap.h item_ov.h main_ov.cc)

as well as modifying the program:

diff --git a/program/mlperf-inference-v0.5/ov_mlperf_cpu/backend_ov.h b/program/mlperf-inference-v0.5/ov_mlperf_cpu/backend_ov.h
index 8ea1de2..b0d2207 100644
--- a/program/mlperf-inference-v0.5/ov_mlperf_cpu/backend_ov.h
+++ b/program/mlperf-inference-v0.5/ov_mlperf_cpu/backend_ov.h
@@ -2,7 +2,7 @@
 #define BACKENDOV_H__

 #include <inference_engine.hpp>
-#include <ext_list.hpp>
+#include <list.hpp>

 #include "infer_request_wrap.h"

@@ -79,7 +79,7 @@ public:
         Core ie;
         const std::string device { "CPU" };
         if (device == "CPU") {
-            ie.AddExtension(std::make_shared<Extensions::Cpu::CpuExtensions>(),
+            ie.AddExtension(std::make_shared<Extensions::Cpu::MKLDNNExtensions>(),
                     "CPU");
             if (settings_.scenario == mlperf::TestScenario::SingleStream) {
                 ie.SetConfig(

but it still doesn't build:

In file included from /home/anton/CK_REPOS/ck-openvino/program/mlperf-inference-v0.5/ov_mlperf_cpu/sut_ov.h:12,
                 from /home/anton/CK_REPOS/ck-openvino/program/mlperf-inference-v0.5/ov_mlperf_cpu/main_ov.cc:13:
/home/anton/CK_REPOS/ck-openvino/program/mlperf-inference-v0.5/ov_mlperf_cpu/backend_ov.h: In member function ‘void BackendOV::load(std::__cxx11::string)’:
/home/anton/CK_REPOS/ck-openvino/program/mlperf-inference-v0.5/ov_mlperf_cpu/backend_ov.h:82:81: error: no matching function for call to ‘make_shared<template<mkldnn::impl::cpu::cpu_isa_t Type> class InferenceEngine::Extensions::Cpu::MKLDNNExtensions>()’
            ie.AddExtension(std::make_shared<Extensions::Cpu::MKLDNNExtensions>(),
                                                                                ^
psyhtest commented 4 years ago

Trouble is also expected with 2020.2:

In file included from /home/anton/CK_REPOS/ck-openvino/program/mlperf-inference-v0.5/ov_mlperf_cpu/backend_ov.h:7,
                 from /home/anton/CK_REPOS/ck-openvino/program/mlperf-inference-v0.5/ov_mlperf_cpu/sut_ov.h:12,
                 from /home/anton/CK_REPOS/ck-openvino/program/mlperf-inference-v0.5/ov_mlperf_cpu/main_ov.cc:13:
/home/anton/CK_REPOS/ck-openvino/program/mlperf-inference-v0.5/ov_mlperf_cpu/infer_request_wrap.h: In member function ‘void InferReqWrap::postProcessImagenet(std::vector<Item>&, std::vector<unsigned int>&, std::vector<long unsigned int>&)’:
/home/anton/CK_REPOS/ck-openvino/program/mlperf-inference-v0.5/ov_mlperf_cpu/infer_request_wrap.h:129:42: warning: ‘void InferenceEngine::TopResults(unsigned int, InferenceEngine::Blob&, std::vector<unsigned int>&)’ is deprecated: InferenceEngine utility functions are not a part of public API. Will be removed in 2020 R2 [-Wdeprecated-declarations]
             TopResults(1, *(b.blob_), res);
                                          ^
psyhtest commented 4 years ago

Still waiting for Intel's response on these issues.

cc: @christ1ne

fenz commented 3 years ago

@psyhtest I tried a "parallel" reproduction exercise and found similar results. I saw you have a working example in CK: https://github.com/ctuning/ck-openvino#accuracy-on-the-coco-2017-validation-set, but I couldn't tell whether you had figured out the "reverse_input_channels" question. I converted the models using the official OpenVINO Docker container from Intel:

FROM openvino/ubuntu18_dev:2019_R3.1 as builder

WORKDIR /tmp

RUN curl -O https://zenodo.org/record/3401714/files/ssd_mobilenet_v1_quant_ft_no_zero_point_frozen_inference_graph.pb && \
    curl -O https://zenodo.org/record/3252084/files/mobilenet_v1_ssd_8bit_finetuned.tar.gz && \
    tar xf mobilenet_v1_ssd_8bit_finetuned.tar.gz && \
    rm mobilenet_v1_ssd_8bit_finetuned.tar.gz && \
    cp mobilenet_v1_ssd_finetuned/pipeline.config . && \
    rm -rf mobilenet_v1_ssd_finetuned && \
    python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo.py \
        --input_model /tmp/ssd_mobilenet_v1_quant_ft_no_zero_point_frozen_inference_graph.pb \
        --input_shape [1,300,300,3] \
        --reverse_input_channels \
        --tensorflow_use_custom_operations_config /opt/intel/openvino/deployment_tools/model_optimizer/extensions/front/tf/ssd_v2_support.json \
        --tensorflow_object_detection_api_pipeline_config /tmp/pipeline.config
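
A hypothetical way to build this stage and inspect the generated IR (mo.py writes the .xml/.bin files into its working directory, /tmp here):

$ docker build --target builder -t ssd-mobilenet-ir .
$ docker run --rm --entrypoint ls ssd-mobilenet-ir /tmp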

I tried to generate models in both FP32 (the default) and FP16 precision, with and without the "reverse" option, but I get a "good enough" mAP only when not using the option (which differs from what is suggested in Intel's submission).

Regarding OpenVINO 2020.x, it seems cpu_extension is now part of the library itself and no longer needs to be linked separately (https://github.com/openvinotoolkit/openvino/issues/916). Based on this, on some OpenVINO tutorials, and on what seems to be used in the newer Intel submissions as well, I changed some of the lines. From this folder: https://github.com/mlcommons/inference_results_v0.5/tree/master/closed/Intel/code/ssd-small/openvino-linux I run:

sed -i '/IE::ie_cpu_extension/d' ./CMakeLists.txt && \
sed -i \
    -e 's/ext_list.hpp/ie_core.hpp/g' \
    -e 's/network_reader_.getNetwork().setBatchSize(batch_size)/network_.setBatchSize(batch_size)/g' \
    -e 's/network_reader_.ReadNetwork(input_model)/Core ie/g' \
    -e '/network_reader_.ReadWeights(fileNameNoExt(input_model) + ".bin");/d' \
    -e 's/network_ = network_reader_.getNetwork();/network_ = ie.ReadNetwork(input_model, fileNameNoExt(input_model) + ".bin");/g' \
    -e '/Core ie;/{$!N;/\n.*const std::string device { "CPU" };/!P;D}' \
    -e '/ie.AddExtension(std::make_shared<Extensions::Cpu::CpuExtensions>(),/,/"CPU");/d' ./backend_ov.h
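
After these edits, a plain CMake rebuild of the harness should pick up the changes (a hypothetical sequence; the CK workflow normally drives the build):

$ mkdir -p build && cd build
$ cmake .. && make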

I tried to keep the code a "minimal" equivalent, without borrowing much from the newer submissions.

Regarding the ERROR: loadgen(1871) and payload(2583339204608) disagree on image_idx issue: the "windows" version creates the "Item" for the "ssd-mobilenet" model differently from the "linux" version. Based on item_ov.h (which looks the same for both the linux and windows versions), the linux version seems to be the correct one; using the same argument order in the windows version makes the error disappear.

I know this was the first submission round, but it is interesting to compare the old and newer versions; that's why it is important to me to clarify these doubts.

psyhtest commented 3 years ago

@fenz Only a year has passed since I looked into this, wow. Feels more like forever :).

I would have looked at the v0.7 or v1.0 code, but alas, Intel only submitted SSD-MobileNet to v0.5.

fenz commented 3 years ago

So, as far as I understand, the current status is not to use the "--reverse_input_channels" option when converting to the OpenVINO model representation. By the way, I started looking at this a long time ago as well; I'm back on it because (I thought) I could understand it a bit better now. Thanks for your answer.