AVX512 is not faster than AVX2 for quantized model

Grigor355 commented 2 years ago

I have quantized a CNN model with OpenVINO. When timing inference on an Intel® Xeon® Gold 6138 (AVX512) and an Intel i7-8700K (AVX2), I don't see any significant difference; if anything, the Xeon is slightly slower. I timed both manually and with benchmark_app. I also checked with benchmark_app -pc that AVX512 is being used on the Xeon. Given that the Xeon has 2 FMA units, I expected to see some improvement in inference speed.

Iffa-Intel commented 2 years ago

Generally, Intel AVX-512 enables twice the number of floating-point operations per clock cycle compared to its predecessor, Intel AVX2, so AVX512 should be faster than AVX2. However, the use of INT8 precision might affect this.

We'll further investigate and get back to you asap.
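To put rough numbers on the "twice per clock cycle" statement above, here is a minimal sketch of the theoretical FP32 peak per core. It assumes two FMA units per core on both CPUs, and the clock values in the last lines are placeholders rather than measured frequencies. It says nothing about INT8 kernels, which use different instructions.

```python
def peak_flops_per_cycle(vector_bits: int, element_bits: int = 32, fma_units: int = 2) -> int:
    """Theoretical peak floating-point operations per cycle for one core."""
    lanes = vector_bits // element_bits  # elements processed by one vector instruction
    return lanes * 2 * fma_units         # one FMA counts as 2 operations (multiply + add)


def peak_gflops(flops_per_cycle: int, clock_ghz: float, cores: int = 1) -> float:
    """Theoretical peak GFLOPS at a given sustained clock."""
    return flops_per_cycle * clock_ghz * cores


avx2_fp32 = peak_flops_per_cycle(256)    # 256-bit vectors
avx512_fp32 = peak_flops_per_cycle(512)  # 512-bit vectors
print(avx2_fp32, avx512_fp32)

# Placeholder clocks (assumptions, not measurements): a 2x clock gap
# is enough to cancel the 2x width advantage in this simple model.
print(peak_gflops(avx2_fp32, clock_ghz=4.0), peak_gflops(avx512_fp32, clock_ghz=2.0))
```

In this simple model the per-cycle advantage only turns into a wall-clock advantage if the sustained clock and core count do not offset it.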

jgespino commented 2 years ago

@Grigor355 Would you please share the model in its native format (Caffe, TF, ONNX, etc.) and the Model Optimizer command used? Also, please provide the quantized IR model so we can reproduce this on our end.

Could you also please share additional information about your system environment?

System OS
OpenVINO Version
Using CPP or Python?

Grigor355 commented 2 years ago

I had an ONNX model, which I first converted to OpenVINO IR format with Model Optimizer:

mo --input_model "molde.onnx" --input_shape "[1, x, x, x]" --output_dir "ov" --data_type FP16

Then I followed the official tutorials to do accuracy-aware quantization:

algorithms = [
    {
        "name": "DefaultQuantization",
        "params": {
            "target_device": "CPU",
            "stat_subset_size": 400,
            "preset": "performance",
        },
    }
]

# Step 2: Initialize the data loader.
data_loader = Dtldr(dataset_config)

# Step 3 (Optional. Required for AccuracyAwareQuantization): Initialize the metric.
metric = Accuracy()

# Step 4: Initialize the engine for metric calculation and statistics collection.
engine = IEEngine(engine_config, data_loader, metric)

# Step 5: Create a pipeline of compression algorithms.
pipeline = create_pipeline(algorithms, engine)

# Step 6: Execute the pipeline.
compressed_model = pipeline.run(model)

OS: Ubuntu 18.04
OpenVINO version: 2022.1.0
Language: Python

I omitted the rest of the implementations of data_loader, metric, etc, but after quantization the model became faster and smaller, so the quantization part worked. Can you give some guidance where to look for possible bugs without sharing the model?

Grigor355 commented 2 years ago

@jgespino I just sent the model file via email. Thanks.

jgespino commented 2 years ago

@Grigor355 Thank you, can you also share your quantized model in IR format?

jgespino commented 2 years ago

@Grigor355 Could you confirm you sent me the right quantized model? The model doesn't have a quantization_parameters section in the xml file. For example, the face-detection-adas-0001.xml int8 model:

Is it possible for you to share the full code to quantize and all required files (dataset)?

Grigor355 commented 2 years ago

I followed the notebook at https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/113-image-classification-quantization/113-image-classification-quantization.ipynb. The model was saved with compression.graph.save_model, as is done in the notebook. The quantization parameters you mentioned are not present in the xml file, but the xml does have FakeQuantize layers. Isn't that enough to confirm that the model is quantized? Also, I showed you the algorithms config in my earlier comment.
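A quick way to check an IR file for FakeQuantize layers is to count layer types in the xml. This is a minimal sketch: the sample below is a hand-written toy fragment, not the model from this thread, and it only assumes that each layer appears as a `layer` element with a `type` attribute.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_layer_types(ir_xml_text: str) -> Counter:
    """Count how many layers of each type an IR xml document contains."""
    root = ET.fromstring(ir_xml_text)
    return Counter(layer.get("type") for layer in root.iter("layer"))


# Toy IR fragment for illustration only (hypothetical, not the real model).
SAMPLE_IR = """<net name="toy" version="11">
  <layers>
    <layer id="0" name="x" type="Parameter" version="opset1"/>
    <layer id="1" name="Convolution_58/fq_input_0" type="FakeQuantize" version="opset1"/>
    <layer id="2" name="Convolution_58" type="Convolution" version="opset1"/>
  </layers>
</net>"""

counts = count_layer_types(SAMPLE_IR)
print("FakeQuantize layers:", counts["FakeQuantize"])
```

For a real file, read its contents and pass the text to `count_layer_types`; a count of zero would suggest the IR was not quantized.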

jgespino commented 2 years ago

@Grigor355 Sorry, you are right. I didn't see the FakeQuantize layers earlier. I tested on an Intel Core i5-1135G7 with AVX512 and an Intel Core i7-8665U with AVX2, and observed ~3x FPS with the benchmark_app on the AVX512 system. Let me try to find a Xeon Gold processor to test.

Grigor355 commented 2 years ago

With '~3x FPS' you mean 3x faster model? Also, can you benchmark with -api sync? I think the default is async.
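For manual timing that is comparable to a synchronous benchmark run, something like the sketch below can be used. It is a generic harness, not benchmark_app's implementation: `infer` is a hypothetical zero-argument callable wrapping one synchronous inference call. FPS is derived from the median latency, which is consistent with the sync-mode figures reported in this thread.

```python
import statistics
import time
from typing import Callable, Dict


def time_sync(infer: Callable[[], object], iterations: int = 20, warmup: int = 2) -> Dict[str, float]:
    """Run `infer` one call at a time and report latency statistics in ms."""
    for _ in range(warmup):  # warm-up runs are excluded from the statistics
        infer()
    latencies_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        infer()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    median = statistics.median(latencies_ms)
    return {
        "median_ms": median,
        "avg_ms": statistics.mean(latencies_ms),
        "min_ms": min(latencies_ms),
        "max_ms": max(latencies_ms),
        # Batch size 1 assumed: one inference per call.
        "fps": 1000.0 / median if median > 0 else float("inf"),
    }


# Stand-in workload (assumption): replace with a real synchronous inference call.
stats = time_sync(lambda: time.sleep(0.001), iterations=5)
print(stats)
```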

jgespino commented 2 years ago

@Grigor355

With '~3x FPS' you mean 3x faster model?

Yes, 3x the performance when comparing the Intel Core i7-8665U with the Intel Core i5-1135G7.

I was not able to find a system with an Intel® Xeon® Gold 6138. However, see the benchmark_app results using your model on my systems with an Intel(R) Core(TM) i7-8665U and an Intel(R) Xeon(R) Platinum 8368 processor.

Could you share the results you are seeing from your side with the benchmark_app -m <model> -api sync? In addition, could you share the full output when running benchmark_app with -api sync and -pc?

benchmark_app -m ov.xml -api sync

Count:          20 iterations
Duration:       60568.82 ms
Latency:
    Median:     3164.29 ms
    AVG:        3011.18 ms
    MIN:        1886.81 ms
    MAX:        3743.68 ms
Throughput: 0.32 FPS

Count:          572 iterations
Duration:       60074.97 ms
Latency:
    Median:     91.73 ms
    AVG:        91.92 ms
    MIN:        72.00 ms
    MAX:        129.76 ms
Throughput: 10.90 FPS

benchmark_app -m model_d.xml -api sync

Count:          11 iterations
Duration:       61627.41 ms
Latency:
    Median:     5868.72 ms
    AVG:        5585.64 ms
    MIN:        3533.22 ms
    MAX:        6302.83 ms
Throughput: 0.17 FPS

Count:          249 iterations
Duration:       60155.24 ms
Latency:
    Median:     227.86 ms
    AVG:        228.41 ms
    MIN:        222.38 ms
    MAX:        277.19 ms
Throughput: 4.39 FPS
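The ratios implied by the four result blocks above can be worked out directly from the reported median latencies. The blocks are not labeled in the post, so this sketch assumes that ov.xml is the quantized model, model_d.xml is the original one, and that within each pair the first block and the second block come from two different systems.

```python
# Median latencies (ms) copied from the results above.
# Assumption: block order within each pair corresponds to two different systems.
median_ms = {
    "ov.xml": {"first_block": 3164.29, "second_block": 91.73},
    "model_d.xml": {"first_block": 5868.72, "second_block": 227.86},
}


def speedup(slow_ms: float, fast_ms: float) -> float:
    """How many times faster the second latency is than the first."""
    return slow_ms / fast_ms


for block in ("first_block", "second_block"):
    ratio = speedup(median_ms["model_d.xml"][block], median_ms["ov.xml"][block])
    print(f"{block}: ov.xml is {ratio:.2f}x faster than model_d.xml")

# In sync mode with batch size 1, throughput follows from the median latency.
fps = 1000.0 / median_ms["ov.xml"]["second_block"]
print(f"ov.xml, second block: {fps:.2f} FPS")
```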

Grigor355 commented 2 years ago

My results were around 800-900 ms for both models, but let me recheck and come back. Meanwhile, my input size is [1, 3, 1120, 1120] (I believe I missed this detail); what was your input size? As the model is fully convolutional, it can work with different input sizes. You are seeing ~2.5x faster inference when switching to the quantized model on the Intel(R) Xeon(R) Platinum 8368. That CPU has 32 cores and VNNI, whereas mine has 20 cores and no VNNI. Could that be the reason for the performance difference?
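On Linux, whether a CPU reports AVX2, AVX-512 or VNNI can be checked from the flags line of /proc/cpuinfo. The sketch below is a minimal example of that check; the sample text is made up for illustration and does not describe either CPU in this thread.

```python
import os


def cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def simd_summary(flags: set) -> dict:
    """Report the SIMD features relevant to INT8 inference."""
    return {name: name in flags for name in ("avx2", "avx512f", "avx512_vnni")}


# Made-up sample (assumption) showing a CPU with AVX-512 but without VNNI.
SAMPLE = "processor\t: 0\nflags\t\t: fpu sse2 avx avx2 fma avx512f avx512bw\n"
print(simd_summary(cpu_flags(SAMPLE)))

# On a Linux machine, the same check can be run against the real file.
if os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        print(simd_summary(cpu_flags(f.read())))
```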

Grigor355 commented 2 years ago

Intel i7-8700K (AVX2): benchmark_app -m ov.xml -api sync -pc

[Step 1/11] Parsing and validating input arguments
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README. 
[Step 2/11] Loading OpenVINO
[ WARNING ] PerformanceMode was not explicitly specified in command line. Device CPU performance hint will be set to LATENCY.
[ INFO ] OpenVINO:
         API version............. custom_master_c519aff42f144f12b65340e02f5d303411779634
[ INFO ] Device info
         CPU
         openvino_intel_cpu_plugin version 2022.2
         Build................... custom_master_c519aff42f144f12b65340e02f5d303411779634

[Step 3/11] Setting device configuration
[Step 4/11] Reading network files
[ INFO ] Read model took 11.46 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model input 'x' precision u8, dimensions ([N,C,H,W]): 1 3 1120 1120
[ INFO ] Model output 'y' precision f32, dimensions ([...]): 1 560 560 2
[ INFO ] Model output 'feature' precision f32, dimensions ([...]): 1 32 560 560
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 68.44 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] DEVICE: CPU
[ INFO ]   AVAILABLE_DEVICES  , ['']
[ INFO ]   RANGE_FOR_ASYNC_INFER_REQUESTS  , (1, 1, 1)
[ INFO ]   RANGE_FOR_STREAMS  , (1, 12)
[ INFO ]   FULL_DEVICE_NAME  , Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
[ INFO ]   OPTIMIZATION_CAPABILITIES  , ['FP32', 'FP16', 'INT8', 'BIN', 'EXPORT_IMPORT']
[ INFO ]   CACHE_DIR  , 
[ INFO ]   NUM_STREAMS  , 1
[ INFO ]   AFFINITY  , Affinity.CORE
[ INFO ]   INFERENCE_NUM_THREADS  , 0
[ INFO ]   PERF_COUNT  , True
[ INFO ]   INFERENCE_PRECISION_HINT  , <Type: 'float32'>
[ INFO ]   PERFORMANCE_HINT  , PerformanceMode.LATENCY
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS  , 0
[Step 9/11] Creating infer requests and preparing input data
[ INFO ] Create 1 infer requests took 0.10 ms
[ WARNING ] No input files were given for input 'x'!. This input will be filled with random values!
[ INFO ] Fill input 'x' with random values 
[Step 10/11] Measuring performance (Start inference synchronously, inference only: True, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 785.09 ms
[Step 11/11] Dumping statistics report
[ INFO ] Performance counts for 0-th infer request
x                             Status.NOT_RUN layerType: Parameter          realTime: 0:00:00   cpu: 0:00:00        execType: unknown_I8
Convolution_58/fq_input_0     Status.EXECUTEDlayerType: FakeQuantize       realTime: 0:00:00.000366cpu: 0:00:00.000366 execType: jit_avx2_I8
Convolution_58/fq_input_0_... Status.EXECUTEDlayerType: Reorder            realTime: 0:00:00.000286cpu: 0:00:00.000286 execType: jit_uni_I8
Convolution_58                Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.007511cpu: 0:00:00.007511 execType: jit_avx2_I8
Convolution_107/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_107               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.074245cpu: 0:00:00.074245 execType: jit_avx2_I8
input.16/fq_input_0           Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
input.16                      Status.EXECUTEDlayerType: MaxPool            realTime: 0:00:00.003695cpu: 0:00:00.003695 execType: jit_avx2_I8
Convolution_157               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.037036cpu: 0:00:00.037036 execType: jit_avx2_I8
Convolution_206/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_206               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.080867cpu: 0:00:00.080867 execType: jit_avx2_I8
input.176/fq_input_1          Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
input.36                      Status.EXECUTEDlayerType: MaxPool            realTime: 0:00:00.001752cpu: 0:00:00.001752 execType: jit_avx2_I8
Convolution_256               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.040588cpu: 0:00:00.040588 execType: jit_avx2_I8
Convolution_305/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_305               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.082170cpu: 0:00:00.082170 execType: jit_avx2_I8
Convolution_354/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_354               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.081527cpu: 0:00:00.081527 execType: jit_avx2_I8
input.64/fq_input_0           Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
input.64                      Status.EXECUTEDlayerType: MaxPool            realTime: 0:00:00.000880cpu: 0:00:00.000880 execType: jit_avx2_I8
Convolution_404               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.040828cpu: 0:00:00.040828 execType: jit_avx2_I8
Convolution_453/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_453               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.084611cpu: 0:00:00.084611 execType: jit_avx2_I8
Convolution_502/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_502               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.082719cpu: 0:00:00.082719 execType: jit_avx2_I8
input.92/fq_input_0           Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
input.92                      Status.EXECUTEDlayerType: MaxPool            realTime: 0:00:00.000443cpu: 0:00:00.000443 execType: jit_avx2_I8
Convolution_552               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.020769cpu: 0:00:00.020769 execType: jit_avx2_I8
Convolution_601/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_601               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.021685cpu: 0:00:00.021685 execType: jit_avx2_I8
input.108/fq_input_0          Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
input.108                     Status.EXECUTEDlayerType: MaxPool            realTime: 0:00:00.000193cpu: 0:00:00.000193 execType: jit_avx2_I8
Convolution_650               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.044661cpu: 0:00:00.044661 execType: jit_avx2_I8
Convolution_698/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_698               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.008292cpu: 0:00:00.008292 execType: jit_avx2_1x1_I8
input.116/fq_input_0          Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
input.116                     Status.EXECUTEDlayerType: Concat             realTime: 0:00:00.000317cpu: 0:00:00.000317 execType: ref_I8    
Convolution_747               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.006678cpu: 0:00:00.006678 execType: jit_avx2_1x1_I8
Convolution_796/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_796               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.010967cpu: 0:00:00.010967 execType: jit_avx2_I8
onnx::Concat_220/fq_input_0   Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
onnx::Concat_220              Status.EXECUTEDlayerType: Interpolate        realTime: 0:00:00.000474cpu: 0:00:00.000474 execType: jit_avx2_FP32
input.136/fq_input_0          Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
input.136                     Status.EXECUTEDlayerType: Concat             realTime: 0:00:00.001008cpu: 0:00:00.001008 execType: ref_I8    
Convolution_911               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.006480cpu: 0:00:00.006480 execType: jit_avx2_1x1_I8
Convolution_960/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_960               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.010400cpu: 0:00:00.010400 execType: jit_avx2_I8
onnx::Concat_244/fq_input_0   Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
onnx::Concat_244              Status.EXECUTEDlayerType: Interpolate        realTime: 0:00:00.000993cpu: 0:00:00.000993 execType: jit_avx2_FP32
input.156/fq_input_0          Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
input.156                     Status.EXECUTEDlayerType: Concat             realTime: 0:00:00.002403cpu: 0:00:00.002403 execType: ref_I8    
Convolution_1075              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.006506cpu: 0:00:00.006506 execType: jit_avx2_1x1_I8
Convolution_1124/fq_input_0   Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1124              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.010259cpu: 0:00:00.010259 execType: jit_avx2_I8
onnx::Concat_268/fq_input_0   Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
onnx::Concat_268              Status.EXECUTEDlayerType: Interpolate        realTime: 0:00:00.002147cpu: 0:00:00.002147 execType: jit_avx2_FP32
input.176/fq_input_0          Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
input.176                     Status.EXECUTEDlayerType: Concat             realTime: 0:00:00.005119cpu: 0:00:00.005119 execType: ref_I8    
Convolution_1239              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.006821cpu: 0:00:00.006821 execType: jit_avx2_1x1_I8
Convolution_1288/fq_input_0   Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1288              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.008951cpu: 0:00:00.008951 execType: jit_avx2_I8
feature_original              Status.NOT_RUN layerType: Relu               realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1337/fq_input_0   Status.EXECUTEDlayerType: FakeQuantize       realTime: 0:00:00.003299cpu: 0:00:00.003299 execType: jit_avx2_FP32
Convolution_1337              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.004677cpu: 0:00:00.004677 execType: jit_avx2_I8
Convolution_1386/fq_input_0   Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1386              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.004777cpu: 0:00:00.004777 execType: jit_avx2_I8
Convolution_1435/fq_input_0   Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1435              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.002428cpu: 0:00:00.002428 execType: jit_avx2_I8
Convolution_1484/fq_input_0   Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1484              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.000354cpu: 0:00:00.000354 execType: jit_avx2_1x1_I8
Convolution_1533/fq_input_0   Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1533              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.000158cpu: 0:00:00.000158 execType: jit_avx2_1x1_I8
Multiply_4782                 Status.NOT_RUN layerType: Multiply           realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1533_acdb_abcd... Status.NOT_RUN layerType: Reorder            realTime: 0:00:00   cpu: 0:00:00        execType: reorder_FP32
y/sink_port_0                 Status.NOT_RUN layerType: Result             realTime: 0:00:00   cpu: 0:00:00        execType: unknown_FP32
feature                       Status.EXECUTEDlayerType: Multiply           realTime: 0:00:00.003390cpu: 0:00:00.003390 execType: jit_avx2_FP32
feature_acdb_abcd_feature/... Status.EXECUTEDlayerType: Reorder            realTime: 0:00:00.003586cpu: 0:00:00.003586 execType: reorder_FP32
feature/sink_port_0           Status.NOT_RUN layerType: Result             realTime: 0:00:00   cpu: 0:00:00        execType: unknown_FP32
Total time:     0:00:00.817316 microseconds
Total CPU time: 0:00:00.817316 microseconds

Count:          73 iterations
Duration:       60113.92 ms
Latency:
    Median:     819.05 ms
    AVG:        818.13 ms
    MIN:        768.63 ms
    MAX:        830.70 ms
Throughput: 1.22 FPS
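To see which kernels dominate a -pc dump like the one above, the per-layer lines can be summed by execType. This is a rough sketch that assumes the line layout shown in this log (including the missing space before "cpu:"); other benchmark_app versions may print a different format. The sample holds four lines copied from the log.

```python
import re
from collections import defaultdict

# Matches "layerType: X  realTime: H:MM:SS[.ffffff]cpu: ...  execType: Y".
LINE_RE = re.compile(
    r"layerType:\s*(?P<layer>\S+)\s+realTime:\s*(?P<h>\d+):(?P<m>\d+):(?P<s>\d+(?:\.\d+)?)"
    r"\s*cpu:.*?execType:\s*(?P<exec>\S+)"
)


def time_by_exec_type(pc_output: str) -> dict:
    """Sum realTime (in seconds) per execType over all per-layer lines."""
    totals = defaultdict(float)
    for line in pc_output.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a per-layer line
        seconds = int(match["h"]) * 3600 + int(match["m"]) * 60 + float(match["s"])
        totals[match["exec"]] += seconds
    return dict(totals)


SAMPLE = """\
Convolution_58/fq_input_0     Status.EXECUTEDlayerType: FakeQuantize       realTime: 0:00:00.000366cpu: 0:00:00.000366 execType: jit_avx2_I8
Convolution_58                Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.007511cpu: 0:00:00.007511 execType: jit_avx2_I8
Convolution_107/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef
onnx::Concat_220              Status.EXECUTEDlayerType: Interpolate        realTime: 0:00:00.000474cpu: 0:00:00.000474 execType: jit_avx2_FP32
"""

for exec_type, seconds in sorted(time_by_exec_type(SAMPLE).items(), key=lambda kv: -kv[1]):
    print(f"{exec_type:<16}{seconds * 1000:.3f} ms")
```

Running the same function on the full dump from each machine would show whether the time is spent in avx2 or avx512 INT8 kernels.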

Grigor355 commented 2 years ago

Intel i7-8700K (AVX2)

[Step 1/11] Parsing and validating input arguments
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README. 
[Step 2/11] Loading OpenVINO
[ WARNING ] PerformanceMode was not explicitly specified in command line. Device CPU performance hint will be set to LATENCY.
[ INFO ] OpenVINO:
         API version............. custom_master_c519aff42f144f12b65340e02f5d303411779634
[ INFO ] Device info
         CPU
         openvino_intel_cpu_plugin version 2022.2
         Build................... custom_master_c519aff42f144f12b65340e02f5d303411779634

[Step 3/11] Setting device configuration
[Step 4/11] Reading network files
[ INFO ] Read model took 20.83 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model input 'x' precision u8, dimensions ([N,C,H,W]): 1 3 1120 1120
[ INFO ] Model output 'y' precision f32, dimensions ([...]): 1 560 560 2
[ INFO ] Model output 'feature' precision f32, dimensions ([...]): 1 32 560 560
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 149.78 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] DEVICE: CPU
[ INFO ]   AVAILABLE_DEVICES  , ['']
[ INFO ]   RANGE_FOR_ASYNC_INFER_REQUESTS  , (1, 1, 1)
[ INFO ]   RANGE_FOR_STREAMS  , (1, 12)
[ INFO ]   FULL_DEVICE_NAME  , Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
[ INFO ]   OPTIMIZATION_CAPABILITIES  , ['FP32', 'FP16', 'INT8', 'BIN', 'EXPORT_IMPORT']
[ INFO ]   CACHE_DIR  , 
[ INFO ]   NUM_STREAMS  , 1
[ INFO ]   AFFINITY  , Affinity.CORE
[ INFO ]   INFERENCE_NUM_THREADS  , 0
[ INFO ]   PERF_COUNT  , True
[ INFO ]   INFERENCE_PRECISION_HINT  , <Type: 'float32'>
[ INFO ]   PERFORMANCE_HINT  , PerformanceMode.LATENCY
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS  , 0
[Step 9/11] Creating infer requests and preparing input data
[ INFO ] Create 1 infer requests took 0.09 ms
[ WARNING ] No input files were given for input 'x'!. This input will be filled with random values!
[ INFO ] Fill input 'x' with random values 
[Step 10/11] Measuring performance (Start inference synchronously, inference only: True, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 1309.69 ms
[Step 11/11] Dumping statistics report
[ INFO ] Performance counts for 0-th infer request
x                             Status.NOT_RUN layerType: Parameter          realTime: 0:00:00   cpu: 0:00:00        execType: unknown_I8
Convert_392                   Status.EXECUTEDlayerType: Convert            realTime: 0:00:00.000970cpu: 0:00:00.000970 execType: unknown_I8
Convolution_58                Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.019957cpu: 0:00:00.019957 execType: jit_avx2_FP32
onnx::Conv_157                Status.NOT_RUN layerType: Relu               realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_107               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.129291cpu: 0:00:00.129291 execType: jit_avx2_FP32
onnx::MaxPool_160             Status.NOT_RUN layerType: Relu               realTime: 0:00:00   cpu: 0:00:00        execType: undef     
input.16                      Status.EXECUTEDlayerType: MaxPool            realTime: 0:00:00.014426cpu: 0:00:00.014426 execType: jit_avx2_FP32
Convolution_157               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.063981cpu: 0:00:00.063981 execType: jit_avx2_FP32
onnx::Conv_164                Status.NOT_RUN layerType: Relu               realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_206               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.130543cpu: 0:00:00.130543 execType: jit_avx2_FP32
onnx::MaxPool_167             Status.NOT_RUN layerType: Relu               realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_206___input.176   Status.EXECUTEDlayerType: Reorder            realTime: 0:00:00.014405cpu: 0:00:00.014405 execType: ref_any_FP32
input.36                      Status.EXECUTEDlayerType: MaxPool            realTime: 0:00:00.007328cpu: 0:00:00.007328 execType: jit_avx2_FP32
Convolution_256               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.062876cpu: 0:00:00.062876 execType: jit_avx2_FP32
onnx::Conv_171                Status.NOT_RUN layerType: Relu               realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_305               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.129902cpu: 0:00:00.129902 execType: jit_avx2_FP32
onnx::Conv_174                Status.NOT_RUN layerType: Relu               realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_305___input.156   Status.EXECUTEDlayerType: Reorder            realTime: 0:00:00.007037cpu: 0:00:00.007037 execType: ref_any_FP32
Convolution_354               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.129051cpu: 0:00:00.129051 execType: jit_avx2_FP32
onnx::MaxPool_177             Status.NOT_RUN layerType: Relu               realTime: 0:00:00   cpu: 0:00:00        execType: undef     
input.64                      Status.EXECUTEDlayerType: MaxPool            realTime: 0:00:00.003665cpu: 0:00:00.003665 execType: jit_avx2_FP32
Convolution_404               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.064279cpu: 0:00:00.064279 execType: jit_avx2_FP32
onnx::Conv_181                Status.NOT_RUN layerType: Relu               realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_453               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.129476cpu: 0:00:00.129476 execType: jit_avx2_FP32
onnx::Conv_184                Status.NOT_RUN layerType: Relu               realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_453___input.136   Status.EXECUTEDlayerType: Reorder            realTime: 0:00:00.003435cpu: 0:00:00.003435 execType: ref_any_FP32
Convolution_502               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.129045cpu: 0:00:00.129045 execType: jit_avx2_FP32
onnx::MaxPool_187             Status.NOT_RUN layerType: Relu               realTime: 0:00:00   cpu: 0:00:00        execType: undef     
input.92                      Status.EXECUTEDlayerType: MaxPool            realTime: 0:00:00.001853cpu: 0:00:00.001853 execType: jit_avx2_FP32
Convolution_552               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.032419cpu: 0:00:00.032419 execType: jit_avx2_FP32
onnx::Conv_191                Status.NOT_RUN layerType: Relu               realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_601               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.032687cpu: 0:00:00.032687 execType: jit_avx2_FP32
Convolution_601___input.116   Status.EXECUTEDlayerType: Reorder            realTime: 0:00:00.000805cpu: 0:00:00.000805 execType: ref_any_FP32
input.108                     Status.EXECUTEDlayerType: MaxPool            realTime: 0:00:00.000804cpu: 0:00:00.000804 execType: jit_avx2_FP32
input.108_aBcd8b_abcd_Conv... Status.EXECUTEDlayerType: Reorder            realTime: 0:00:00.000906cpu: 0:00:00.000906 execType: jit_FP32  
Convolution_650               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.098469cpu: 0:00:00.098469 execType: jit_gemm_FP32
Convolution_650_abcd_aBcd8... Status.EXECUTEDlayerType: Reorder            realTime: 0:00:00.001737cpu: 0:00:00.001737 execType: jit_FP32  
Convolution_698               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.015239cpu: 0:00:00.015239 execType: jit_avx2_1x1_FP32
input.116                     Status.NOT_RUN layerType: Concat             realTime: 0:00:00   cpu: 0:00:00        execType: unknown_FP32
Convolution_747               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.011737cpu: 0:00:00.011737 execType: jit_avx2_1x1_FP32
onnx::Conv_200                Status.NOT_RUN layerType: Relu               realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_796               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.016472cpu: 0:00:00.016472 execType: jit_avx2_FP32
onnx::Shape_203               Status.NOT_RUN layerType: Relu               realTime: 0:00:00   cpu: 0:00:00        execType: undef     
onnx::Concat_220              Status.EXECUTEDlayerType: Interpolate        realTime: 0:00:00.001400cpu: 0:00:00.001400 execType: jit_avx2_FP32
onnx::Concat_220___input.136  Status.NOT_RUN layerType: Reorder            realTime: 0:00:00   cpu: 0:00:00        execType: reorder_FP32
input.136                     Status.NOT_RUN layerType: Concat             realTime: 0:00:00   cpu: 0:00:00        execType: unknown_FP32
Convolution_911               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.011367cpu: 0:00:00.011367 execType: jit_avx2_1x1_FP32
onnx::Conv_224                Status.NOT_RUN layerType: Relu               realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_960               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.016280cpu: 0:00:00.016280 execType: jit_avx2_FP32
onnx::Shape_227               Status.NOT_RUN layerType: Relu               realTime: 0:00:00   cpu: 0:00:00        execType: undef     
onnx::Concat_244              Status.EXECUTEDlayerType: Interpolate        realTime: 0:00:00.003013cpu: 0:00:00.003013 execType: jit_avx2_FP32
onnx::Concat_244___input.156  Status.NOT_RUN layerType: Reorder            realTime: 0:00:00   cpu: 0:00:00        execType: reorder_FP32
input.156                     Status.NOT_RUN layerType: Concat             realTime: 0:00:00   cpu: 0:00:00        execType: unknown_FP32
Convolution_1075              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.011901cpu: 0:00:00.011901 execType: jit_avx2_1x1_FP32
onnx::Conv_248                Status.NOT_RUN layerType: Relu               realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1124              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.016322cpu: 0:00:00.016322 execType: jit_avx2_FP32
onnx::Shape_251               Status.NOT_RUN layerType: Relu               realTime: 0:00:00   cpu: 0:00:00        execType: undef     
onnx::Concat_268              Status.EXECUTEDlayerType: Interpolate        realTime: 0:00:00.005849cpu: 0:00:00.005849 execType: jit_avx2_FP32
onnx::Concat_268___input.176  Status.NOT_RUN layerType: Reorder            realTime: 0:00:00   cpu: 0:00:00        execType: reorder_FP32
input.176                     Status.NOT_RUN layerType: Concat             realTime: 0:00:00   cpu: 0:00:00        execType: unknown_FP32
Convolution_1239              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.015198cpu: 0:00:00.015198 execType: jit_avx2_1x1_FP32
onnx::Conv_272                Status.NOT_RUN layerType: Relu               realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1288              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.016208cpu: 0:00:00.016208 execType: jit_avx2_FP32
feature                       Status.NOT_RUN layerType: Relu               realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1288_aBcd8b_ab... Status.EXECUTEDlayerType: Reorder            realTime: 0:00:00.003569cpu: 0:00:00.003569 execType: jit_FP32  
feature/sink_port_0           Status.NOT_RUN layerType: Result             realTime: 0:00:00   cpu: 0:00:00        execType: unknown_FP32
Convolution_1337              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.007950cpu: 0:00:00.007950 execType: jit_avx2_FP32
onnx::Conv_277                Status.NOT_RUN layerType: Relu               realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1386              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.008250cpu: 0:00:00.008250 execType: jit_avx2_FP32
onnx::Conv_279                Status.NOT_RUN layerType: Relu               realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1435              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.005590cpu: 0:00:00.005590 execType: jit_avx2_FP32
onnx::Conv_281                Status.NOT_RUN layerType: Relu               realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1484              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.001655cpu: 0:00:00.001655 execType: jit_avx2_1x1_FP32
onnx::Conv_283                Status.NOT_RUN layerType: Relu               realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1533              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.001119cpu: 0:00:00.001119 execType: jit_avx2_1x1_FP32
Convolution_1533_aBcd8b_ab... Status.EXECUTEDlayerType: Reorder            realTime: 0:00:00.000852cpu: 0:00:00.000852 execType: ref_any_FP32
y                             Status.EXECUTEDlayerType: Transpose          realTime: 0:00:00.000141cpu: 0:00:00.000141 execType: unknown_FP32
y/sink_port_0                 Status.NOT_RUN layerType: Result             realTime: 0:00:00   cpu: 0:00:00        execType: unknown_FP32
Total time:     0:00:01.379459 microseconds
Total CPU time: 0:00:01.379459 microseconds

Count:          44 iterations
Duration:       61018.51 ms
Latency:
    Median:     1379.29 ms
    AVG:        1381.20 ms
    MIN:        1377.41 ms
    MAX:        1399.46 ms
Throughput: 0.73 FPS

Grigor355 commented 2 years ago

Intel® Xeon® Gold 6138, quantized model (the AVX512 instruction set is being used):

[Step 1/11] Parsing and validating input arguments
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README. 
[Step 2/11] Loading OpenVINO
[ WARNING ] PerformanceMode was not explicitly specified in command line. Device CPU performance hint will be set to LATENCY.
[ INFO ] OpenVINO:
         API version............. 2022.1.0-7019-cdb9bec7210-releases/2022/1
[ INFO ] Device info
         CPU
         openvino_intel_cpu_plugin version 2022.1
         Build................... 2022.1.0-7019-cdb9bec7210-releases/2022/1
[Step 3/11] Setting device configuration
[Step 4/11] Reading network files
[ INFO ] Read model took 5667.81 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model input 'x' precision u8, dimensions ([N,C,H,W]): 1 3 1120 1120
[ INFO ] Model output 'y' precision f32, dimensions ([...]): 1 560 560 2
[ INFO ] Model output 'feature' precision f32, dimensions ([...]): 1 32 560 560
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 14566.11 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] DEVICE: CPU
[ INFO ]   AVAILABLE_DEVICES  , ['']
[ INFO ]   RANGE_FOR_ASYNC_INFER_REQUESTS  , (1, 1, 1)
[ INFO ]   RANGE_FOR_STREAMS  , (1, 12)
[ INFO ]   FULL_DEVICE_NAME  , Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz
[ INFO ]   OPTIMIZATION_CAPABILITIES  , ['WINOGRAD', 'FP32', 'FP16', 'INT8', 'BIN', 'EXPORT_IMPORT']
[ INFO ]   CACHE_DIR  , 
[ INFO ]   NUM_STREAMS  , 1
[ INFO ]   INFERENCE_NUM_THREADS  , 0
[ INFO ]   PERF_COUNT  , True
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS  , 0
[Step 9/11] Creating infer requests and preparing input data
[ INFO ] Create 1 infer requests took 0.29 ms
[ WARNING ] No input files were given for input 'x'!. This input will be filled with random values!
[ INFO ] Fill input 'x' with random values 
[Step 10/11] Measuring performance (Start inference synchronously, inference only: True, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 1313.55 ms
[Step 11/11] Dumping statistics report
[ INFO ] Performance counts for 0-th infer request
x                             Status.NOT_RUN layerType: Parameter          realTime: 0:00:00   cpu: 0:00:00        execType: unknown_I8
Convolution_58/fq_input_0     Status.EXECUTEDlayerType: FakeQuantize       realTime: 0:00:00.000511cpu: 0:00:00.000511 execType: jit_avx512_I8
Convolution_58/fq_input_0_... Status.EXECUTEDlayerType: Reorder            realTime: 0:00:00.000654cpu: 0:00:00.000654 execType: jit_uni_I8
Convolution_58                Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.007742cpu: 0:00:00.007742 execType: jit_avx512_I8
Convolution_107/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_107               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.083978cpu: 0:00:00.083978 execType: jit_avx512_I8
161/fq_input_0                Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
161                           Status.EXECUTEDlayerType: MaxPool            realTime: 0:00:00.002892cpu: 0:00:00.002892 execType: jit_avx512_I8
Convolution_157               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.042987cpu: 0:00:00.042987 execType: jit_avx512_I8
Convolution_206/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_206               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.083838cpu: 0:00:00.083838 execType: jit_avx512_I8
168/fq_input_0                Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
168                           Status.EXECUTEDlayerType: MaxPool            realTime: 0:00:00.001266cpu: 0:00:00.001266 execType: jit_avx512_I8
Convolution_256               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.043359cpu: 0:00:00.043359 execType: jit_avx512_I8
Convolution_305/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_305               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.083507cpu: 0:00:00.083507 execType: jit_avx512_I8
245/fq_input_1                Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_354               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.081140cpu: 0:00:00.081140 execType: jit_avx512_I8
178/fq_input_0                Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
178                           Status.EXECUTEDlayerType: MaxPool            realTime: 0:00:00.000705cpu: 0:00:00.000705 execType: jit_avx512_I8
Convolution_404               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.040893cpu: 0:00:00.040893 execType: jit_avx512_I8
Convolution_453/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_453               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.082873cpu: 0:00:00.082873 execType: jit_avx512_I8
221/fq_input_1                Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_502               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.085557cpu: 0:00:00.085557 execType: jit_avx512_I8
188/fq_input_0                Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
188                           Status.EXECUTEDlayerType: MaxPool            realTime: 0:00:00.000434cpu: 0:00:00.000434 execType: jit_avx512_I8
Convolution_552               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.022081cpu: 0:00:00.022081 execType: jit_avx512_I8
Convolution_601/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_601               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.021840cpu: 0:00:00.021840 execType: jit_avx512_I8
194/fq_input_0                Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
194                           Status.EXECUTEDlayerType: MaxPool            realTime: 0:00:00.000407cpu: 0:00:00.000407 execType: jit_avx512_I8
Convolution_650               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.044589cpu: 0:00:00.044589 execType: jit_avx512_I8
Convolution_698/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_698               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.014533cpu: 0:00:00.014533 execType: jit_avx512_1x1_I8
197/fq_input_0                Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
197                           Status.EXECUTEDlayerType: Concat             realTime: 0:00:00.000381cpu: 0:00:00.000381 execType: ref_I8    
Convolution_747               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.007368cpu: 0:00:00.007368 execType: jit_avx512_1x1_I8
Convolution_796/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_796               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.010890cpu: 0:00:00.010890 execType: jit_avx512_I8
220/fq_input_0                Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
220                           Status.EXECUTEDlayerType: Interpolate        realTime: 0:00:00.000626cpu: 0:00:00.000626 execType: jit_avx512_FP32
221/fq_input_0                Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
221                           Status.EXECUTEDlayerType: Concat             realTime: 0:00:00.000552cpu: 0:00:00.000552 execType: ref_I8    
Convolution_911               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.007821cpu: 0:00:00.007821 execType: jit_avx512_1x1_I8
Convolution_960/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_960               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.010925cpu: 0:00:00.010925 execType: jit_avx512_I8
244/fq_input_0                Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
244                           Status.EXECUTEDlayerType: Interpolate        realTime: 0:00:00.001236cpu: 0:00:00.001236 execType: jit_avx512_FP32
245/fq_input_0                Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
245                           Status.EXECUTEDlayerType: Concat             realTime: 0:00:00.001248cpu: 0:00:00.001248 execType: ref_I8    
Convolution_1075              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.007629cpu: 0:00:00.007629 execType: jit_avx512_1x1_I8
Convolution_1124/fq_input_0   Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1124              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.011312cpu: 0:00:00.011312 execType: jit_avx512_I8
268/fq_input_0                Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
268                           Status.EXECUTEDlayerType: Interpolate        realTime: 0:00:00.002867cpu: 0:00:00.002867 execType: jit_avx512_FP32
269/fq_input_0                Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
269                           Status.EXECUTEDlayerType: Concat             realTime: 0:00:00.002342cpu: 0:00:00.002342 execType: ref_I8    
Convolution_1239              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.008164cpu: 0:00:00.008164 execType: jit_avx512_1x1_I8
Convolution_1288/fq_input_0   Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1288              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.011128cpu: 0:00:00.011128 execType: jit_avx512_I8
feature_original              Status.NOT_RUN layerType: Relu               realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1337/fq_input_0   Status.EXECUTEDlayerType: FakeQuantize       realTime: 0:00:00.001518cpu: 0:00:00.001518 execType: jit_avx512_FP32
Convolution_1337              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.005577cpu: 0:00:00.005577 execType: jit_avx512_I8
Convolution_1386/fq_input_0   Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1386              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.005581cpu: 0:00:00.005581 execType: jit_avx512_I8
Convolution_1435/fq_input_0   Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1435              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.003140cpu: 0:00:00.003140 execType: jit_avx512_I8
Convolution_1484/fq_input_0   Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1484              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.000503cpu: 0:00:00.000503 execType: jit_avx512_1x1_I8
Convolution_1533/fq_input_0   Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1533              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.000378cpu: 0:00:00.000378 execType: jit_avx512_1x1_I8
Multiply_4787                 Status.NOT_RUN layerType: Multiply           realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1533_acdb_abcd... Status.NOT_RUN layerType: Reorder            realTime: 0:00:00   cpu: 0:00:00        execType: reorder_FP32
y/sink_port_0                 Status.NOT_RUN layerType: Result             realTime: 0:00:00   cpu: 0:00:00        execType: unknown_FP32
feature                       Status.EXECUTEDlayerType: Multiply           realTime: 0:00:00.001720cpu: 0:00:00.001720 execType: jit_avx512_FP32
feature_acdb_abcd_feature/... Status.EXECUTEDlayerType: Reorder            realTime: 0:00:00.002623cpu: 0:00:00.002623 execType: reorder_FP32
feature/sink_port_0           Status.NOT_RUN layerType: Result             realTime: 0:00:00   cpu: 0:00:00        execType: unknown_FP32
Total time:     0:00:00.851315 microseconds
Total CPU time: 0:00:00.851315 microseconds
Count:          70 iterations
Duration:       60328.67 ms
Latency:
    Median:     854.99 ms
    AVG:        845.12 ms
    MIN:        603.93 ms
    MAX:        1077.84 ms
Throughput: 1.17 FPS
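
To compare runs like the one above layer by layer instead of only by total latency, the per-layer lines can be summed per layer type and execType. This is a minimal sketch that assumes the log has the same shape as the output pasted here (including the fused `Status.EXECUTEDlayerType:` and `...cpu:` fields). The names `LINE_RE` and `time_by_exec_type` are made up for illustration and are not part of OpenVINO.

```python
import re
from collections import defaultdict

# Matches only executed layers; NOT_RUN lines are skipped on purpose.
LINE_RE = re.compile(
    r"Status\.EXECUTED\s*layerType:\s*(?P<layer>\S+)\s+"
    r"realTime:\s*(?P<h>\d+):(?P<m>\d+):(?P<s>\d+(?:\.\d+)?)\s*cpu:.*?"
    r"execType:\s*(?P<exec>\S+)"
)


def time_by_exec_type(log_text):
    """Sum realTime in milliseconds per (layerType, execType) pair."""
    totals = defaultdict(float)
    for m in LINE_RE.finditer(log_text):
        seconds = int(m["h"]) * 3600 + int(m["m"]) * 60 + float(m["s"])
        totals[(m["layer"], m["exec"])] += seconds * 1000.0
    return dict(totals)


# Three lines copied from the Xeon log above.
sample = """\
Convolution_58                Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.007742cpu: 0:00:00.007742 execType: jit_avx512_I8
Convolution_107/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef
Convolution_107               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.083978cpu: 0:00:00.083978 execType: jit_avx512_I8
"""

for key, ms in sorted(time_by_exec_type(sample).items()):
    print(key, f"{ms:.3f} ms")
```

Running the same function over the full logs from both machines gives two small tables that can be compared directly, which shows where the AVX512 kernels gain or lose time relative to the AVX2 ones.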

jgespino commented 2 years ago

@Grigor355

Meanwhile, my input size is [1, 3, 1120, 1120] (I believe I missed this detail). What was your input size? As the model is fully convolutional, it can work with different input sizes.

The input shape was set to [1, 3, 1120, 1120].

Can you test on both systems using the same OpenVINO version? It looks like your i7-8700K system has:

         openvino_intel_cpu_plugin version 2022.2
         Build................... custom_master_c519aff42f144f12b65340e02f5d303411779634

and the Intel® Xeon® Gold 6138 system has:

         openvino_intel_cpu_plugin version 2022.1
         Build................... 2022.1.0-7019-cdb9bec7210-releases/2022/1

You are seeing ~2.5x faster inference when switching to the quantized model on the Intel(R) Xeon(R) Platinum 8368. That CPU has 32 cores and VNNI, whereas mine has 20 cores and no VNNI. Could this be the reason for the performance difference?

I am comparing the quantized model on Intel(R) Core(TM) i7-8665U and Intel(R) Xeon(R) Platinum 8368. Correct, the Xeon processor that I am using has more cores and additional instruction sets. However, I would still expect somewhat better performance when comparing your two systems. When I tested on an Intel Core i5-1135G7 with AVX512 and an Intel Core i7-8665U with AVX2, I saw about 3x the FPS.
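
Whether a given machine exposes VNNI, as discussed above, can be checked from the CPU flags. The following is a minimal sketch, assuming a Linux system where `/proc/cpuinfo` contains a `flags` line. The helper name `simd_features` is hypothetical. `avx512_vnni` and `avx_vnni` are the flag names the Linux kernel reports for the two VNNI variants.

```python
def simd_features(cpuinfo_text):
    """Report which of a few SIMD flags appear in /proc/cpuinfo-style text."""
    wanted = ("avx2", "avx512f", "avx512_vnni", "avx_vnni")
    flags = set()
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        # Only the "flags" lines carry the feature list on x86 Linux.
        if sep and name.strip() == "flags":
            flags.update(value.split())
    return {feature: feature in flags for feature in wanted}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(simd_features(f.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this platform")
```

If `avx512f` is reported but `avx512_vnni` is not, the INT8 kernels have the wider registers available but not the fused INT8 dot-product instruction, which is one plausible reason the INT8 gain over AVX2 could be smaller than expected.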

Grigor355 commented 2 years ago

Here is the benchmark output for the quantized model on the AVX2 machine (i7-8700K), using the same OpenVINO version as in the Xeon Gold environment. The results are the same.

[Step 1/11] Parsing and validating input arguments
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README. 
[Step 2/11] Loading OpenVINO
[ WARNING ] PerformanceMode was not explicitly specified in command line. Device CPU performance hint will be set to LATENCY.
[ INFO ] OpenVINO:
         API version............. 2022.1.0-7019-cdb9bec7210-releases/2022/1
[ INFO ] Device info
         CPU
         openvino_intel_cpu_plugin version 2022.1
         Build................... 2022.1.0-7019-cdb9bec7210-releases/2022/1

[Step 3/11] Setting device configuration
[Step 4/11] Reading network files
[ INFO ] Read model took 20.02 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model input 'x' precision u8, dimensions ([N,C,H,W]): 1 3 1120 1120
[ INFO ] Model output 'y' precision f32, dimensions ([...]): 1 560 560 2
[ INFO ] Model output 'feature' precision f32, dimensions ([...]): 1 32 560 560
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 67.89 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] DEVICE: CPU
[ INFO ]   AVAILABLE_DEVICES  , ['']
[ INFO ]   RANGE_FOR_ASYNC_INFER_REQUESTS  , (1, 1, 1)
[ INFO ]   RANGE_FOR_STREAMS  , (1, 12)
[ INFO ]   FULL_DEVICE_NAME  , Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
[ INFO ]   OPTIMIZATION_CAPABILITIES  , ['FP32', 'FP16', 'INT8', 'BIN', 'EXPORT_IMPORT']
[ INFO ]   CACHE_DIR  , 
[ INFO ]   NUM_STREAMS  , 1
[ INFO ]   INFERENCE_NUM_THREADS  , 0
[ INFO ]   PERF_COUNT  , True
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS  , 0
[Step 9/11] Creating infer requests and preparing input data
[ INFO ] Create 1 infer requests took 0.09 ms
[ WARNING ] No input files were given for input 'x'!. This input will be filled with random values!
[ INFO ] Fill input 'x' with random values 
[Step 10/11] Measuring performance (Start inference synchronously, inference only: True, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 785.51 ms
[Step 11/11] Dumping statistics report
[ INFO ] Performance counts for 0-th infer request
x                             Status.NOT_RUN layerType: Parameter          realTime: 0:00:00   cpu: 0:00:00        execType: unknown_I8
Convolution_58/fq_input_0     Status.EXECUTEDlayerType: FakeQuantize       realTime: 0:00:00.000370cpu: 0:00:00.000370 execType: jit_avx2_I8
Convolution_58/fq_input_0_... Status.EXECUTEDlayerType: Reorder            realTime: 0:00:00.000291cpu: 0:00:00.000291 execType: jit_uni_I8
Convolution_58                Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.007507cpu: 0:00:00.007507 execType: jit_avx2_I8
Convolution_107/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_107               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.073974cpu: 0:00:00.073974 execType: jit_avx2_I8
input.16/fq_input_0           Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
input.16                      Status.EXECUTEDlayerType: MaxPool            realTime: 0:00:00.003689cpu: 0:00:00.003689 execType: jit_avx2_I8
Convolution_157               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.037036cpu: 0:00:00.037036 execType: jit_avx2_I8
Convolution_206/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_206               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.080682cpu: 0:00:00.080682 execType: jit_avx2_I8
input.176/fq_input_1          Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
input.36                      Status.EXECUTEDlayerType: MaxPool            realTime: 0:00:00.001753cpu: 0:00:00.001753 execType: jit_avx2_I8
Convolution_256               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.040458cpu: 0:00:00.040458 execType: jit_avx2_I8
Convolution_305/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_305               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.082231cpu: 0:00:00.082231 execType: jit_avx2_I8
Convolution_354/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_354               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.081159cpu: 0:00:00.081159 execType: jit_avx2_I8
input.64/fq_input_0           Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
input.64                      Status.EXECUTEDlayerType: MaxPool            realTime: 0:00:00.000879cpu: 0:00:00.000879 execType: jit_avx2_I8
Convolution_404               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.040755cpu: 0:00:00.040755 execType: jit_avx2_I8
Convolution_453/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_453               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.084986cpu: 0:00:00.084986 execType: jit_avx2_I8
Convolution_502/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_502               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.085592cpu: 0:00:00.085592 execType: jit_avx2_I8
input.92/fq_input_0           Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
input.92                      Status.EXECUTEDlayerType: MaxPool            realTime: 0:00:00.000451cpu: 0:00:00.000451 execType: jit_avx2_I8
Convolution_552               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.021443cpu: 0:00:00.021443 execType: jit_avx2_I8
Convolution_601/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_601               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.022198cpu: 0:00:00.022198 execType: jit_avx2_I8
input.108/fq_input_0          Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
input.108                     Status.EXECUTEDlayerType: MaxPool            realTime: 0:00:00.000188cpu: 0:00:00.000188 execType: jit_avx2_I8
Convolution_650               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.044498cpu: 0:00:00.044498 execType: jit_avx2_I8
Convolution_698/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_698               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.008323cpu: 0:00:00.008323 execType: jit_avx2_1x1_I8
input.116/fq_input_0          Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
input.116                     Status.EXECUTEDlayerType: Concat             realTime: 0:00:00.000318cpu: 0:00:00.000318 execType: ref_I8    
Convolution_747               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.006679cpu: 0:00:00.006679 execType: jit_avx2_1x1_I8
Convolution_796/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_796               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.011028cpu: 0:00:00.011028 execType: jit_avx2_I8
onnx::Concat_220/fq_input_0   Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
onnx::Concat_220              Status.EXECUTEDlayerType: Interpolate        realTime: 0:00:00.000465cpu: 0:00:00.000465 execType: jit_avx2_FP32
input.136/fq_input_0          Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
input.136                     Status.EXECUTEDlayerType: Concat             realTime: 0:00:00.001004cpu: 0:00:00.001004 execType: ref_I8    
Convolution_911               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.006482cpu: 0:00:00.006482 execType: jit_avx2_1x1_I8
Convolution_960/fq_input_0    Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_960               Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.010767cpu: 0:00:00.010767 execType: jit_avx2_I8
onnx::Concat_244/fq_input_0   Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
onnx::Concat_244              Status.EXECUTEDlayerType: Interpolate        realTime: 0:00:00.000979cpu: 0:00:00.000979 execType: jit_avx2_FP32
input.156/fq_input_0          Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
input.156                     Status.EXECUTEDlayerType: Concat             realTime: 0:00:00.002396cpu: 0:00:00.002396 execType: ref_I8    
Convolution_1075              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.006498cpu: 0:00:00.006498 execType: jit_avx2_1x1_I8
Convolution_1124/fq_input_0   Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1124              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.010255cpu: 0:00:00.010255 execType: jit_avx2_I8
onnx::Concat_268/fq_input_0   Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
onnx::Concat_268              Status.EXECUTEDlayerType: Interpolate        realTime: 0:00:00.002144cpu: 0:00:00.002144 execType: jit_avx2_FP32
input.176/fq_input_0          Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
input.176                     Status.EXECUTEDlayerType: Concat             realTime: 0:00:00.005118cpu: 0:00:00.005118 execType: ref_I8    
Convolution_1239              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.006800cpu: 0:00:00.006800 execType: jit_avx2_1x1_I8
Convolution_1288/fq_input_0   Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1288              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.008933cpu: 0:00:00.008933 execType: jit_avx2_I8
feature_original              Status.NOT_RUN layerType: Relu               realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1337/fq_input_0   Status.EXECUTEDlayerType: FakeQuantize       realTime: 0:00:00.003288cpu: 0:00:00.003288 execType: jit_avx2_FP32
Convolution_1337              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.004690cpu: 0:00:00.004690 execType: jit_avx2_I8
Convolution_1386/fq_input_0   Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1386              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.004766cpu: 0:00:00.004766 execType: jit_avx2_I8
Convolution_1435/fq_input_0   Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1435              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.002428cpu: 0:00:00.002428 execType: jit_avx2_I8
Convolution_1484/fq_input_0   Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1484              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.000352cpu: 0:00:00.000352 execType: jit_avx2_1x1_I8
Convolution_1533/fq_input_0   Status.NOT_RUN layerType: FakeQuantize       realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1533              Status.EXECUTEDlayerType: Convolution        realTime: 0:00:00.000159cpu: 0:00:00.000159 execType: jit_avx2_1x1_I8
Multiply_4660                 Status.NOT_RUN layerType: Multiply           realTime: 0:00:00   cpu: 0:00:00        execType: undef     
Convolution_1533_acdb_abcd... Status.NOT_RUN layerType: Reorder            realTime: 0:00:00   cpu: 0:00:00        execType: reorder_FP32
y/sink_port_0                 Status.NOT_RUN layerType: Result             realTime: 0:00:00   cpu: 0:00:00        execType: unknown_FP32
feature                       Status.EXECUTEDlayerType: Multiply           realTime: 0:00:00.003381cpu: 0:00:00.003381 execType: jit_avx2_FP32
feature_acdb_abcd_feature/... Status.EXECUTEDlayerType: Reorder            realTime: 0:00:00.003635cpu: 0:00:00.003635 execType: reorder_FP32
feature/sink_port_0           Status.NOT_RUN layerType: Result             realTime: 0:00:00   cpu: 0:00:00        execType: unknown_FP32
Total time:     0:00:00.821028 microseconds
Total CPU time: 0:00:00.821028 microseconds

Count:          73 iterations
Duration:       60386.61 ms
Latency:
    Median:     821.97 ms
    AVG:        821.85 ms
    MIN:        779.08 ms
    MAX:        834.68 ms
Throughput: 1.22 FPS
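
The latency summaries reported in this thread can be put side by side with a little arithmetic. This sketch only restates the numbers from the logs above; the dictionary keys and the helper `ratio` are illustrative names, and the FP32 entry is the earlier run whose kernels show `jit_avx2_FP32`.

```python
# Latency figures (ms) copied from the benchmark_app summaries in this thread.
runs = {
    "fp32_avx2": {"median": 1379.29, "min": 1377.41, "max": 1399.46},
    "int8_avx512_xeon_6138": {"median": 854.99, "min": 603.93, "max": 1077.84},
    "int8_avx2_i7_8700k": {"median": 821.97, "min": 779.08, "max": 834.68},
}


def ratio(a, b):
    """How many times larger the median latency of run a is than that of run b."""
    return runs[a]["median"] / runs[b]["median"]


xeon_vs_i7 = ratio("int8_avx512_xeon_6138", "int8_avx2_i7_8700k")
fp32_vs_int8 = ratio("fp32_avx2", "int8_avx2_i7_8700k")
print(f"Xeon INT8 median / i7 INT8 median: {xeon_vs_i7:.3f}")
print(f"FP32 median / INT8 median (AVX2 kernels): {fp32_vs_int8:.3f}")

# Spread between fastest and slowest iteration, per run.
for name, r in runs.items():
    print(f"{name}: max - min = {r['max'] - r['min']:.2f} ms")
```

On these numbers the Xeon median is about 4% higher than the i7 median, and quantization gives roughly a 1.7x reduction in median latency with AVX2 kernels. The Xeon run also shows a much wider gap between MIN and MAX than the i7 run, which may indicate that run-to-run conditions on that machine (for example other load or thread placement) are worth checking before drawing conclusions about the instruction set itself.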

jgespino commented 2 years ago

@Grigor355 Apologies, I missed the notification from your response. Were you able to obtain better performance with your setup?

openvinotoolkit / openvino

AVX512 is not faster than AVX2 for quantized model #11710