Closed: nmVis closed this issue 4 years ago.
Hi, @nmVis! What is your version of OpenVINO?
Could you run your model with the example config from OpenVINO? \deployment_tools\tools\post_training_optimization_toolkit\configs\examples\quantization\classification\mobilenetV2_tf_int8_simple_mode.json You only need to put the paths to your .xml and .bin files into it.
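For example, once the model paths inside that JSON are updated, running it is just (assuming the config has been copied to your working directory):
pot -c mobilenetV2_tf_int8_simple_mode.json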
Hi @dmitryte!
I'm using OpenVINO 2020.4.
I could do that. However, I've already used the same .xml and .json files (with only the dataset changed) to calibrate both OpenVINO's FP32 re-identification model I mentioned earlier and the MobileNet model that is also mentioned earlier.
I got a performance improvement for OpenVINO's FP32 re-identification model, but no improvement for the MobileNet model, which suggests that the .xml and .json files are not the problem.
Hi @dmitryte,
I suggest simplifying the POT config to the following one:
{
    /* Model */
    "model": {
        "model_name": "mnet_v2",        // Model name
        "model": "mnetv2.vino.xml",     // Path to the model (.xml format)
        "weights": "mnetv2.vino.bin"    // Path to the weights (.bin format)
    },

    /* Parameters of the engine used for model inference.
       The Post-Training Optimization Tool supports an engine based on the Accuracy Checker as well as a custom engine.
       For a custom engine you should specify your own set of parameters.
       The Accuracy Checker-based engine uses Accuracy Checker parameters, which can be specified
       either via an Accuracy Checker config file or directly in the "engine" section.
       More information about Accuracy Checker parameters can be found here:
       https://github.com/opencv/open_model_zoo/tree/master/tools/accuracy_checker */
    "engine": {
        //"type": "simplified",         // OR the default value "type": "accuracy_checker" for non-simplified mode
        "type": "accuracy_checker",
        // In simplified mode you can specify a path to a directory with images.
        // You can also specify a unix-style filename template to filter the images to load
        // (this option is valid only in simplified mode).
        //"data_source": "D:/temp/BS/Data/BS/test/originals",
        "config": "mnet_v2.yml"
    },

    /* Optimization hyperparameters */
    "compression": {
        "target_device": "CPU",         // Target device whose specifics are taken into account during optimization
        "algorithms": [
            {
                "name": "DefaultQuantization",  // Optimization algorithm name
                "params": {
                    /* A preset is a collection of optimization algorithm parameters that tells the algorithm
                       which metric it needs to concentrate on. Each optimization algorithm supports
                       the [performance, accuracy] presets */
                    "preset": "mixed",
                    "stat_subset_size": 200     // Size of the subset used to calculate the activation statistics
                                                // needed for quantization parameter calculation
                }
            }
        ]
    }
}
BTW, what CPU model do you use?
Hi @AlexKoff88,
The CPU I'm using is Intel i7-8750H CPU @ 2.20GHz.
@AlexKoff88 do you have any news regarding this one?
@nmVis, have you tried the simplified configuration file from above?
I would even suggest using the "performance" preset instead of "mixed", like here:
{
    "model": {
        "model_name": "mobilenetv2",
        "model": "mnetv2.vino.xml",
        "weights": "mnetv2.vino.bin"
    },
    "engine": {
        "config": "mnet_v2.yml"
    },
    "compression": {
        "target_device": "CPU",
        "algorithms": [
            {
                "name": "DefaultQuantization",
                "params": {
                    "preset": "performance",
                    "stat_subset_size": 300
                }
            }
        ]
    }
}
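A config along these lines is applied the same way as before, for example (the config filename and output directory below are just placeholders):
pot -c mnetv2_int8.json --output-dir ./pot_results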
Thanks for the answer @AlexKoff88, I'll try it first thing tomorrow morning and let you know about the results.
@AlexKoff88 I'm sorry to be the bearer of bad news, but this one didn't show any speed improvements.
Any other ideas?
Then we should look at it on our side. @nmVis, can you please provide the MO command that you used to convert the ONNX model and get the OpenVINO IR?
The command I've used is:
python mo.py --input_model mobilenetv2-7.onnx
@nmVis, how do you measure performance? Using the OpenVINO benchmark tool?
@arfangeta hi!
No, I measure it using Google Benchmark, timing only the inference of the network.
@nmVis OpenVINO has a tool for measuring inference speed: the OpenVINO benchmark tool (https://docs.openvinotoolkit.org/2020.4/_inference_engine_tools_benchmark_tool_README.html).
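For reference, a typical invocation for comparing the FP32 and quantized IRs looks roughly like this (the INT8 model path is a placeholder, and -api can be switched to async):
python benchmark_app.py -m mnetv2.vino.xml -d CPU -api sync -t 60    # FP32 IR
python benchmark_app.py -m mnetv2_int8.xml -d CPU -api sync -t 60    # quantized IR (placeholder name)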
I took the nasnet-mobile classification model from the official ONNX repository (from your first link) and applied the default quantization. I got the following results: 1174.17 FPS before quantization and 3214.69 FPS after (a 2.7x speedup). Please share the Google Benchmark code that reproduces your results.
nasnet-mobile is not the model I provided the link for. I don't get what you wanted to accomplish with it. Could you explain?
Hi @nmVis!
I checked the provided ONNX model with the sample POT configuration (simplified mode) from the messages above and got the following results:
FP32: Count: 125450 iterations, Duration: 60009.38 ms, Latency: 4.74 ms, Throughput: 2090.51 FPS
INT8: Count: 186890 iterations, Duration: 60004.92 ms, Latency: 3.11 ms, Throughput: 3114.58 FPS
Hi @dmitryte!
Interesting, I'm getting the following results with the sample POT configuration.
Async version:
FP32: Count: 15972 iterations, Duration: 60014.64 ms, Latency: 13.71 ms, Throughput: 266.14 FPS
INT8: Count: 24380 iterations, Duration: 60019.22 ms, Latency: 9.43 ms, Throughput: 406.20 FPS
Sync version:
FP32: Count: 7240 iterations, Duration: 60006.54 ms, Latency: 8.07 ms, Throughput: 123.91 FPS
INT8: Count: 7470 iterations, Duration: 60000.28 ms, Latency: 7.65 ms, Throughput: 130.67 FPS
As we can observe, the async version provides a significant throughput speedup, but it isn't of much interest to us since the synchronous mode is faster per inference. Also, the difference in performance between the synchronous FP32 and INT8 runs is negligible.
What mode did you run your tests in (the -api flag of benchmark_app.py)?
@dmitryte Hi! Any updates or comments?
Hi, @nmVis
I got the results above using the default execution of benchmark_app, i.e. the async API.
The increased latency is expected with the async API, but you also get more FPS. Sync mode is more suitable for real-time apps, since latency is critical in that case.
You can also tweak the number of threads and infer requests with benchmark_app. The defaults are usually optimal for most cases, but we still recommend playing with these numbers and adapting them to your specific case. You can check the following guide on performance optimization: https://docs.openvinotoolkit.org/latest/openvino_docs_optimization_guide_dldt_optimization_guide.html
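For example, something along these lines (the model filename and the thread/stream/request counts below are purely illustrative placeholders, not recommendations):
python benchmark_app.py -m mnetv2_int8.xml -d CPU -api sync -t 60 -nthreads 6
python benchmark_app.py -m mnetv2_int8.xml -d CPU -api async -t 60 -nstreams 4 -nireq 4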
Hi @dmitryte,
since we have a real-time app on our end, the async mode isn't interesting to us. Do you have any idea what could cause the lack of a performance increase for the INT8 model in sync mode?
I expect the INT8 model to be faster than the FP32 variant for the same number of threads. What could be the reason it doesn't behave like that?
Hi @nmVis
I ran the benchmark one more time with your model and got the following results:
FP32 - SYNC: Count: 13329 iterations, Duration: 60003.86 ms, Latency: 4.63 ms, Throughput: 215.79 FPS
INT8 - SYNC: Count: 41696 iterations, Duration: 60000.94 ms, Latency: 1.42 ms, Throughput: 703.21 FPS
For INT8 I get a 3x speedup and lower latency.
Could you check your measurements one more time?
@dmitryte I've checked my measurements one more time and they're similar. Could it be a CPU-related issue? What CPU do you test on? Maybe some of my colleagues have one, so I could check on their PC whether that's the case. Mine is an i7-8750H, if that helps.
Hmm, you should still get some improvement on 8th gen, because I've got an 8th gen i5 in my laptop and can see the speedup. Though it's lower than in the benchmarks I posted last week, since AVX-512 instructions aren't supported.
Starting with the 10th gen you get support for DL Boost, which makes INT8 models run even faster. The same is supported on server hardware.
We can move to PM and discuss the specifics of your real-time app if you don't mind.
Thanks, man. Let's continue in PM.
Hi, I'm having a problem with the quantization of the MobileNet V2 architecture. I expect quantization to improve the performance of MobileNet V2; however, I don't get the expected result.
The ONNX model I'm using is available at the following link, which is from the official ONNX repo.
After converting it using the mo.py script, it runs at around 10 ms. I then quantized it using the following JSON and YML files:
mnet.json
mnet.yml
with the command
pot -c mnet.json
and I get a quantized model that runs at 10 ms, just like the FP32 model. However, the model that OpenVINO provides with a MobileNet V2 backbone runs at 10 ms for FP32 and at 7 ms for the quantized model. Specifically, that model is available at the following link.
What could I be doing wrong?
Thanks in advance.
Regards, Nikola