Closed JulienMaille closed 3 years ago
@JulienMaille, can you please try to run the model with OpenCV and DNN_BACKEND_INFERENCE_ENGINE
Can you please add more details? Which version of OpenVINO is used? Have you tried to pass the raw .onnx using the latest OpenCV master branch? There were multiple patches by @ashishkrshrivastava with Resize layer import improvements.
Hello @dkurt, I've mentioned my OpenVINO version (2019 R3).
Are you telling me that OpenCV can feed onnx to DNN_BACKEND_INFERENCE_ENGINE?! What are the differences with my current workflow involving onnx to .bin conversion with Model Optimizer and readNetFromModelOptimizer()?
edit: I just tried and it crashes in InfEngineBackendNet::connect()
@JulienMaille, you need to try the latest OpenVINO (2020.2) and the latest OpenCV (4.3.0 or even master branch).
With .xml and .bin, you use Model Optimizer to convert the .onnx into an Intermediate Representation (IR).
An alternative is to pass the .onnx into OpenCV directly and enable the DNN_BACKEND_INFERENCE_ENGINE backend: it will build a similar IR at runtime and use OpenVINO's Inference Engine where possible.
It seems latest 2020.2 can handle the Resize layer with opset 10 (but not 11) but I'm right now trying to work with OpenVino 2019 R3 (because I would like to keep compatibility with non AVX CPUs) #387
Oh, I see your point. So try to build the latest OpenCV but with OpenVINO 2019R3. Use the -DINF_ENGINE_RELEASE=2019030000 flag to specify the version.
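As a reading aid for the version number above: the two values that appear in this thread, 2019030000 for 2019 R3 and 2020020000 for 2020.2, are consistent with a YYYYRRxxxx layout (four digits of year, two of release). The sketch below decodes a value under that assumption. `IeRelease` and `decodeIeRelease` are hypothetical names used for illustration, not OpenCV API.

```cpp
// Hypothetical helper, not part of OpenCV: splits an INF_ENGINE_RELEASE-style
// number into year and release, ASSUMING the layout YYYYRRxxxx inferred from
// the two values quoted in this thread.
struct IeRelease {
    int year;     // e.g. 2019
    int release;  // e.g. 3 for "R3"
};

IeRelease decodeIeRelease(long long value) {
    IeRelease r;
    r.year = static_cast<int>(value / 1000000);           // leading four digits
    r.release = static_cast<int>((value / 10000) % 100);  // next two digits
    return r;
}
```

Under that assumption, `decodeIeRelease(2019030000LL)` gives year 2019 and release 3, which is a quick way to check that the value reported in the CMake summary matches the OpenVINO package you meant to build against.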
I'm working with OpenCV origin/master pulled 2 days ago. I've not manually set INF_ENGINE_RELEASE but CMake already set it to 2019030000 automatically.
Pseudo code
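cv::dnn::Net net = cv::dnn::readNetFromONNX("fileName.onnx");
net.setPreferableBackend(cv::dnn::DNN_BACKEND_INFERENCE_ENGINE);  // the backend under test
net.setInput(blob);  // blob: input tensor, e.g. built with cv::dnn::blobFromImage (not shown)
cv::Mat output = net.forward();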
> I'm working with OpenCV origin/master pulled 2 days ago. I've not manually set INF_ENGINE_RELEASE but CMake already set it to 2019030000 automatically.

That is strange. There is no way to determine the IE version automatically for now. Please try with a clean build folder.

> Pseudo code

What about the experiment? Does it work or not?

> That is strange. There is no way to determine the IE version automatically for now. Please try with a clean build folder.

You are correct, maybe running setupvars.bat did it for me?
I get this
1>-- Other third-party libraries:
1>-- Intel IPP: 2020.0.0 Gold [2020.0.0]
1>-- at: D:/Dev/opencv/build-nocuda/3rdparty/ippicv/ippicv_win/icv
1>-- Intel IPP IW: sources (2020.0.0)
1>-- at: D:/Dev/opencv/build-nocuda/3rdparty/ippicv/ippicv_win/iw
1>-- Lapack: NO
1>-- Inference Engine: YES (2019030000 / 2.1.0)
1>-- * libs: C:/Program Files (x86)/IntelSWTools/openvino/deployment_tools/inference_engine/lib/intel64/Release/inference_engine_legacy.lib / C:/Program Files (x86)/IntelSWTools/openvino/deployment_tools/inference_engine/lib/intel64/Debug/inference_engine_legacyd.lib
1>-- * includes: C:/Program Files (x86)/IntelSWTools/openvino/deployment_tools/inference_engine/include
1>-- nGraph: NO
1>-- Eigen: NO
1>-- Custom HAL: NO
1>-- Protobuf: build (3.5.1)
> What about the experiment? Does it work or not?

No, the pseudo code crashes in InfEngineBackendNet::connect().
@JulienMaille, can you check if DNN_BACKEND_OPENCV works? Is it possible to share a model to test locally?
Yes , it works with
Model is similar to the one I sent on your email when we discussed the resized input issue.
@dkurt I recompiled OpenCV with OpenVINO 2020.2 and tried loading the onnx and then using DNN_BACKEND_INFERENCE_ENGINE.
It does work, however I get significantly worse performance than using the OpenVINO Model Optimizer and loading my bin/xml: 700ms instead of 500ms.
Is this to be expected?
@JulienMaille, but does bin/xml produce correct results?
Ok, just to clarify: the slower performance comes from OpenCV + 2020.2 (so from nGraph?), not from onnx vs bin/xml.
OpenCV + OpenVINO 2019 R3:
- bin/xml: 65ms
- onnx: crash
OpenCV + OpenVINO 2020.2:
- bin/xml: 93ms
- onnx: 93ms
Forcing cv::dnn::setInferenceEngineBackendType("NN_BUILDER") with 2020.2 results in crashes in both cases (onnx, bin/xml).
@dkurt let me know if there's anything I shall try to gather more information. Do you confirm that;
Some numbers to illustrate the performance degradation on an Intel Xeon E5-1620/Win10:
resnet18 based onnx:
avg: 90.9, min: 87, median: 91, std: 2.64 -> 32% slower
avg: 90.9, min: 88, median: 91, std: 3.20
avg: 91.0, min: 90, median: 91, std: 2.62
avg: 71.0, min: 67, median: 69, std: 7.81
avg: 70.9, min: 66, median: 69, std: 8.27
avg: 70.9, min: 63, median: 69, std: 8.25
EfficientNet-b4 based onnx:
avg: 107.8, min: 104, median: 108, std: 2.25 -> 3% slower
avg: 107.1, min: 106, median: 107, std: 2.42
avg: 109.4, min: 106, median: 109, std: 2.07
avg: 105.4, min: 102, median: 103, std: 8.01
avg: 106.1, min: 100, median: 104, std: 8.84
avg: 105.7, min: 99, median: 104, std: 7.82
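For anyone who wants to produce rows in the same avg/min/median/std shape from their own runs, here is a minimal sketch. The thread does not show how these figures were computed, so the helper name `summarize`, the `TimingStats` struct, and the choice of a population standard deviation (dividing by N) are all assumptions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Summary of a set of per-run timings in milliseconds.
struct TimingStats {
    double avg;
    double min;
    double median;
    double stddev;  // population standard deviation (divides by N); an assumption
};

// Hypothetical helper. Requires a non-empty vector; takes it by value so the
// caller's ordering is left untouched.
TimingStats summarize(std::vector<double> ms) {
    std::sort(ms.begin(), ms.end());
    const double n = static_cast<double>(ms.size());
    const double avg = std::accumulate(ms.begin(), ms.end(), 0.0) / n;

    // Sum of squared deviations from the mean.
    double sq = 0.0;
    for (double v : ms) sq += (v - avg) * (v - avg);

    // Median: middle element, or the mean of the two middle elements.
    const std::size_t mid = ms.size() / 2;
    const double median =
        (ms.size() % 2 == 1) ? ms[mid] : 0.5 * (ms[mid - 1] + ms[mid]);

    return TimingStats{avg, ms.front(), median, std::sqrt(sq / n)};
}
```

Feeding it the per-call durations of `net.forward()` (measured with whatever timer you prefer) gives one row per configuration. Reporting the median next to the average is useful here because several rows above show a much larger std than others, which hints at outlier runs.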
@dkurt, it looks like you can't reproduce the slowdown, at least on one model. Here is the output of opencv_version_win32.exe:
``` Version control: 4.3.0-290-g593af7287b Extra modules: Location (extra): D:/Dev/opencv_contrib/modules Version control (extra): 4.3.0-23-g6d855748 Platform: Timestamp: 2020-05-13T12:52:40Z Host: Windows 10.0.18363 AMD64 CMake: 3.16.19112601-MSVC_2 CMake generator: Visual Studio 16 2019 CMake build tool: C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/MSBuild/Current/Bin/MSBuild.exe MSVC: 1925 CPU/HW features: Baseline: SSE SSE2 SSE3 requested: SSE3 Dispatched code generation: SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX requested: SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX SSE4_1 (13 files): + SSSE3 SSE4_1 SSE4_2 (1 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 (0 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX AVX (4 files): + SSSE3 SSE4_1 POPCNT SSE4_2 AVX AVX2 (25 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX512_SKX (3 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX C/C++: Built as dynamic libs?: YES C++ standard: 11 C++ Compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.25.28610/bin/Hostx64/x64/cl.exe (ver 19.25.28614.0) C++ flags (Release): /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP /MD /O2 /Ob2 /DNDEBUG C++ flags (Debug): /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP /MDd /Zi /Ob0 /Od /RTC1 C Compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.25.28610/bin/Hostx64/x64/cl.exe C flags (Release): /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /MP /MD /O2 /Ob2 /DNDEBUG C flags (Debug): /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D 
_CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /MP /MDd /Zi /Ob0 /Od /RTC1 Linker flags (Release): /machine:x64 /INCREMENTAL:NO Linker flags (Debug): /machine:x64 /debug /INCREMENTAL ccache: NO Precompiled headers: NO Extra dependencies: 3rdparty dependencies: OpenCV modules: To be built: core dnn highgui imgcodecs imgproc photo world xphoto Disabled: - Disabled by dependency: aruco bgsegm bioinspired calib3d ccalib datasets dnn_objdetect dnn_superres dpm face features2d flann fuzzy gapi hfs img_hash intensity_transform java_bindings_generator line_descriptor ml objdetect optflow phase_unwrapping plot python_bindings_generator python_tests quality rapid reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab xfeatures2d ximgproc xobjdetect Unavailable: alphamat cnn_3dobj cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev cvv freetype hdf java js matlab ovis python2 python3 sfm viz Applications: apps Documentation: NO Non-free algorithms: NO Windows RT support: NO GUI: Win32 UI: YES VTK support: NO Media I/O: ZLib: build (ver 1.2.11) JPEG: build-libjpeg-turbo (ver 2.0.4-62) PNG: build (ver 1.6.37) JPEG 2000: NO HDR: NO SUNRASTER: NO PXM: NO PFM: NO Video I/O: DC1394: NO GStreamer: NO Parallel framework: Concurrency Trace: YES (with Intel ITT) Other third-party libraries: Intel IPP: 2020.0.0 Gold [2020.0.0] at: D:/Dev/opencv/build-nocuda-2020-2/3rdparty/ippicv/ippicv_win/icv Intel IPP IW: sources (2020.0.0) at: D:/Dev/opencv/build-nocuda-2020-2/3rdparty/ippicv/ippicv_win/iw Lapack: NO Inference Engine: YES (2020020000 / 2.1.0) * libs: C:/Program Files (x86)/IntelSWTools/openvino_2020.2.117/deployment_tools/inference_engine/lib/intel64/Release/inference_engine_legacy.lib / C:/Program Files 
(x86)/IntelSWTools/openvino_2020.2.117/deployment_tools/inference_engine/lib/intel64/Debug/inference_engine_legacyd.lib * includes: C:/Program Files (x86)/IntelSWTools/openvino_2020.2.117/deployment_tools/inference_engine/include nGraph: YES (1.1.1+) * libs: C:/Program Files (x86)/IntelSWTools/openvino_2020.2.117/deployment_tools/ngraph/lib/ngraph.dll * includes: C:/Program Files (x86)/IntelSWTools/openvino_2020.2.117/deployment_tools/ngraph/include Eigen: NO Custom HAL: NO Protobuf: build (3.5.1) OpenCL: YES (NVD3D11) Include path: D:/Dev/opencv/3rdparty/include/opencl/1.2 Link libraries: Dynamic load Python (for build): C:/Users/julien.maille/AppData/Local/Programs/Python/Python37/python.exe Java: ant: NO JNI: NO Java wrappers: NO Java tests: NO Install to: D:/Dev/opencv/build-nocuda-2020-2/install ----------------------------------------------------------------- OpenCL Platforms: NVIDIA CUDA dGPU: GeForce GTX 960 (OpenCL 1.2 CUDA) Current OpenCL device: Type = dGPU Name = GeForce GTX 960 Version = OpenCL 1.2 CUDA Driver version = 445.87 Address bits = 64 Compute units = 8 Max work group size = 1024 Local memory size = 48 KB Max memory allocation size = 1 GB Double support = Yes Host unified memory = No Device extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics Has AMD Blas = No Has AMD Fft = No Preferred vector width char = 1 Preferred vector width short = 1 Preferred vector width int = 1 Preferred vector width long = 1 Preferred vector width float = 1 Preferred vector width double = 1 OpenCV's HW features list: ID= 1 (MMX) -> ON ID= 2 (SSE) -> ON ID= 3 (SSE2) -> ON ID= 4 
(SSE3) -> ON ID= 5 (SSSE3) -> ON ID= 6 (SSE4.1) -> ON ID= 7 (SSE4.2) -> ON ID= 8 (POPCNT) -> ON ID= 9 (FP16) -> ON ID= 10 (AVX) -> ON ID= 11 (AVX2) -> ON ID= 12 (FMA3) -> ON Total available: 12 Parallel framework: ms-concurrency (nthreads=8) ```
``` Version control: 4.3.0-201-gc722625f28 Extra modules: Location (extra): D:/Dev/opencv_contrib/modules Version control (extra): 4.3.0-23-g6d855748 Platform: Timestamp: 2019-12-17T14:47:39Z Host: Windows 10.0.18363 AMD64 CMake: 3.16.19112601-MSVC_2 CMake generator: Visual Studio 16 2019 CMake build tool: C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/MSBuild/Current/Bin/MSBuild.exe MSVC: 1925 CPU/HW features: Baseline: SSE SSE2 SSE3 requested: SSE3 Dispatched code generation: SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX requested: SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX SSE4_1 (13 files): + SSSE3 SSE4_1 SSE4_2 (1 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 (0 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX AVX (4 files): + SSSE3 SSE4_1 POPCNT SSE4_2 AVX AVX2 (25 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX512_SKX (3 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX C/C++: Built as dynamic libs?: YES C++ standard: 11 C++ Compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.25.28610/bin/Hostx64/x64/cl.exe (ver 19.25.28614.0) C++ flags (Release): /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP /MD /O2 /Ob2 /DNDEBUG C++ flags (Debug): /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP /MDd /Zi /Ob0 /Od /RTC1 C Compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.25.28610/bin/Hostx64/x64/cl.exe C flags (Release): /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /MP /MD /O2 /Ob2 /DNDEBUG C flags (Debug): /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D 
_CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /MP /MDd /Zi /Ob0 /Od /RTC1 Linker flags (Release): /machine:x64 /INCREMENTAL:NO Linker flags (Debug): /machine:x64 /debug /INCREMENTAL ccache: NO Precompiled headers: NO Extra dependencies: 3rdparty dependencies: OpenCV modules: To be built: core dnn highgui imgcodecs imgproc photo world xphoto Disabled: - Disabled by dependency: aruco bgsegm bioinspired calib3d ccalib datasets dnn_objdetect dnn_superres dpm face features2d flann fuzzy gapi hfs img_hash intensity_transform java_bindings_generator line_descriptor ml objdetect optflow phase_unwrapping plot python_bindings_generator python_tests quality rapid reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab xfeatures2d ximgproc xobjdetect Unavailable: alphamat cnn_3dobj cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev cvv freetype hdf java js matlab ovis python2 python3 sfm viz Applications: apps Documentation: NO Non-free algorithms: NO Windows RT support: NO GUI: Win32 UI: YES VTK support: NO Media I/O: ZLib: build (ver 1.2.11) JPEG: build-libjpeg-turbo (ver 2.0.4-62) PNG: build (ver 1.6.37) JPEG 2000: NO HDR: NO SUNRASTER: NO PXM: NO PFM: NO Video I/O: DC1394: NO GStreamer: NO Parallel framework: Concurrency Trace: YES (with Intel ITT) Other third-party libraries: Intel IPP: 2020.0.0 Gold [2020.0.0] at: D:/Dev/opencv/build-nocuda/3rdparty/ippicv/ippicv_win/icv Intel IPP IW: sources (2020.0.0) at: D:/Dev/opencv/build-nocuda/3rdparty/ippicv/ippicv_win/iw Lapack: NO Inference Engine: YES (2019030000 / 2.1.0) * libs: C:/Program Files (x86)/IntelSWTools/openvino/deployment_tools/inference_engine/lib/intel64/Release/inference_engine.lib / C:/Program Files (x86)/IntelSWTools/openvino/deployment_tools/inference_engine/lib/intel64/Debug/inference_engined.lib * includes: C:/Program 
Files (x86)/IntelSWTools/openvino/deployment_tools/inference_engine/include nGraph: NO Eigen: NO Custom HAL: NO Protobuf: build (3.5.1) OpenCL: YES (NVD3D11) Include path: D:/Dev/opencv/3rdparty/include/opencl/1.2 Link libraries: Dynamic load Python (for build): C:/Users/julien.maille/AppData/Local/Programs/Python/Python37/python.exe Java: ant: NO JNI: NO Java wrappers: NO Java tests: NO Install to: D:/Dev/opencv/build-nocuda/install ----------------------------------------------------------------- OpenCL Platforms: NVIDIA CUDA dGPU: GeForce GTX 960 (OpenCL 1.2 CUDA) Current OpenCL device: Type = dGPU Name = GeForce GTX 960 Version = OpenCL 1.2 CUDA Driver version = 445.87 Address bits = 64 Compute units = 8 Max work group size = 1024 Local memory size = 48 KB Max memory allocation size = 1 GB Double support = Yes Host unified memory = No Device extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics Has AMD Blas = No Has AMD Fft = No Preferred vector width char = 1 Preferred vector width short = 1 Preferred vector width int = 1 Preferred vector width long = 1 Preferred vector width float = 1 Preferred vector width double = 1 OpenCV's HW features list: ID= 1 (MMX) -> ON ID= 2 (SSE) -> ON ID= 3 (SSE2) -> ON ID= 4 (SSE3) -> ON ID= 5 (SSSE3) -> ON ID= 6 (SSE4.1) -> ON ID= 7 (SSE4.2) -> ON ID= 8 (POPCNT) -> ON ID= 9 (FP16) -> ON ID= 10 (AVX) -> ON ID= 11 (AVX2) -> ON ID= 12 (FMA3) -> ON Total available: 12 Parallel framework: ms-concurrency (nthreads=8) ```
More benchmarks, this time on a Microsoft Surface/Win10. Everything is faster with 2020, except the Resnet on CPU.
GPU done in avg: 36.5, min: 36, median: 36, std: 0.68 -> 47% faster
CPU done in avg: 127.1, min: 101, median: 124, std: 18.74 -> 14% slower <<===
GPU done in avg: 68.0, min: 67, median: 68, std: 1.12
CPU done in avg: 115.5, min: 106, median: 109, std: 12.02
GPU done in avg: 42.7, min: 42, median: 42, std: 1.02 -> 57% faster
CPU done in avg: 125.0, min: 113, median: 122, std: 10.39 -> 10% faster
GPU done in avg: 99.1, min: 97, median: 98, std: 2.69
CPU done in avg: 138.6, min: 134, median: 135, std: 9.02
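The "faster"/"slower" annotations above are relative changes between two timings. A minimal sketch of that arithmetic follows; `percentChange` and `describeChange` are hypothetical names, and the thread does not say which statistic (avg, min, or median) the percentages were derived from.

```cpp
#include <cmath>
#include <string>

// Relative change of the candidate timing versus the baseline, in percent.
// Positive: the candidate takes longer (slower). Negative: it is faster.
double percentChange(double baselineMs, double candidateMs) {
    return (candidateMs - baselineMs) / baselineMs * 100.0;
}

// Formats the change as a rounded "N% slower" / "N% faster" label.
std::string describeChange(double baselineMs, double candidateMs) {
    const double pct = percentChange(baselineMs, candidateMs);
    const long rounded = std::lround(std::fabs(pct));
    if (rounded == 0) return "no change";
    return std::to_string(rounded) + (pct > 0 ? "% slower" : "% faster");
}
```

With the CPU medians from the first pair of rows (124 against 109), `describeChange(109, 124)` yields "14% slower", and with the GPU medians (36 against 68), `describeChange(68, 36)` yields "47% faster". That these agree with the annotations suggests, but does not prove, that medians were used; treating the unannotated rows as the baseline is also an assumption, since the row labels did not survive.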
Hello, I'm trying to find out the best way to convert my model for OpenCV and cover both old CPUs (with the default OpenCV backend taking an onnx) and more recent Intel CPUs (with the IE backend taking bin/xml).
This model, trained with the qubvel library, has an interpolate layer in the decoder.
This layer seems to be poorly handled when exported to onnx by PyTorch (despite recent commits). It generates an Upsample layer when exported with opset=9, or a Resize layer with opset=10 or 11.
It seems the latest 2020.2 can handle the Resize layer with opset 10 (but not 11), but right now I'm trying to work with OpenVINO 2019 R3 (because I would like to keep compatibility with non-AVX CPUs).
What's the recommended way? Shall I skip the onnx step? I've seen recommendations to convert to a TensorFlow model instead.