openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

Recommended steps to convert PyTorch model with upscale/interpolate layer #483

Closed JulienMaille closed 3 years ago

JulienMaille commented 4 years ago

Hello, I'm trying to find out the best way to convert my model for use with OpenCV while covering both old CPUs (with the default OpenCV backend taking an .onnx file) and more recent Intel CPUs (with the IE backend taking bin/xml).

This model, trained with the qubvel library, has an interpolate layer in the decoder: https://github.com/qubvel/segmentation_models.pytorch/blob/master/segmentation_models_pytorch/unet/decoder.py#L36

This layer seems to be poorly handled when exported to ONNX by PyTorch (despite recent commits): https://github.com/pytorch/pytorch/issues/27376 The export generates an Upsample layer with opset=9, or a Resize layer with opset=10 or 11.

It seems the latest 2020.2 can handle the Resize layer with opset 10 (but not 11), but right now I'm trying to work with OpenVINO 2019 R3 (because I would like to keep compatibility with non-AVX CPUs): https://github.com/openvinotoolkit/openvino/issues/387

What's the recommended way? Should I skip the ONNX step? I've seen recommendations to convert to a TensorFlow model instead.

dkurt commented 4 years ago

@JulienMaille, can you please try to run the model with OpenCV and DNN_BACKEND_INFERENCE_ENGINE?

Can you please add more details? Which version of OpenVINO is used? Have you tried passing the raw .onnx model with DNN_BACKEND_INFERENCE_ENGINE using the latest OpenCV master branch? There were multiple patches by @ashishkrshrivastava improving Resize layer import.

JulienMaille commented 4 years ago

Hello @dkurt, I've mentioned my OpenVINO version (2019 R3). Are you telling me that OpenCV can feed ONNX to DNN_BACKEND_INFERENCE_ENGINE?! What are the differences from my current workflow, which converts ONNX to .bin/.xml with Model Optimizer and loads the result with readNetFromModelOptimizer()?

edit: I just tried and it crashes in InfEngineBackendNet::connect()

dkurt commented 4 years ago

@JulienMaille, you need to try the latest OpenVINO (2020.2) and the latest OpenCV (4.3.0 or even master branch).

In the case of .xml and .bin, you use Model Optimizer to convert .onnx into the Intermediate Representation (IR).

An alternative is to pass the .onnx file to OpenCV directly and enable the DNN_BACKEND_INFERENCE_ENGINE backend: OpenCV will build a similar IR at runtime and use OpenVINO's Inference Engine where possible.

dkurt commented 4 years ago

> It seems latest 2020.2 can handle the Resize layer with opset 10 (but not 11) but I'm right now trying to work with OpenVino 2019 R3 (because I would like to keep compatibility with non AVX CPUs) #387

Oh, I see your point. Try building the latest OpenCV with OpenVINO 2019 R3. Use the -DINF_ENGINE_RELEASE=2019030000 flag to specify the version.
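
The value passed to -DINF_ENGINE_RELEASE packs the OpenVINO release into one integer: 2019030000 is used here for 2019 R3, and the build summaries in this thread show 2020020000 for 2020.2. A minimal sketch of decoding such a value, assuming the layout is four digits of year, two digits of release, and four trailing digits (zero in both values quoted here). `IeRelease` and `decodeIeRelease` are hypothetical names for illustration, not part of OpenCV or OpenVINO.

```cpp
#include <cstdint>

// Hypothetical helper type, not part of OpenCV or OpenVINO.
struct IeRelease {
    int year;     // e.g. 2019
    int release;  // e.g. 3 for "R3"
    int rest;     // trailing four digits; 0 in the values quoted in this thread
};

// Assumed layout: YYYYRRxxxx (year, release, remainder).
constexpr IeRelease decodeIeRelease(std::int64_t packed) {
    return IeRelease{
        static_cast<int>(packed / 1000000),
        static_cast<int>((packed / 10000) % 100),
        static_cast<int>(packed % 10000)
    };
}
```

Under that assumption, `decodeIeRelease(2019030000)` yields year 2019 and release 3, which is a quick way to check that the value CMake reports matches the installed OpenVINO package.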

JulienMaille commented 4 years ago

I'm working with OpenCV origin/master pulled 2 days ago. I've not manually set INF_ENGINE_RELEASE, but CMake already set it to 2019030000 automatically. Pseudo code:

#include <opencv2/dnn.hpp>

// Load the ONNX model directly and run it through the Inference Engine backend.
cv::dnn::Net net = cv::dnn::readNetFromONNX("fileName.onnx");
net.setPreferableBackend(cv::dnn::DNN_BACKEND_INFERENCE_ENGINE);
net.setPreferableTarget(cv::dnn::DNN_TARGET_OPENCL);
net.setInput(cv::dnn::blobFromImage(...));  // arguments elided in the original
cv::Mat output = net.forward();

dkurt commented 4 years ago

> I'm working with OpenCV origin/master pulled 2 days ago. I've not manually set INF_ENGINE_RELEASE but CMake already set it to 2019030000 automatically.

That is strange. For now there is no way to determine the IE version automatically. Please try again with a clean build folder.

> Pseudo code

What about the experiment? Does it work or not?

JulienMaille commented 4 years ago

> That is strange. There is no way to determine IE version for now automatically. Please try with cleaned build folder.

You are correct, maybe running setupvars.bat did it for me?

cmake -G "Visual Studio 16 2019" -A "x64" -DBUILD_LIST=core,imgcodecs,imgproc,highgui,dnn -DBUILD_TESTS=OFF -DOPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules ../ -DWITH_INF_ENGINE=ON -DENABLE_CXX11=ON -DBUILD_opencv_world=ON -DWITH_IMGCODEC_PXM=OFF -DWITH_IMGCODEC_PFM=OFF -DWITH_IMGCODEC_SUNRASTER=OFF -DWITH_IMGCODEC_HDR=OFF -DWITH_JASPER=OFF -DWITH_TIFF=OFF -DWITH_FFMPEG=OFF -DWITH_WEBP=OFF -DWITH_DSHOW=OFF -DWITH_OPENEXR=OFF -DWITH_MSMF=OFF -DWITH_CUDA=OFF -DWITH_CUFFT=OFF -DWITH_CUBLAS=OFF -DWITH_QUIRC=OFF

I get this:

1>--   Other third-party libraries:
1>--     Intel IPP:                   2020.0.0 Gold [2020.0.0]
1>--            at:                   D:/Dev/opencv/build-nocuda/3rdparty/ippicv/ippicv_win/icv
1>--     Intel IPP IW:                sources (2020.0.0)
1>--               at:                D:/Dev/opencv/build-nocuda/3rdparty/ippicv/ippicv_win/iw
1>--     Lapack:                      NO
1>--     Inference Engine:            YES (2019030000 / 2.1.0)
1>--         * libs:                  C:/Program Files (x86)/IntelSWTools/openvino/deployment_tools/inference_engine/lib/intel64/Release/inference_engine_legacy.lib / C:/Program Files (x86)/IntelSWTools/openvino/deployment_tools/inference_engine/lib/intel64/Debug/inference_engine_legacyd.lib
1>--         * includes:              C:/Program Files (x86)/IntelSWTools/openvino/deployment_tools/inference_engine/include
1>--     nGraph:                      NO
1>--     Eigen:                       NO
1>--     Custom HAL:                  NO
1>--     Protobuf:                    build (3.5.1)

> What about experiment? Does it work or not?

No, the pseudo code crashes in InfEngineBackendNet::connect().

dkurt commented 4 years ago

@JulienMaille, can you check if DNN_BACKEND_OPENCV works? Is it possible to share the model so I can test locally?

JulienMaille commented 4 years ago

Yes, it works with:

net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);

The model is similar to the one I sent to your email when we discussed the resized input issue.

JulienMaille commented 4 years ago

@dkurt I recompiled OpenCV with OpenVINO 2020.2 and tried loading the .onnx file with DNN_BACKEND_INFERENCE_ENGINE. It does work, but I get significantly worse performance than when I run Model Optimizer and load my bin/xml: 700ms instead of 500ms. Is that to be expected?

dkurt commented 4 years ago

@JulienMaille, but does bin/xml produce correct results?

JulienMaille commented 4 years ago

Yes

JulienMaille commented 4 years ago

OK, just to clarify: the slower performance comes from OpenCV + OpenVINO 2020.2 (so from nGraph?), not from onnx vs bin/xml.

OpenCV + OpenVINO 2019 R3 - bin/xml: 65ms
                          - onnx   : crash
OpenCV + OpenVINO 2020.2  - bin/xml: 93ms
                          - onnx   : 93ms

Forcing cv::dnn::setInferenceEngineBackendType("NN_BUILDER"); with 2020.2 results in crashes in both cases (onnx and bin/xml).

JulienMaille commented 4 years ago

@dkurt let me know if there's anything I should try in order to gather more information. Can you confirm that:

  1. feeding onnx to the Inference Engine requires nGraph (i.e. OpenVINO 2020)?
  2. nGraph is known to be slower in some cases?
  3. NN_BUILDER is not supposed to work with 2020?

JulienMaille commented 4 years ago

Some numbers to illustrate the performance degradation on an Intel Xeon E5-1620/Win10.

resnet18 based onnx:

2020.2
avg: 90.9, min: 87, median: 91, std: 2.64 -> 32% slower
avg: 90.9, min: 88, median: 91, std: 3.20
avg: 91.0, min: 90, median: 91, std: 2.62

2019.R3
avg: 71.0, min: 67, median: 69, std: 7.81
avg: 70.9, min: 66, median: 69, std: 8.27
avg: 70.9, min: 63, median: 69, std: 8.25

EfficientNet-b4 based onnx:

2020.2
avg: 107.8, min: 104, median: 108, std: 2.25 -> 3% slower
avg: 107.1, min: 106, median: 107, std: 2.42
avg: 109.4, min: 106, median: 109, std: 2.07

2019.R3
avg: 105.4, min: 102, median: 103, std: 8.01
avg: 106.1, min: 100, median: 104, std: 8.84
avg: 105.7, min:  99, median: 104, std: 7.82
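
The avg/min/median/std figures above can be derived from raw per-run latencies with standard C++ only. A minimal sketch, assuming the timings are collected into a `std::vector<double>` and that "std" means the population standard deviation (the thread does not say which variant was used). `Summary` and `summarize` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical container for the four figures reported in the thread.
struct Summary {
    double avg;
    double min;
    double median;
    double stddev;  // population standard deviation (an assumption)
};

// Takes the samples by value so the caller's vector is left unsorted.
Summary summarize(std::vector<double> samples) {
    if (samples.empty()) {
        throw std::invalid_argument("summarize: no samples");
    }
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    const double avg =
        std::accumulate(samples.begin(), samples.end(), 0.0) / static_cast<double>(n);
    // Middle element for odd n, mean of the two middle elements for even n.
    const double median = (n % 2 == 1)
        ? samples[n / 2]
        : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);
    double squaredDeviations = 0.0;
    for (double s : samples) {
        squaredDeviations += (s - avg) * (s - avg);
    }
    const double stddev = std::sqrt(squaredDeviations / static_cast<double>(n));
    return Summary{avg, samples.front(), median, stddev};
}
```

Reporting the median alongside the average is useful here because the 2019 R3 runs have a much larger spread than the 2020.2 runs, so a few slow iterations pull the average up.
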
JulienMaille commented 4 years ago

@dkurt it looks like you can't reproduce the slowdown, at least on one model. Here is the output of opencv_version_win32.exe:

2020.2 - General configuration for OpenCV 4.3.0-dev =====================================

```
Version control: 4.3.0-290-g593af7287b

Extra modules:
  Location (extra): D:/Dev/opencv_contrib/modules
  Version control (extra): 4.3.0-23-g6d855748

Platform:
  Timestamp: 2020-05-13T12:52:40Z
  Host: Windows 10.0.18363 AMD64
  CMake: 3.16.19112601-MSVC_2
  CMake generator: Visual Studio 16 2019
  CMake build tool: C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/MSBuild/Current/Bin/MSBuild.exe
  MSVC: 1925

CPU/HW features:
  Baseline: SSE SSE2 SSE3
    requested: SSE3
  Dispatched code generation: SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
    requested: SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
    SSE4_1 (13 files): + SSSE3 SSE4_1
    SSE4_2 (1 files): + SSSE3 SSE4_1 POPCNT SSE4_2
    FP16 (0 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
    AVX (4 files): + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
    AVX2 (25 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
    AVX512_SKX (3 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX

C/C++:
  Built as dynamic libs?: YES
  C++ standard: 11
  C++ Compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.25.28610/bin/Hostx64/x64/cl.exe (ver 19.25.28614.0)
  C++ flags (Release): /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP /MD /O2 /Ob2 /DNDEBUG
  C++ flags (Debug): /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP /MDd /Zi /Ob0 /Od /RTC1
  C Compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.25.28610/bin/Hostx64/x64/cl.exe
  C flags (Release): /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /MP /MD /O2 /Ob2 /DNDEBUG
  C flags (Debug): /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /MP /MDd /Zi /Ob0 /Od /RTC1
  Linker flags (Release): /machine:x64 /INCREMENTAL:NO
  Linker flags (Debug): /machine:x64 /debug /INCREMENTAL
  ccache: NO
  Precompiled headers: NO
  Extra dependencies:
  3rdparty dependencies:

OpenCV modules:
  To be built: core dnn highgui imgcodecs imgproc photo world xphoto
  Disabled: -
  Disabled by dependency: aruco bgsegm bioinspired calib3d ccalib datasets dnn_objdetect dnn_superres dpm face features2d flann fuzzy gapi hfs img_hash intensity_transform java_bindings_generator line_descriptor ml objdetect optflow phase_unwrapping plot python_bindings_generator python_tests quality rapid reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab xfeatures2d ximgproc xobjdetect
  Unavailable: alphamat cnn_3dobj cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev cvv freetype hdf java js matlab ovis python2 python3 sfm viz
  Applications: apps
  Documentation: NO
  Non-free algorithms: NO

Windows RT support: NO

GUI:
  Win32 UI: YES
  VTK support: NO

Media I/O:
  ZLib: build (ver 1.2.11)
  JPEG: build-libjpeg-turbo (ver 2.0.4-62)
  PNG: build (ver 1.6.37)
  JPEG 2000: NO
  HDR: NO
  SUNRASTER: NO
  PXM: NO
  PFM: NO

Video I/O:
  DC1394: NO
  GStreamer: NO

Parallel framework: Concurrency

Trace: YES (with Intel ITT)

Other third-party libraries:
  Intel IPP: 2020.0.0 Gold [2020.0.0]
    at: D:/Dev/opencv/build-nocuda-2020-2/3rdparty/ippicv/ippicv_win/icv
  Intel IPP IW: sources (2020.0.0)
    at: D:/Dev/opencv/build-nocuda-2020-2/3rdparty/ippicv/ippicv_win/iw
  Lapack: NO
  Inference Engine: YES (2020020000 / 2.1.0)
    * libs: C:/Program Files (x86)/IntelSWTools/openvino_2020.2.117/deployment_tools/inference_engine/lib/intel64/Release/inference_engine_legacy.lib / C:/Program Files (x86)/IntelSWTools/openvino_2020.2.117/deployment_tools/inference_engine/lib/intel64/Debug/inference_engine_legacyd.lib
    * includes: C:/Program Files (x86)/IntelSWTools/openvino_2020.2.117/deployment_tools/inference_engine/include
  nGraph: YES (1.1.1+)
    * libs: C:/Program Files (x86)/IntelSWTools/openvino_2020.2.117/deployment_tools/ngraph/lib/ngraph.dll
    * includes: C:/Program Files (x86)/IntelSWTools/openvino_2020.2.117/deployment_tools/ngraph/include
  Eigen: NO
  Custom HAL: NO
  Protobuf: build (3.5.1)

OpenCL: YES (NVD3D11)
  Include path: D:/Dev/opencv/3rdparty/include/opencl/1.2
  Link libraries: Dynamic load

Python (for build): C:/Users/julien.maille/AppData/Local/Programs/Python/Python37/python.exe

Java:
  ant: NO
  JNI: NO
  Java wrappers: NO
  Java tests: NO

Install to: D:/Dev/opencv/build-nocuda-2020-2/install
-----------------------------------------------------------------

OpenCL Platforms:
  NVIDIA CUDA
    dGPU: GeForce GTX 960 (OpenCL 1.2 CUDA)
Current OpenCL device:
  Type = dGPU
  Name = GeForce GTX 960
  Version = OpenCL 1.2 CUDA
  Driver version = 445.87
  Address bits = 64
  Compute units = 8
  Max work group size = 1024
  Local memory size = 48 KB
  Max memory allocation size = 1 GB
  Double support = Yes
  Host unified memory = No
  Device extensions:
    cl_khr_global_int32_base_atomics
    cl_khr_global_int32_extended_atomics
    cl_khr_local_int32_base_atomics
    cl_khr_local_int32_extended_atomics
    cl_khr_fp64
    cl_khr_byte_addressable_store
    cl_khr_icd
    cl_khr_gl_sharing
    cl_nv_compiler_options
    cl_nv_device_attribute_query
    cl_nv_pragma_unroll
    cl_nv_d3d10_sharing
    cl_khr_d3d10_sharing
    cl_nv_d3d11_sharing
    cl_nv_copy_opts
    cl_nv_create_buffer
    cl_khr_int64_base_atomics
    cl_khr_int64_extended_atomics
  Has AMD Blas = No
  Has AMD Fft = No
  Preferred vector width char = 1
  Preferred vector width short = 1
  Preferred vector width int = 1
  Preferred vector width long = 1
  Preferred vector width float = 1
  Preferred vector width double = 1

OpenCV's HW features list:
  ID= 1 (MMX) -> ON
  ID= 2 (SSE) -> ON
  ID= 3 (SSE2) -> ON
  ID= 4 (SSE3) -> ON
  ID= 5 (SSSE3) -> ON
  ID= 6 (SSE4.1) -> ON
  ID= 7 (SSE4.2) -> ON
  ID= 8 (POPCNT) -> ON
  ID= 9 (FP16) -> ON
  ID= 10 (AVX) -> ON
  ID= 11 (AVX2) -> ON
  ID= 12 (FMA3) -> ON
Total available: 12

Parallel framework: ms-concurrency (nthreads=8)
```

2019 R3 - General configuration for OpenCV 4.3.0-dev =====================================

```
Version control: 4.3.0-201-gc722625f28

Extra modules:
  Location (extra): D:/Dev/opencv_contrib/modules
  Version control (extra): 4.3.0-23-g6d855748

Platform:
  Timestamp: 2019-12-17T14:47:39Z
  Host: Windows 10.0.18363 AMD64
  CMake: 3.16.19112601-MSVC_2
  CMake generator: Visual Studio 16 2019
  CMake build tool: C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/MSBuild/Current/Bin/MSBuild.exe
  MSVC: 1925

CPU/HW features:
  Baseline: SSE SSE2 SSE3
    requested: SSE3
  Dispatched code generation: SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
    requested: SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
    SSE4_1 (13 files): + SSSE3 SSE4_1
    SSE4_2 (1 files): + SSSE3 SSE4_1 POPCNT SSE4_2
    FP16 (0 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
    AVX (4 files): + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
    AVX2 (25 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
    AVX512_SKX (3 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX

C/C++:
  Built as dynamic libs?: YES
  C++ standard: 11
  C++ Compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.25.28610/bin/Hostx64/x64/cl.exe (ver 19.25.28614.0)
  C++ flags (Release): /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP /MD /O2 /Ob2 /DNDEBUG
  C++ flags (Debug): /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP /MDd /Zi /Ob0 /Od /RTC1
  C Compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.25.28610/bin/Hostx64/x64/cl.exe
  C flags (Release): /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /MP /MD /O2 /Ob2 /DNDEBUG
  C flags (Debug): /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /MP /MDd /Zi /Ob0 /Od /RTC1
  Linker flags (Release): /machine:x64 /INCREMENTAL:NO
  Linker flags (Debug): /machine:x64 /debug /INCREMENTAL
  ccache: NO
  Precompiled headers: NO
  Extra dependencies:
  3rdparty dependencies:

OpenCV modules:
  To be built: core dnn highgui imgcodecs imgproc photo world xphoto
  Disabled: -
  Disabled by dependency: aruco bgsegm bioinspired calib3d ccalib datasets dnn_objdetect dnn_superres dpm face features2d flann fuzzy gapi hfs img_hash intensity_transform java_bindings_generator line_descriptor ml objdetect optflow phase_unwrapping plot python_bindings_generator python_tests quality rapid reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab xfeatures2d ximgproc xobjdetect
  Unavailable: alphamat cnn_3dobj cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev cvv freetype hdf java js matlab ovis python2 python3 sfm viz
  Applications: apps
  Documentation: NO
  Non-free algorithms: NO

Windows RT support: NO

GUI:
  Win32 UI: YES
  VTK support: NO

Media I/O:
  ZLib: build (ver 1.2.11)
  JPEG: build-libjpeg-turbo (ver 2.0.4-62)
  PNG: build (ver 1.6.37)
  JPEG 2000: NO
  HDR: NO
  SUNRASTER: NO
  PXM: NO
  PFM: NO

Video I/O:
  DC1394: NO
  GStreamer: NO

Parallel framework: Concurrency

Trace: YES (with Intel ITT)

Other third-party libraries:
  Intel IPP: 2020.0.0 Gold [2020.0.0]
    at: D:/Dev/opencv/build-nocuda/3rdparty/ippicv/ippicv_win/icv
  Intel IPP IW: sources (2020.0.0)
    at: D:/Dev/opencv/build-nocuda/3rdparty/ippicv/ippicv_win/iw
  Lapack: NO
  Inference Engine: YES (2019030000 / 2.1.0)
    * libs: C:/Program Files (x86)/IntelSWTools/openvino/deployment_tools/inference_engine/lib/intel64/Release/inference_engine.lib / C:/Program Files (x86)/IntelSWTools/openvino/deployment_tools/inference_engine/lib/intel64/Debug/inference_engined.lib
    * includes: C:/Program Files (x86)/IntelSWTools/openvino/deployment_tools/inference_engine/include
  nGraph: NO
  Eigen: NO
  Custom HAL: NO
  Protobuf: build (3.5.1)

OpenCL: YES (NVD3D11)
  Include path: D:/Dev/opencv/3rdparty/include/opencl/1.2
  Link libraries: Dynamic load

Python (for build): C:/Users/julien.maille/AppData/Local/Programs/Python/Python37/python.exe

Java:
  ant: NO
  JNI: NO
  Java wrappers: NO
  Java tests: NO

Install to: D:/Dev/opencv/build-nocuda/install
-----------------------------------------------------------------

OpenCL Platforms:
  NVIDIA CUDA
    dGPU: GeForce GTX 960 (OpenCL 1.2 CUDA)
Current OpenCL device:
  Type = dGPU
  Name = GeForce GTX 960
  Version = OpenCL 1.2 CUDA
  Driver version = 445.87
  Address bits = 64
  Compute units = 8
  Max work group size = 1024
  Local memory size = 48 KB
  Max memory allocation size = 1 GB
  Double support = Yes
  Host unified memory = No
  Device extensions:
    cl_khr_global_int32_base_atomics
    cl_khr_global_int32_extended_atomics
    cl_khr_local_int32_base_atomics
    cl_khr_local_int32_extended_atomics
    cl_khr_fp64
    cl_khr_byte_addressable_store
    cl_khr_icd
    cl_khr_gl_sharing
    cl_nv_compiler_options
    cl_nv_device_attribute_query
    cl_nv_pragma_unroll
    cl_nv_d3d10_sharing
    cl_khr_d3d10_sharing
    cl_nv_d3d11_sharing
    cl_nv_copy_opts
    cl_nv_create_buffer
    cl_khr_int64_base_atomics
    cl_khr_int64_extended_atomics
  Has AMD Blas = No
  Has AMD Fft = No
  Preferred vector width char = 1
  Preferred vector width short = 1
  Preferred vector width int = 1
  Preferred vector width long = 1
  Preferred vector width float = 1
  Preferred vector width double = 1

OpenCV's HW features list:
  ID= 1 (MMX) -> ON
  ID= 2 (SSE) -> ON
  ID= 3 (SSE2) -> ON
  ID= 4 (SSE3) -> ON
  ID= 5 (SSSE3) -> ON
  ID= 6 (SSE4.1) -> ON
  ID= 7 (SSE4.2) -> ON
  ID= 8 (POPCNT) -> ON
  ID= 9 (FP16) -> ON
  ID= 10 (AVX) -> ON
  ID= 11 (AVX2) -> ON
  ID= 12 (FMA3) -> ON
Total available: 12

Parallel framework: ms-concurrency (nthreads=8)
```

JulienMaille commented 4 years ago

More benchmarks, this time on a Microsoft Surface/Win10. Everything is faster with 2020.2, except ResNet on CPU.

resnet18:

2020.2
GPU done in avg:  36.5, min:  36, median:  36, std:  0.68  -> 47% faster
CPU done in avg: 127.1, min: 101, median: 124, std: 18.74  -> 14% slower <<===
2019.R3
GPU done in avg:  68.0, min:  67, median:  68, std:  1.12
CPU done in avg: 115.5, min: 106, median: 109, std: 12.02

EfficientNet-b4:

2020.2
GPU done in avg:  42.7, min:  42, median:  42, std:  1.02  -> 57% faster
CPU done in avg: 125.0, min: 113, median: 122, std: 10.39  -> 10% faster
2019.R3
GPU done in avg:  99.1, min:  97, median:  98, std: 2.69
CPU done in avg: 138.6, min: 134, median: 135, std: 9.02