Open ZJDATY opened 10 months ago
Please avoid posting text information as screenshots. It is ridiculous and blocks comparison of the output.
As a first step of the investigation we should align the build flags (CPU_BASELINE at least).
opencv-openvino and opencv-python are different distribution channels. So we need to compare packages from the same distribution channel (e.g. compare opencv-python==4.5.5 vs opencv-python==4.8.0).
I will upload the output log again, and I hope you can then remove this tag. My output log contains the build information (including CPU_BASELINE). OpenCV 4.8.0 and OpenCV 4.5.5-openvino are both official builds, so I think they can be compared. If the older version uses a better and faster build method, why not choose it?
Hi @ZJDATY, I have tested your model with two different opencv-python versions on my Mac (Intel i9 CPU), using Python 3.8. The results are:
For opencv-python 4.5.5.64 the min forward time is 21.917058 ms.
For opencv-python 4.8.0.74 the min forward time is 21.500704 ms.
The two versions take almost the same time.
The test script is the following:
import cv2 as cv
import numpy as np

net = cv.dnn.readNet("yolov4/yolov4-tiny.cfg", "yolov4/yolov4-tiny.weights")
t = cv.TickMeter()
t.reset()
input = np.random.rand(1, 3, 416, 416)
net.setInput(input)
minT = 100000
for i in range(100):
    t.start()
    net.forward()
    t.stop()
    timeN = t.getTimeMilli()
    if timeN < minT:
        minT = timeN
    t.reset()
print("forward time = ", minT, "ms")
Can you run the same test on your side?
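The minimum-of-N timing pattern used in the script above can be separated from OpenCV so that it is easy to reuse and check. The following is only a sketch: `min_time_ms` and `stand_in_workload` are hypothetical names, the warm-up step is an addition that the original script does not have, and the workload is a placeholder for `net.forward()`.

```python
import time


def min_time_ms(fn, warmup=5, repeats=100):
    """Return the fastest of `repeats` timed calls to `fn`, in milliseconds."""
    for _ in range(warmup):
        fn()  # untimed warm-up calls (lazy initialisation, caches)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best


def stand_in_workload():
    # Hypothetical placeholder for net.forward(); swap in the real call.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print("forward time = %.3f ms" % min_time_ms(stand_in_workload))
```

Taking the minimum reduces the influence of scheduler noise, but it hides run-to-run variance, so reporting the median alongside it may be worth considering.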
Hi @zihaomu, I am happy to help with testing. I don't use Python much, so I wrote an equivalent test in C++. My environment is Windows 11 with an i7-9700 CPU. Here are my test program and results.
#include <opencv2/opencv.hpp>

int main()
{
    cv::TickMeter *t = new cv::TickMeter();
    t->reset();
    auto net = cv::dnn::readNetFromDarknet("yolo/yolov4-tiny.cfg", "yolo/yolov4-tiny.weights");
    cv::Mat frame(416, 416, CV_32FC3), blob;
    cv::dnn::blobFromImage(frame, blob, 1 / 255.0, cv::Size(416, 416), cv::Scalar(), true, false, CV_32F);
    net.setInput(blob);
    double minT = 100000;
    for (int i = 0; i < 100; i++)
    {
        t->start();
        net.forward();
        t->stop();
        double timeN = t->getAvgTimeMilli();
        if (timeN < minT)
            minT = timeN;
        t->reset();
    }
    std::cout << "forward time = " << minT << std::endl;
    return 0;
}
For opencv4.8.0, the min forward time in five runs was 30.2679 ms (1st), 24.5836 ms (2nd), 25.1356 ms (3rd), 25.9139 ms (4th), 25.0378 ms (5th).
For opencv4.5.5-openvino, the min forward time in five runs was 10.6107 ms (1st), 10.4434 ms (2nd), 10.3903 ms (3rd), 10.3375 ms (4th), 10.4401 ms (5th).
@zihaomu How crazy! The inference time has increased by 150%. If you need it, I can provide my remote desktop for you to test.
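As a rough check of the size of the gap, the relative increase can be computed from the numbers reported above. This sketch assumes that all five values listed for each build are separate runs of the same program, and it uses the median of each set. Under that assumption the increase comes out at roughly 140%.

```python
from statistics import median

# Forward times in ms as listed in the comment above (assumed: five runs each).
opencv_480 = [30.2679, 24.5836, 25.1356, 25.9139, 25.0378]
opencv_455_openvino = [10.6107, 10.4434, 10.3903, 10.3375, 10.4401]


def percent_increase(old_ms, new_ms):
    """Relative increase of new_ms over old_ms, in percent."""
    return (new_ms - old_ms) / old_ms * 100.0


old = median(opencv_455_openvino)
new = median(opencv_480)
increase_pct = percent_increase(old, new)
print(f"median {old:.4f} ms -> {new:.4f} ms: +{increase_pct:.1f}%")
```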
@ZJDATY Can you test it with opencv-python?
Hi @zihaomu, I will give it a try, although my understanding of Python is limited. But I would like to know why testing in Python is necessary. The actual deployment still requires C++. Shouldn't we test in the production environment? If the timings in Python are similar, what does that tell us? Was the official opencv4.5.5 build with openvino compiled with any special optimizations?
Hi @ZJDATY, I just tested it on my Windows 10 machine with an i7-9750, and I cannot reproduce your result. Note that opencv-python is built from the OpenCV C++ source code.
For opencv-python 4.5.5.64 the min forward time is 36.2916 ms. For opencv-python 4.6.0.66 it is 34.8292 ms. For opencv-python 4.8.0.74 it is 34.4077 ms.
So does the opencv455 openvino build include some powerful optimizations? Can you test with the C++ program I wrote?
Please make sure you are using the OpenCV DNN CPU backend rather than OpenVINO or another backend. Comparing different backends is not fair.
I understand what you mean. If I add the following two statements, are the results comparable? In that case, an opencv480 openvino build that you compile yourself should give results similar to opencv455 openvino.
net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);
Hi @zihaomu, I have modified the program.
#include <opencv2/opencv.hpp>

int main()
{
    cv::TickMeter *t = new cv::TickMeter();
    t->reset();
    auto net = cv::dnn::readNetFromDarknet("yolo/yolov4-tiny.cfg", "yolo/yolov4-tiny.weights");
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);
    cv::Mat frame(416, 416, CV_32FC3), blob;
    cv::dnn::blobFromImage(frame, blob, 1 / 255.0, cv::Size(416, 416), cv::Scalar(), true, false, CV_32F);
    net.setInput(blob);
    double minT = 100000;
    for (int i = 0; i < 100; i++)
    {
        t->start();
        net.forward();
        t->stop();
        double timeN = t->getAvgTimeMilli();
        if (timeN < minT)
            minT = timeN;
        t->reset();
    }
    std::cout << "forward time = " << minT << std::endl;
    delete t;
    return 0;
}
Opencv455-openvino is 16.2936ms. Opencv480 is 26.1167ms. There is still a gap of nearly 10ms.
For comparative testing, I also compiled a version of opencv480 openvino myself. Using the same code above, its output is as follows:
forward time = 24.7357
forward time = 24.7381
This is opencv480 with openvino.
No idea
Have you compared opencv455 without openvino against opencv480?
Can you disable the Inference Engine, rebuild, and test again? I just rechecked the CMake output in your issue and found that 4.5 was compiled with the Inference Engine and 4.8 without it.
I will compare opencv455 without openvino against opencv480.
What I showed were the officially compiled opencv455-openvino and opencv480, because I was afraid you might say there was a problem with my own compilation. In fact, I have also compiled several versions with openvino myself. Below is the build information for my opencv480-openvino build. Its time is similar to that of the build without openvino 2022.1, but the gap between it and opencv455 openvino is still significant.
General configuration for OpenCV 4.8.0 =====================================
Version control: unknown
Extra modules:
Location (extra): D:/opencv/opencv_contrib-4.8.0/modules
Version control (extra): unknown
Platform:
Timestamp: 2023-06-29T13:58:22Z
Host: Windows 10.0.22621 AMD64
CMake: 3.26.4
CMake generator: Visual Studio 16 2019
CMake build tool: D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/MSBuild/Current/Bin/MSBuild.exe
MSVC: 1929
Configuration: Debug Release
CPU/HW features:
Baseline: SSE SSE2 SSE3
requested: SSE3
Dispatched code generation: SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
requested: SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
SSE4_1 (16 files): + SSSE3 SSE4_1
SSE4_2 (1 files): + SSSE3 SSE4_1 POPCNT SSE4_2
FP16 (0 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
AVX (7 files): + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
AVX2 (35 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
AVX512_SKX (5 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX
C/C++:
Built as dynamic libs?: YES
C++ standard: 11
C++ Compiler: D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe (ver 19.29.30148.0)
C++ flags (Release): /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:fast /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819 /MP -openmp /MD /O2 /Ob2 /DNDEBUG
C++ flags (Debug): /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:fast /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819 /MP -openmp /MDd /Zi /Ob0 /Od /RTC1
C Compiler: D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe
C flags (Release): /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:fast /MP -openmp /MD /O2 /Ob2 /DNDEBUG
C flags (Debug): /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:fast /MP -openmp /MDd /Zi /Ob0 /Od /RTC1
Linker flags (Release): /machine:x64 delayimp.lib /DELAYLOAD:cublas64_11.dll /DELAYLOAD:cublasLt64_11.dll /DELAYLOAD:cudnn64_8.dll /DELAYLOAD:cudnn_adv_infer64_8.dll /DELAYLOAD:cudnn_adv_train64_8.dll /DELAYLOAD:cudnn_cnn_infer64_8.dll /DELAYLOAD:cudnn_cnn_train64_8.dll /DELAYLOAD:cudnn_ops_infer64_8.dll /DELAYLOAD:cudnn_ops_train64_8.dll /DELAYLOAD:cufft64_10.dll /DELAYLOAD:cufftw64_10.dll /DELAYLOAD:cuinj64_117.dll /DELAYLOAD:curand64_10.dll /DELAYLOAD:cusolver64_11.dll /DELAYLOAD:cusolverMg64_11.dll /DELAYLOAD:cusparse64_11.dll /DELAYLOAD:nppc64_11.dll /DELAYLOAD:nppial64_11.dll /DELAYLOAD:nppicc64_11.dll /DELAYLOAD:nppidei64_11.dll /DELAYLOAD:nppif64_11.dll /DELAYLOAD:nppig64_11.dll /DELAYLOAD:nppim64_11.dll /DELAYLOAD:nppist64_11.dll /DELAYLOAD:nppisu64_11.dll /DELAYLOAD:nppitc64_11.dll /DELAYLOAD:npps64_11.dll /DELAYLOAD:nvblas64_11.dll /DELAYLOAD:nvjpeg64_11.dll /DELAYLOAD:nvrtc-builtins64_117.dll /DELAYLOAD:nvrtc64_112_0.dll /DELAYLOAD:zlibwapi.dll /DELAYLOAD:nvcuda.dll /DELAYLOAD:nvml.dll /IGNORE:4199 /DELAYLOAD:opencv_cudev480.dll /DELAYLOAD:opencv_cudaarithm480.dll /DELAYLOAD:opencv_flann480.dll /DELAYLOAD:opencv_imgproc480.dll /DELAYLOAD:opencv_intensity_transform480.dll /DELAYLOAD:opencv_ml480.dll /DELAYLOAD:opencv_phase_unwrapping480.dll /DELAYLOAD:opencv_plot480.dll /DELAYLOAD:opencv_quality480.dll /DELAYLOAD:opencv_reg480.dll /DELAYLOAD:opencv_surface_matching480.dll /DELAYLOAD:opencv_cudafilters480.dll /DELAYLOAD:opencv_cudaimgproc480.dll /DELAYLOAD:opencv_cudawarping480.dll /DELAYLOAD:opencv_dnn480.dll /DELAYLOAD:opencv_dnn_superres480.dll /DELAYLOAD:opencv_features2d480.dll /DELAYLOAD:opencv_fuzzy480.dll /DELAYLOAD:opencv_hfs480.dll /DELAYLOAD:opencv_img_hash480.dll /DELAYLOAD:opencv_imgcodecs480.dll /DELAYLOAD:opencv_line_descriptor480.dll /DELAYLOAD:opencv_photo480.dll /DELAYLOAD:opencv_saliency480.dll /DELAYLOAD:opencv_text480.dll /DELAYLOAD:opencv_videoio480.dll /DELAYLOAD:opencv_xphoto480.dll /DELAYLOAD:opencv_calib3d480.dll 
/DELAYLOAD:opencv_cudacodec480.dll /DELAYLOAD:opencv_cudafeatures2d480.dll /DELAYLOAD:opencv_cudastereo480.dll /DELAYLOAD:opencv_datasets480.dll /DELAYLOAD:opencv_highgui480.dll /DELAYLOAD:opencv_mcc480.dll /DELAYLOAD:opencv_objdetect480.dll /DELAYLOAD:opencv_rapid480.dll /DELAYLOAD:opencv_rgbd480.dll /DELAYLOAD:opencv_shape480.dll /DELAYLOAD:opencv_structured_light480.dll /DELAYLOAD:opencv_ts480.dll /DELAYLOAD:opencv_video480.dll /DELAYLOAD:opencv_wechat_qrcode480.dll /DELAYLOAD:opencv_xfeatures2d480.dll /DELAYLOAD:opencv_ximgproc480.dll /DELAYLOAD:opencv_xobjdetect480.dll /DELAYLOAD:opencv_aruco480.dll /DELAYLOAD:opencv_bgsegm480.dll /DELAYLOAD:opencv_bioinspired480.dll /DELAYLOAD:opencv_ccalib480.dll /DELAYLOAD:opencv_cudabgsegm480.dll /DELAYLOAD:opencv_cudalegacy480.dll /DELAYLOAD:opencv_cudaobjdetect480.dll /DELAYLOAD:opencv_dnn_objdetect480.dll /DELAYLOAD:opencv_dpm480.dll /DELAYLOAD:opencv_face480.dll /DELAYLOAD:opencv_gapi480.dll /DELAYLOAD:opencv_optflow480.dll /DELAYLOAD:opencv_stitching480.dll /DELAYLOAD:opencv_tracking480.dll /DELAYLOAD:opencv_cudaoptflow480.dll /DELAYLOAD:opencv_stereo480.dll /DELAYLOAD:opencv_superres480.dll /DELAYLOAD:opencv_videostab480.dll /INCREMENTAL:NO
Linker flags (Debug): /machine:x64 delayimp.lib /DELAYLOAD:cublas64_11.dll /DELAYLOAD:cublasLt64_11.dll /DELAYLOAD:cudnn64_8.dll /DELAYLOAD:cudnn_adv_infer64_8.dll /DELAYLOAD:cudnn_adv_train64_8.dll /DELAYLOAD:cudnn_cnn_infer64_8.dll /DELAYLOAD:cudnn_cnn_train64_8.dll /DELAYLOAD:cudnn_ops_infer64_8.dll /DELAYLOAD:cudnn_ops_train64_8.dll /DELAYLOAD:cufft64_10.dll /DELAYLOAD:cufftw64_10.dll /DELAYLOAD:cuinj64_117.dll /DELAYLOAD:curand64_10.dll /DELAYLOAD:cusolver64_11.dll /DELAYLOAD:cusolverMg64_11.dll /DELAYLOAD:cusparse64_11.dll /DELAYLOAD:nppc64_11.dll /DELAYLOAD:nppial64_11.dll /DELAYLOAD:nppicc64_11.dll /DELAYLOAD:nppidei64_11.dll /DELAYLOAD:nppif64_11.dll /DELAYLOAD:nppig64_11.dll /DELAYLOAD:nppim64_11.dll /DELAYLOAD:nppist64_11.dll /DELAYLOAD:nppisu64_11.dll /DELAYLOAD:nppitc64_11.dll /DELAYLOAD:npps64_11.dll /DELAYLOAD:nvblas64_11.dll /DELAYLOAD:nvjpeg64_11.dll /DELAYLOAD:nvrtc-builtins64_117.dll /DELAYLOAD:nvrtc64_112_0.dll /DELAYLOAD:zlibwapi.dll /DELAYLOAD:nvcuda.dll /DELAYLOAD:nvml.dll /IGNORE:4199 /DELAYLOAD:opencv_cudev480.dll /DELAYLOAD:opencv_cudaarithm480.dll /DELAYLOAD:opencv_flann480.dll /DELAYLOAD:opencv_imgproc480.dll /DELAYLOAD:opencv_intensity_transform480.dll /DELAYLOAD:opencv_ml480.dll /DELAYLOAD:opencv_phase_unwrapping480.dll /DELAYLOAD:opencv_plot480.dll /DELAYLOAD:opencv_quality480.dll /DELAYLOAD:opencv_reg480.dll /DELAYLOAD:opencv_surface_matching480.dll /DELAYLOAD:opencv_cudafilters480.dll /DELAYLOAD:opencv_cudaimgproc480.dll /DELAYLOAD:opencv_cudawarping480.dll /DELAYLOAD:opencv_dnn480.dll /DELAYLOAD:opencv_dnn_superres480.dll /DELAYLOAD:opencv_features2d480.dll /DELAYLOAD:opencv_fuzzy480.dll /DELAYLOAD:opencv_hfs480.dll /DELAYLOAD:opencv_img_hash480.dll /DELAYLOAD:opencv_imgcodecs480.dll /DELAYLOAD:opencv_line_descriptor480.dll /DELAYLOAD:opencv_photo480.dll /DELAYLOAD:opencv_saliency480.dll /DELAYLOAD:opencv_text480.dll /DELAYLOAD:opencv_videoio480.dll /DELAYLOAD:opencv_xphoto480.dll /DELAYLOAD:opencv_calib3d480.dll 
/DELAYLOAD:opencv_cudacodec480.dll /DELAYLOAD:opencv_cudafeatures2d480.dll /DELAYLOAD:opencv_cudastereo480.dll /DELAYLOAD:opencv_datasets480.dll /DELAYLOAD:opencv_highgui480.dll /DELAYLOAD:opencv_mcc480.dll /DELAYLOAD:opencv_objdetect480.dll /DELAYLOAD:opencv_rapid480.dll /DELAYLOAD:opencv_rgbd480.dll /DELAYLOAD:opencv_shape480.dll /DELAYLOAD:opencv_structured_light480.dll /DELAYLOAD:opencv_ts480.dll /DELAYLOAD:opencv_video480.dll /DELAYLOAD:opencv_wechat_qrcode480.dll /DELAYLOAD:opencv_xfeatures2d480.dll /DELAYLOAD:opencv_ximgproc480.dll /DELAYLOAD:opencv_xobjdetect480.dll /DELAYLOAD:opencv_aruco480.dll /DELAYLOAD:opencv_bgsegm480.dll /DELAYLOAD:opencv_bioinspired480.dll /DELAYLOAD:opencv_ccalib480.dll /DELAYLOAD:opencv_cudabgsegm480.dll /DELAYLOAD:opencv_cudalegacy480.dll /DELAYLOAD:opencv_cudaobjdetect480.dll /DELAYLOAD:opencv_dnn_objdetect480.dll /DELAYLOAD:opencv_dpm480.dll /DELAYLOAD:opencv_face480.dll /DELAYLOAD:opencv_gapi480.dll /DELAYLOAD:opencv_optflow480.dll /DELAYLOAD:opencv_stitching480.dll /DELAYLOAD:opencv_tracking480.dll /DELAYLOAD:opencv_cudaoptflow480.dll /DELAYLOAD:opencv_stereo480.dll /DELAYLOAD:opencv_superres480.dll /DELAYLOAD:opencv_videostab480.dll /debug /INCREMENTAL
ccache: NO
Precompiled headers: YES
Extra dependencies: cudart_static.lib nppc.lib nppial.lib nppicc.lib nppidei.lib nppif.lib nppig.lib nppim.lib nppist.lib nppisu.lib nppitc.lib npps.lib cublas.lib cudnn.lib cufft.lib -LIBPATH:C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.7/lib/x64
3rdparty dependencies:
OpenCV modules:
To be built: aruco bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres dpm face features2d flann fuzzy gapi hfs highgui img_hash imgcodecs imgproc intensity_transform line_descriptor mcc ml objdetect optflow phase_unwrapping photo plot quality rapid reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab wechat_qrcode xfeatures2d ximgproc xobjdetect xphoto
Disabled: java_bindings_generator js_bindings_generator python_bindings_generator python_tests world
Disabled by dependency: -
Unavailable: alphamat cvv freetype hdf java julia matlab ovis python2 python3 sfm viz
Applications: apps
Documentation: NO
Non-free algorithms: YES
Windows RT support: NO
GUI: WIN32UI
Win32 UI: YES
OpenGL support: YES (opengl32 glu32)
Media I/O:
ZLib: build (ver 1.2.13)
JPEG: build-libjpeg-turbo (ver 2.1.3-62)
SIMD Support Request: YES
SIMD Support: NO
WEBP: build (ver encoder: 0x020f)
PNG: build (ver 1.6.37)
TIFF: build (ver 42 - 4.2.0)
JPEG 2000: build (ver 2.5.0)
OpenEXR: build (ver 2.3.0)
HDR: YES
SUNRASTER: YES
PXM: YES
PFM: YES
Video I/O:
FFMPEG: YES (prebuilt binaries)
avcodec: YES (58.134.100)
avformat: YES (58.76.100)
avutil: YES (56.70.100)
swscale: YES (5.9.100)
avresample: YES (4.0.0)
DirectShow: YES
Media Foundation: YES
DXVA: YES
Parallel framework: TBB (ver 2020.2 interface 11102)
Trace: YES (with Intel ITT)
Other third-party libraries:
Intel IPP: 2021.8 [2021.8.0]
at: D:/opencv/opencv-4.8.0/build/3rdparty/ippicv/ippicv_win/icv
Intel IPP IW: sources (2021.8.0)
at: D:/opencv/opencv-4.8.0/build/3rdparty/ippicv/ippicv_win/iw
Lapack: YES (D:/opencv/OpenBLAS/lib/libopenblas.lib)
OpenVINO: YES (2022.1.0)
Custom HAL: NO
Protobuf: build (3.19.1)
Flatbuffers: builtin/3rdparty (23.5.9)
NVIDIA CUDA: YES (ver 11.7, CUFFT CUBLAS FAST_MATH)
NVIDIA GPU arch: 61 70 75 86
NVIDIA PTX archs:
cuDNN: YES (ver 8.6.0)
OpenCL: YES (NVD3D11)
Include path: D:/opencv/opencv-4.8.0/3rdparty/include/opencl/1.2
Link libraries: Dynamic load
ONNX: YES
Include path: D:/opencv/onnxruntime-1.15.1/include/onnxruntime/core/session
Link libraries: D:/env/Download/onnxruntime/onnxruntime-win-x64-gpu-1.15.1/lib/onnxruntime.lib
Python (for build): D:/ProgramData/Anaconda3/python.exe
Install to: D:/opencv/opencv-4.8.0/build/install
-----------------------------------------------------------------
forward time = 24.4056
D:\vcworkspaces\yolov4_tiny_dnn_demo\x64\Release\dnnspeedtest.exe (process 16252) exited with code 0.
Press any key to close this window. . .
I did not compile just one opencv480 openvino build. I tried different CMake configurations and compiled 5 versions. The inference time differs little between them, and none of them reaches the speed of opencv455 openvino; they are always more than ten milliseconds apart.
Please disable OpenVINO so that we can be sure DNN is running on the CPU rather than through OpenVINO. Neither the original 4.5 nor 4.8 can run the yolov4-tiny model in about 10 ms on the CPU alone, so that value is weird. OpenVINO has special optimizations for Intel CPUs, and 10 ms with OpenVINO is a reasonable value.
To investigate the speed issue further, filling in the following table would be helpful.
Speed of yolov4-tiny | OpenCV 4.5 | OpenCV 4.8 |
---|---|---|
net.setPreferableBackend(DNN_BACKEND_INFERENCE_ENGINE); | | |
net.setPreferableBackend(0); | | |
Looking forward to your reply.
@zihaomu I just finished compiling opencv455 with VC16. Here are my latest test results. I now have four builds on my end: opencv455 and opencv480, each with and without openvino.
Speed of yolov4-tiny (ms) | OpenCV 4.5 | OpenCV 4.8 | OpenCV 4.5-openvino | OpenCV 4.8-openvino |
---|---|---|---|---|
net.setPreferableBackend(cv::dnn::DNN_BACKEND_INFERENCE_ENGINE); | - | - | 10.4434 | 9.501 |
net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV); | 15.8422 | 25.1356 | 16.2936 | 24.7036 |
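To make the table easier to read, the slowdown of 4.8 relative to 4.5 can be computed for each backend and build pair. This is a sketch using only the numbers from the table above; the dictionary layout and the `slowdown` helper are illustrative names, not part of OpenCV.

```python
# Minimum forward times in ms, copied from the table above.
times_ms = {
    "DNN_BACKEND_INFERENCE_ENGINE": {"4.5-openvino": 10.4434, "4.8-openvino": 9.501},
    "DNN_BACKEND_OPENCV": {
        "4.5": 15.8422,
        "4.8": 25.1356,
        "4.5-openvino": 16.2936,
        "4.8-openvino": 24.7036,
    },
}


def slowdown(new_ms, old_ms):
    """How many times longer the newer build takes (above 1 means slower)."""
    return new_ms / old_ms


ocv = times_ms["DNN_BACKEND_OPENCV"]
ie = times_ms["DNN_BACKEND_INFERENCE_ENGINE"]
ratios = {
    "OPENCV backend, plain builds": slowdown(ocv["4.8"], ocv["4.5"]),
    "OPENCV backend, openvino builds": slowdown(ocv["4.8-openvino"], ocv["4.5-openvino"]),
    "INFERENCE_ENGINE backend": slowdown(ie["4.8-openvino"], ie["4.5-openvino"]),
}
for label, ratio in ratios.items():
    print(f"{label}: 4.8 takes {ratio:.2f}x the time of 4.5")
```

Read this way, the regression shows up only with DNN_BACKEND_OPENCV, while the Inference Engine backend is slightly faster in 4.8.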
Thanks for your work. I will look at the details and do a layer-by-layer speed comparison.
BTW, what are your CPU details?
I mentioned it earlier. It's I7-9700.
I still cannot reproduce the result on my Intel i7-9750 CPU laptop (Windows 10, VS2019). The speed is the following:
- opencv-4.8 takes about 50 ms.
- opencv-4.5.5 takes about 45 ms.
I have a bit of doubt about how you are able to get 20ms on your i7-9700 with just the CPU.
@zihaomu There are a few differences: mine is a desktop i7-9700, not a laptop, running Windows 11 with the latest updates.
I have also tested it on my AMD 5600X desktop, which should be faster than the i7-9700, and it takes about 40 ms. Maybe I am missing something. Can you test the single-core performance on your side?
Another test is the following:
while(1)
{
    net.forward(detections, output_names);
}
This is to check whether GPU usage goes up. I'm concerned that it is actually using the GPU, via OpenCL or CUDA.
@WanliZhong Please take a look.
@ZJDATY Consider using the OpenCV performance tests. There is a test case for yolov4-tiny here: https://github.com/opencv/opencv/blob/4.8.0/modules/dnn/perf/perf_net.cpp#L242
Unfortunately, the models used in 4.8.0 and 4.5.5 are not the same (details in #23008). The test (perf_net.cpp) needs to be restored from 4.5.5 for a correct comparison (however, timings are similar on my machine, so perhaps only the weights changed).
There could be many reasons for performance changes.
Consider using --perf_threads=1 to disable multi-threading during the test.
I don't see degradation on Linux (GCC 12) with i7-12700K:
- 1 thread:
Name of Test | 455-1th | 480-1th-sametestdata | 480-1th-sametestdata vs 455-1th (x-factor) |
---|---|---|---|
YOLOv4_tiny::DNNTestNetwork::OCV/CPU | 65.478 | 40.288 | 1.63 |
- N threads (default=20):
Name of Test | 455-Nth | 480-Nth-sametestdata | 480-Nth-sametestdata vs 455-Nth (x-factor) |
---|---|---|---|
YOLOv4_tiny::DNNTestNetwork::OCV/CPU | 12.280 | 9.111 | 1.35 |
Used commands:
$ ./bin/opencv_perf_dnn --gtest_filter=*YOLOv4_tiny* --gtest_output=xml:../perf/dnn_23911/455-Nth.xml
...
$ ./bin/opencv_perf_dnn --gtest_filter=*YOLOv4_tiny* --perf_threads=1 --gtest_output=xml:../perf/dnn_23911/480-1th-sametestdata.xml
$ python $OPENCV_SRC_DIR/modules/ts/misc/summary.py -m median ../perf/dnn_23911/{455-1th,480-1th-sametestdata}.xml -o markdown
$ python $OPENCV_SRC_DIR/modules/ts/misc/summary.py -m median ../perf/dnn_23911/{455-Nth,480-Nth-sametestdata}.xml -o markdown
BTW, you need to set the OPENCV_TEST_DATA_PATH="<opencv_extra>/testdata" environment variable and run the download script (pass the YOLOv4-tiny parameter to download one model only).
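Before launching the perf tests, it can help to confirm that the test data is actually reachable. The sketch below checks only the OPENCV_TEST_DATA_PATH variable mentioned above and a single relative file name; `find_test_data` is a hypothetical helper, and it is a simplification of whatever lookup the OpenCV test framework really performs. The file name `dnn/dog416.png` is the one reported as missing in the error output later in this thread.

```python
import os
from pathlib import Path


def find_test_data(relative_name, env=None):
    """Return the path of a test data file under OPENCV_TEST_DATA_PATH, or None.

    Simplified check: only the directory named by the environment variable
    is searched.
    """
    env = os.environ if env is None else env
    root = env.get("OPENCV_TEST_DATA_PATH")
    if not root:
        return None
    candidate = Path(root) / relative_name
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    found = find_test_data("dnn/dog416.png")
    if found is None:
        print("dnn/dog416.png not found; set OPENCV_TEST_DATA_PATH to <opencv_extra>/testdata")
    else:
        print("test data found at", found)
```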
@zihaomu @asmorkalov Did you use this program to test the time?
#include <opencv2/opencv.hpp>

int main()
{
    cv::TickMeter *t = new cv::TickMeter();
    t->reset();
    auto net = cv::dnn::readNetFromDarknet("yolo/yolov4-tiny.cfg", "yolo/yolov4-tiny.weights");
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);
    cv::Mat frame(416, 416, CV_32FC3), blob;
    cv::dnn::blobFromImage(frame, blob, 1 / 255.0, cv::Size(416, 416), cv::Scalar(), true, false, CV_32F);
    net.setInput(blob);
    double minT = 100000;
    for (int i = 0; i < 100; i++)
    {
        t->start();
        net.forward();
        t->stop();
        double timeN = t->getAvgTimeMilli();
        if (timeN < minT)
            minT = timeN;
        t->reset();
    }
    std::cout << cv::getBuildInformation() << std::endl;
    std::cout << "forward time = " << minT << std::endl;
    delete t;
    return 0;
}
Here, I recorded a video.
link: https://pan.baidu.com/s/1pA6P2v28IxZtz6PIC446gw?pwd=jaq8
Regarding the while(1) loop test: I ran it and found that the GPU did not show any additional resource consumption.
@zihaomu Please forgive my ignorance, I don't know how to set the parameters; this command seems to produce no results. I placed the yolo folder in the same directory. Like this:
opencv_perf_dnn --gtest_filter=./yolo/yolov4-tiny.weights --gtest_output=xml:result.xml
Time compensation is 0
TEST: Skip tests with tags: 'mem_6gb', 'verylong'
CTEST_FULL_OUTPUT
OpenCV version: 4.5.5
OpenCV VCS version: unknown
Build type: Debug Release
WARNING: build value differs from runtime: Release
Compiler: D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe (ver 19.29.30148.0)
Parallel framework: tbb (nthreads=8)
CPU features: SSE SSE2 SSE3 *SSE4.1 *SSE4.2 *FP16 *AVX *AVX2 *AVX512-SKX?
Intel(R) IPP version: ippIP AVX2 (l9) 2020.0.0 Gold (-) Oct 21 2019
Intel(R) IPP features code: 0x8000
OpenCL Platforms:
NVIDIA CUDA
dGPU: NVIDIA TITAN V (OpenCL 3.0 CUDA)
Intel(R) OpenCL HD Graphics
iGPU: Intel(R) UHD Graphics 630 (OpenCL 3.0 NEO )
Current OpenCL device:
Type = dGPU
Name = NVIDIA TITAN V
Version = OpenCL 3.0 CUDA
Driver version = 536.40
Address bits = 64
Compute units = 80
Max work group size = 1024
Local memory size = 48 KB
Max memory allocation size = 2 GB 1023 MB 880 KB
Double support = Yes
Half support = No
Host unified memory = No
Device extensions:
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_fp64
cl_khr_3d_image_writes
cl_khr_byte_addressable_store
cl_khr_icd
cl_khr_gl_sharing
cl_nv_compiler_options
cl_nv_device_attribute_query
cl_nv_pragma_unroll
cl_nv_d3d10_sharing
cl_khr_d3d10_sharing
cl_nv_d3d11_sharing
cl_nv_copy_opts
cl_nv_create_buffer
cl_khr_int64_base_atomics
cl_khr_int64_extended_atomics
cl_khr_device_uuid
cl_khr_pci_bus_info
cl_khr_external_semaphore
cl_khr_external_memory
cl_khr_external_semaphore_win32
cl_khr_external_memory_win32
Has AMD Blas = No
Has AMD Fft = No
Preferred vector width char = 1
Preferred vector width short = 1
Preferred vector width int = 1
Preferred vector width long = 1
Preferred vector width float = 1
Preferred vector width double = 1
Preferred vector width half = 0
Note: Google Test filter = ./yolo/yolov4-tiny.weights
[==========] Running 0 tests from 0 test cases.
[==========] 0 tests from 0 test cases ran. (0 ms total)
[ PASSED ] 0 tests.
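The run above selects 0 tests because --gtest_filter expects a pattern over test names, not a path to a model file. The following sketch illustrates the idea with Python's `fnmatch` as an approximation of the wildcard matching: it handles a single positive pattern only, the real filter syntax has more features, and `Some.OtherTest/0` is a made-up name added for contrast. The other two test names are taken from the log output in this thread.

```python
from fnmatch import fnmatchcase


def select_tests(test_names, pattern):
    """Return the test names matching one wildcard pattern (approximation)."""
    return [name for name in test_names if fnmatchcase(name, pattern)]


test_names = [
    "Layer_Slice.YOLOv4_tiny_1/0",
    "DNNTestNetwork.YOLOv4_tiny/0",
    "Some.OtherTest/0",  # hypothetical name, for contrast only
]

# A file path is treated as a literal test name and matches nothing.
by_path = select_tests(test_names, "./yolo/yolov4-tiny.weights")
# A wildcard over the test name selects the YOLOv4-tiny cases.
by_name = select_tests(test_names, "*YOLOv4_tiny*")

print("matched by path:", by_path)
print("matched by name:", by_name)
```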
opencv_perf_dnn --gtest_filter=*YOLOv4_tiny* --gtest_output=xml:result.xml
Time compensation is 0
TEST: Skip tests with tags: 'mem_6gb', 'verylong'
CTEST_FULL_OUTPUT
OpenCV version: 4.5.5
OpenCV VCS version: unknown
Build type: Debug Release
WARNING: build value differs from runtime: Release
Compiler: D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe (ver 19.29.30148.0)
Parallel framework: tbb (nthreads=8)
CPU features: SSE SSE2 SSE3 *SSE4.1 *SSE4.2 *FP16 *AVX *AVX2 *AVX512-SKX?
Intel(R) IPP version: ippIP AVX2 (l9) 2020.0.0 Gold (-) Oct 21 2019
Intel(R) IPP features code: 0x8000
OpenCL Platforms:
NVIDIA CUDA
dGPU: NVIDIA TITAN V (OpenCL 3.0 CUDA)
Intel(R) OpenCL HD Graphics
iGPU: Intel(R) UHD Graphics 630 (OpenCL 3.0 NEO )
Current OpenCL device:
Type = dGPU
Name = NVIDIA TITAN V
Version = OpenCL 3.0 CUDA
Driver version = 536.40
Address bits = 64
Compute units = 80
Max work group size = 1024
Local memory size = 48 KB
Max memory allocation size = 2 GB 1023 MB 880 KB
Double support = Yes
Half support = No
Host unified memory = No
Device extensions:
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_fp64
cl_khr_3d_image_writes
cl_khr_byte_addressable_store
cl_khr_icd
cl_khr_gl_sharing
cl_nv_compiler_options
cl_nv_device_attribute_query
cl_nv_pragma_unroll
cl_nv_d3d10_sharing
cl_khr_d3d10_sharing
cl_nv_d3d11_sharing
cl_nv_copy_opts
cl_nv_create_buffer
cl_khr_int64_base_atomics
cl_khr_int64_extended_atomics
cl_khr_device_uuid
cl_khr_pci_bus_info
cl_khr_external_semaphore
cl_khr_external_memory
cl_khr_external_semaphore_win32
cl_khr_external_memory_win32
Has AMD Blas = No
Has AMD Fft = No
Preferred vector width char = 1
Preferred vector width short = 1
Preferred vector width int = 1
Preferred vector width long = 1
Preferred vector width float = 1
Preferred vector width double = 1
Preferred vector width half = 0
Note: Google Test filter = *YOLOv4_tiny*
[==========] Running 12 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 9 tests from Layer_Slice
[ RUN ] Layer_Slice.YOLOv4_tiny_1/0, where GetParam() = OCV/OCL
[ PERFSTAT ] (samples=100 mean=0.30 median=0.29 min=0.28 stddev=0.02 (7.1%))
[ OK ] Layer_Slice.YOLOv4_tiny_1/0 (44 ms)
[ RUN ] Layer_Slice.YOLOv4_tiny_1/1, where GetParam() = OCV/OCL_FP16
[ PERFSTAT ] (samples=13 mean=0.29 median=0.29 min=0.29 stddev=0.01 (2.4%))
[ OK ] Layer_Slice.YOLOv4_tiny_1/1 (10 ms)
[ RUN ] Layer_Slice.YOLOv4_tiny_1/2, where GetParam() = OCV/CPU
[ PERFSTAT ] (samples=13 mean=0.04 median=0.04 min=0.04 stddev=0.00 (1.4%))
[ OK ] Layer_Slice.YOLOv4_tiny_1/2 (4 ms)
[ RUN ] Layer_Slice.YOLOv4_tiny_2/0, where GetParam() = OCV/OCL
[ PERFSTAT ] (samples=10 mean=0.19 median=0.18 min=0.18 stddev=0.00 (2.7%))
[ OK ] Layer_Slice.YOLOv4_tiny_2/0 (5 ms)
[ RUN ] Layer_Slice.YOLOv4_tiny_2/1, where GetParam() = OCV/OCL_FP16
[ PERFSTAT ] (samples=79 mean=0.18 median=0.18 min=0.17 stddev=0.01 (3.0%))
[ OK ] Layer_Slice.YOLOv4_tiny_2/1 (18 ms)
[ RUN ] Layer_Slice.YOLOv4_tiny_2/2, where GetParam() = OCV/CPU
[ PERFSTAT ] (samples=10 mean=0.02 median=0.02 min=0.02 stddev=0.00 (2.6%))
[ OK ] Layer_Slice.YOLOv4_tiny_2/2 (3 ms)
[ RUN ] Layer_Slice.YOLOv4_tiny_3/0, where GetParam() = OCV/OCL
[ PERFSTAT ] (samples=13 mean=0.32 median=0.32 min=0.31 stddev=0.01 (2.1%))
[ OK ] Layer_Slice.YOLOv4_tiny_3/0 (6 ms)
[ RUN ] Layer_Slice.YOLOv4_tiny_3/1, where GetParam() = OCV/OCL_FP16
[ PERFSTAT ] (samples=10 mean=0.31 median=0.31 min=0.31 stddev=0.00 (0.6%))
[ OK ] Layer_Slice.YOLOv4_tiny_3/1 (5 ms)
[ RUN ] Layer_Slice.YOLOv4_tiny_3/2, where GetParam() = OCV/CPU
[ PERFSTAT ] (samples=13 mean=0.01 median=0.01 min=0.01 stddev=0.00 (1.5%))
[ OK ] Layer_Slice.YOLOv4_tiny_3/2 (1 ms)
[----------] 9 tests from Layer_Slice (100 ms total)
[----------] 3 tests from DNNTestNetwork
[ RUN ] DNNTestNetwork.YOLOv4_tiny/0, where GetParam() = OCV/OCL
D:\opencv\opencv-4.5.5\modules\ts\src\ts_perf.cpp(2028): error: Failed
Expected: PerfTestBody() doesn't throw an exception.
Actual: it throws cv::Exception:
OpenCV(4.5.5) D:\opencv\opencv-4.5.5\modules\ts\src\ts.cpp:1064: error: (-2:Unspecified error) OpenCV tests: Can't find required data file: dnn/dog416.png in function 'cvtest::findData'
params = OCV/OCL
termination reason: unhandled exception
bytesIn = 0
bytesOut = 0
samples = 0 of 100
outliers = 0
frequency = 0
[ FAILED ] DNNTestNetwork.YOLOv4_tiny/0, where GetParam() = OCV/OCL (1 ms)
[ RUN ] DNNTestNetwork.YOLOv4_tiny/1, where GetParam() = OCV/OCL_FP16
D:\opencv\opencv-4.5.5\modules\ts\src\ts_perf.cpp(2028): error: Failed
Expected: PerfTestBody() doesn't throw an exception.
Actual: it throws cv::Exception:
OpenCV(4.5.5) D:\opencv\opencv-4.5.5\modules\ts\src\ts.cpp:1064: error: (-2:Unspecified error) OpenCV tests: Can't find required data file: dnn/dog416.png in function 'cvtest::findData'
params = OCV/OCL_FP16
termination reason: unhandled exception
bytesIn = 0
bytesOut = 0
samples = 0 of 100
outliers = 0
frequency = 0
[ FAILED ] DNNTestNetwork.YOLOv4_tiny/1, where GetParam() = OCV/OCL_FP16 (0 ms)
[ RUN ] DNNTestNetwork.YOLOv4_tiny/2, where GetParam() = OCV/CPU
D:\opencv\opencv-4.5.5\modules\ts\src\ts_perf.cpp(2028): error: Failed
Expected: PerfTestBody() doesn't throw an exception.
Actual: it throws cv::Exception:
OpenCV(4.5.5) D:\opencv\opencv-4.5.5\modules\ts\src\ts.cpp:1064: error: (-2:Unspecified error) OpenCV tests: Can't find required data file: dnn/dog416.png in function 'cvtest::findData'
params = OCV/CPU
termination reason: unhandled exception
bytesIn = 0
bytesOut = 0
samples = 0 of 100
outliers = 0
frequency = 0
[ FAILED ] DNNTestNetwork.YOLOv4_tiny/2, where GetParam() = OCV/CPU (1 ms)
[----------] 3 tests from DNNTestNetwork (2 ms total)
[----------] Global test environment tear-down
[==========] 12 tests from 2 test cases ran. (102 ms total)
[ PASSED ] 9 tests.
[ FAILED ] 3 tests, listed below:
[ FAILED ] DNNTestNetwork.YOLOv4_tiny/0, where GetParam() = OCV/OCL
[ FAILED ] DNNTestNetwork.YOLOv4_tiny/1, where GetParam() = OCV/OCL_FP16
[ FAILED ] DNNTestNetwork.YOLOv4_tiny/2, where GetParam() = OCV/CPU
3 FAILED TESTS
Hi @ZJDATY, please set the OPENCV_TEST_DATA_PATH environment variable.
The program will then find and load the model automatically.
Reference: https://github.com/opencv/opencv/wiki/How_to_contribute
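As a rough pre-flight check for the advice above, the following sketch verifies that the file named in the error message exists under a candidate test-data directory and builds an environment with OPENCV_TEST_DATA_PATH set. The directory `D:\opencv_extra\testdata` and both helper names are assumptions made for illustration. The check only looks directly under the given directory, while the test framework may search additional locations.

```python
import os

# Relative path taken from the "Can't find required data file" error in the log.
REQUIRED_FILES = ("dnn/dog416.png",)


def missing_test_data(root, required=REQUIRED_FILES):
    """Return the required files that do not exist under `root`."""
    return [rel for rel in required
            if not os.path.isfile(os.path.join(root, *rel.split("/")))]


def perf_environment(test_data_dir, base_env=None):
    """Copy an environment and point OPENCV_TEST_DATA_PATH at `test_data_dir`."""
    env = dict(os.environ if base_env is None else base_env)
    env["OPENCV_TEST_DATA_PATH"] = test_data_dir
    return env


if __name__ == "__main__":
    data_dir = r"D:\opencv_extra\testdata"  # hypothetical location, adjust as needed
    print("missing files:", missing_test_data(data_dir))
    print("OPENCV_TEST_DATA_PATH =", perf_environment(data_dir, {})["OPENCV_TEST_DATA_PATH"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching opencv_perf_dnn.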
@ZJDATY Hi, in my test on Windows, 4.8.0 is faster (median values).
1 thread:
| Name of Test | 455-1th | 480-1th | 480-1th vs 455-1th (x-factor) |
|---|---|---|---|
| YOLOv4_tiny::DNNTestNetwork::OCV/CPU | 940.120 | 424.721 | 2.21 |
N threads:
| Name of Test | 455-Nth | 480-Nth | 480-Nth vs 455-Nth (x-factor) |
|---|---|---|---|
| YOLOv4_tiny::DNNTestNetwork::OCV/CPU | 358.424 | 164.512 | 2.18 |
Can you print the OpenCV version with cv::getVersionString() to make sure the two versions were not swapped in your performance test?
Hi @zihaomu @WanliZhong @asmorkalov @fengyuentau, I have completed the test.
OpenCV 4.8.0 single-core
D:\opencv\opencv-4.8.0\build\bin\Release>opencv_perf_dnn --gtest_filter=*YOLOv4_tiny* --perf_threads=1 --gtest_output=xml:result.xml
Result
[ PERFSTAT ] (samples=10 mean=69.02 median=68.82 min=68.08 stddev=0.70 (1.0%))
OpenCV 4.8.0 multi-core
D:\opencv\opencv-4.8.0\build\bin\Release>opencv_perf_dnn --gtest_filter=*YOLOv4_tiny* --gtest_output=xml:result.xml
Result
[ PERFSTAT ] (samples=15 mean=25.29 median=25.15 min=24.25 stddev=0.73 (2.9%))
OpenCV 4.5.5 single-core
D:\opencv\opencv-4.5.5\build\bin\Release>opencv_perf_dnn --gtest_filter=*YOLOv4_tiny* --perf_threads=1 --gtest_output=xml:result.xml
Result
[ PERFSTAT ] (samples=10 mean=81.12 median=80.85 min=80.38 stddev=0.73 (0.9%))
OpenCV 4.5.5 multi-core
D:\opencv\opencv-4.5.5\build\bin\Release>opencv_perf_dnn --gtest_filter=*YOLOv4_tiny* --gtest_output=xml:result.xml
Result
[ PERFSTAT ] (samples=100 mean=17.11 median=16.79 min=16.23 stddev=0.81 (4.7%))
The results show that 4.8 is faster than 4.5 in single-core inference, but slower in multi-core inference: the 25 ms and 16 ms figures above are consistent with my own test results. I don't understand why my multi-core results differ from yours. I can provide remote desktop access so that you can troubleshoot on my computer.
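To make the comparison above easier to reproduce, here is a small sketch that parses the four PERFSTAT lines quoted in this comment and prints the 4.8.0 / 4.5.5 ratio of the median times. The regular expression is an assumption based only on the line layout shown here, and `parse_perfstat` and `median_ratio` are illustrative helper names, not part of OpenCV.

```python
import re

# Matches the statistics inside a "[ PERFSTAT ]" line as printed in this thread.
PERFSTAT_RE = re.compile(
    r"samples=(?P<samples>\d+)\s+mean=(?P<mean>[\d.]+)\s+median=(?P<median>[\d.]+)\s+"
    r"min=(?P<min>[\d.]+)\s+stddev=(?P<stddev>[\d.]+)"
)


def parse_perfstat(line):
    """Extract samples/mean/median/min/stddev from one PERFSTAT line."""
    match = PERFSTAT_RE.search(line)
    if match is None:
        raise ValueError("not a PERFSTAT line: %r" % (line,))
    stats = {name: float(value) for name, value in match.groupdict().items()}
    stats["samples"] = int(stats["samples"])
    return stats


# Lines copied from the results reported above.
RESULTS = {
    ("4.8.0", "single"): "[ PERFSTAT ] (samples=10 mean=69.02 median=68.82 min=68.08 stddev=0.70 (1.0%))",
    ("4.8.0", "multi"): "[ PERFSTAT ] (samples=15 mean=25.29 median=25.15 min=24.25 stddev=0.73 (2.9%))",
    ("4.5.5", "single"): "[ PERFSTAT ] (samples=10 mean=81.12 median=80.85 min=80.38 stddev=0.73 (0.9%))",
    ("4.5.5", "multi"): "[ PERFSTAT ] (samples=100 mean=17.11 median=16.79 min=16.23 stddev=0.81 (4.7%))",
}


def median_ratio(mode):
    """Median time of 4.8.0 divided by median time of 4.5.5 (above 1 means 4.8.0 is slower)."""
    new = parse_perfstat(RESULTS[("4.8.0", mode)])["median"]
    old = parse_perfstat(RESULTS[("4.5.5", mode)])["median"]
    return new / old


if __name__ == "__main__":
    for mode in ("single", "multi"):
        print("%s: 4.8.0 / 4.5.5 median ratio = %.2f" % (mode, median_ratio(mode)))
```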
I also measured the inference time with a C++ program, which prints the build information as well.
#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    cv::TickMeter *t = new cv::TickMeter();
    t->reset();

    // Load YOLOv4-tiny and force the built-in OpenCV backend on the CPU.
    auto net = cv::dnn::readNetFromDarknet("yolo/yolov4-tiny.cfg", "yolo/yolov4-tiny.weights");
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);

    // The frame is allocated but never filled, so its pixel values are arbitrary.
    cv::Mat frame(416, 416, CV_32FC3), blob;
    cv::dnn::blobFromImage(frame, blob, 1 / 255.0, cv::Size(416, 416), cv::Scalar(), true, false, CV_32F);
    net.setInput(blob);

    // Time 100 forward passes and keep the fastest one.
    double minT = 100000;
    for (int i = 0; i < 100; i++)
    {
        t->start();
        net.forward();
        t->stop();
        // One start/stop per reset, so the average equals the time of this pass.
        double timeN = t->getAvgTimeMilli();
        if (timeN < minT)
            minT = timeN;
        t->reset();
    }

    std::cout << cv::getVersionString() << std::endl;
    std::cout << cv::getBuildInformation() << std::endl;
    std::cout << "forward time = " << minT << " ms" << std::endl;
    delete t;
    return 0;
}
455 Result
D:\opencv\opencv-4.5.5\build\bin\Release>D:\opencv\opencv-4.5.5\build\bin\Release\dnnspeedtest.exe
4.5.5
General configuration for OpenCV 4.5.5 =====================================
Version control: unknown
Platform:
Timestamp: 2023-07-05T05:04:59Z
Host: Windows 10.0.22621 AMD64
CMake: 3.26.4
CMake generator: Visual Studio 16 2019
CMake build tool: D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/MSBuild/Current/Bin/MSBuild.exe
MSVC: 1929
Configuration: Debug Release
CPU/HW features:
Baseline: SSE SSE2 SSE3
requested: SSE3
Dispatched code generation: SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
requested: SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
SSE4_1 (13 files): + SSSE3 SSE4_1
SSE4_2 (1 files): + SSSE3 SSE4_1 POPCNT SSE4_2
FP16 (0 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
AVX (4 files): + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
AVX2 (26 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
AVX512_SKX (4 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX
C/C++:
Built as dynamic libs?: YES
C++ standard: 11
C++ Compiler: D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe (ver 19.29.30148.0)
C++ flags (Release): /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP /MD /O2 /Ob2 /DNDEBUG
C++ flags (Debug): /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP /MDd /Zi /Ob0 /Od /RTC1
C Compiler: D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe
C flags (Release): /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /MP /MD /O2 /Ob2 /DNDEBUG
C flags (Debug): /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /MP /MDd /Zi /Ob0 /Od /RTC1
Linker flags (Release): /machine:x64 /INCREMENTAL:NO
Linker flags (Debug): /machine:x64 /debug /INCREMENTAL
ccache: NO
Precompiled headers: YES
Extra dependencies:
3rdparty dependencies:
OpenCV modules:
To be built: core dnn flann highgui imgcodecs imgproc ml photo ts video videoio
Disabled: features2d java_bindings_generator js_bindings_generator python_bindings_generator python_tests world
Disabled by dependency: calib3d objdetect stitching
Unavailable: gapi java python2 python3
Applications: perf_tests
Documentation: NO
Non-free algorithms: NO
Windows RT support: NO
GUI: WIN32UI
Win32 UI: YES
Media I/O:
ZLib: build (ver 1.2.11)
JPEG: build-libjpeg-turbo (ver 2.1.2-62)
WEBP: build (ver encoder: 0x020f)
PNG: build (ver 1.6.37)
TIFF: build (ver 42 - 4.2.0)
JPEG 2000: build Jasper (ver 1.900.1)
HDR: YES
SUNRASTER: YES
PXM: YES
PFM: YES
Video I/O:
FFMPEG: YES (prebuilt binaries)
avcodec: YES (58.134.100)
avformat: YES (58.76.100)
avutil: YES (56.70.100)
swscale: YES (5.9.100)
avresample: YES (4.0.0)
DirectShow: YES
Media Foundation: YES
DXVA: YES
Parallel framework: Concurrency
Trace: YES (with Intel ITT)
Other third-party libraries:
Intel IPP: 2020.0.0 Gold [2020.0.0]
at: D:/opencv/opencv-4.5.5/build/3rdparty/ippicv/ippicv_win/icv
Intel IPP IW: sources (2020.0.0)
at: D:/opencv/opencv-4.5.5/build/3rdparty/ippicv/ippicv_win/iw
Eigen: NO
Custom HAL: NO
Protobuf: build (3.19.1)
OpenCL: YES (NVD3D11)
Include path: D:/opencv/opencv-4.5.5/3rdparty/include/opencl/1.2
Link libraries: Dynamic load
Python (for build): D:/ProgramData/Anaconda3/python.exe
Install to: D:/opencv/opencv-4.5.5/build/install
-----------------------------------------------------------------
forward time = 16.1467 ms
480 Result
D:\opencv\opencv-4.8.0\build\bin\Release>D:\opencv\opencv-4.8.0\build\bin\Release\dnnspeedtest.exe
4.8.0
General configuration for OpenCV 4.8.0 =====================================
Version control: unknown
Extra modules:
Location (extra): D:/opencv/opencv_contrib-4.8.0/modules
Version control (extra): unknown
Platform:
Timestamp: 2023-06-29T13:58:22Z
Host: Windows 10.0.22621 AMD64
CMake: 3.26.4
CMake generator: Visual Studio 16 2019
CMake build tool: D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/MSBuild/Current/Bin/MSBuild.exe
MSVC: 1929
Configuration: Debug Release
CPU/HW features:
Baseline: SSE SSE2 SSE3 SSSE3 SSE4_1 POPCNT SSE4_2
requested: SSE4_2
Dispatched code generation: FP16 AVX AVX2 AVX512_SKX
requested: SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
FP16 (0 files): + FP16 AVX
AVX (7 files): + AVX
AVX2 (33 files): + FP16 FMA3 AVX AVX2
AVX512_SKX (5 files): + FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX
C/C++:
Built as dynamic libs?: YES
C++ standard: 11
C++ Compiler: D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe (ver 19.29.30148.0)
C++ flags (Release): /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819 /MP /MD /O2 /Ob2 /DNDEBUG
C++ flags (Debug): /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819 /MP /MDd /Zi /Ob0 /Od /RTC1
C Compiler: D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe
C flags (Release): /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /MP /MD /O2 /Ob2 /DNDEBUG
C flags (Debug): /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /MP /MDd /Zi /Ob0 /Od /RTC1
Linker flags (Release): /machine:x64 /INCREMENTAL:NO
Linker flags (Debug): /machine:x64 /debug /INCREMENTAL
ccache: NO
Precompiled headers: YES
Extra dependencies:
3rdparty dependencies:
OpenCV modules:
To be built: calib3d core dnn features2d flann highgui imgcodecs imgproc ml objdetect photo stitching ts video videoio
Disabled: aruco bgsegm bioinspired ccalib datasets dnn_objdetect dnn_superres dpm face fuzzy hfs img_hash intensity_transform java_bindings_generator js_bindings_generator line_descriptor mcc objc_bindings_generator optflow phase_unwrapping plot python_bindings_generator python_tests quality rapid reg rgbd saliency shape stereo structured_light superres surface_matching text tracking videostab wechat_qrcode world xfeatures2d ximgproc xobjdetect xphoto
Disabled by dependency: -
Unavailable: alphamat cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev cvv freetype gapi hdf java julia matlab ovis python2 python3 sfm viz
Applications: perf_tests
Documentation: NO
Non-free algorithms: NO
Windows RT support: NO
GUI: WIN32UI
Win32 UI: YES
Media I/O:
ZLib: build (ver 1.2.13)
JPEG: build-libjpeg-turbo (ver 2.1.3-62)
SIMD Support Request: YES
SIMD Support: NO
PNG: build (ver 1.6.37)
JPEG 2000: build Jasper (ver 1.900.1)
HDR: YES
SUNRASTER: YES
PXM: YES
PFM: YES
Video I/O:
FFMPEG: YES (prebuilt binaries)
avcodec: YES (58.134.100)
avformat: YES (58.76.100)
avutil: YES (56.70.100)
swscale: YES (5.9.100)
avresample: YES (4.0.0)
DirectShow: YES
Media Foundation: YES
DXVA: YES
Parallel framework: TBB (ver 2020.2 interface 11102)
Trace: YES (with Intel ITT)
Other third-party libraries:
Intel IPP: 2021.8 [2021.8.0]
at: D:/opencv/opencv-4.8.0/build/3rdparty/ippicv/ippicv_win/icv
Intel IPP IW: sources (2021.8.0)
at: D:/opencv/opencv-4.8.0/build/3rdparty/ippicv/ippicv_win/iw
Custom HAL: NO
Protobuf: build (3.19.1)
OpenCL: YES (no extra features)
Include path: D:/opencv/opencv-4.8.0/3rdparty/include/opencl/1.2
Link libraries: Dynamic load
Python (for build): NO
Install to: D:/opencv/opencv-4.8.0/build/install
-----------------------------------------------------------------
forward time = 24.8977 ms
Looking forward to your reply.
opencv_perf_dnn --gtest_filter=*YOLOv4_tiny* --perf_threads=N --gtest_output=xml:result.xml
| OpenCV version | Perf_threads | Time (ms) |
|---|---|---|
| 455 | 1 | 80.8 |
| 455 | 2 | 42.1 |
| 455 | 4 | 25.3 |
| 455 | 8 | 16.7 |
| 480 | 1 | 69.1 |
| 480 | 2 | 40.8 |
| 480 | 4 | 28.9 |
| 480 | 8 | 25.4 |
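The table can be read as a thread-scaling measurement. A minimal sketch, assuming the times are milliseconds as in the PERFSTAT output, computes the speedup over one thread and the parallel efficiency (speedup divided by thread count) for each version. The `scaling` helper is an illustrative name.

```python
# Times in ms, copied from the table above (key: --perf_threads value).
TIMES_MS = {
    "4.5.5": {1: 80.8, 2: 42.1, 4: 25.3, 8: 16.7},
    "4.8.0": {1: 69.1, 2: 40.8, 4: 28.9, 8: 25.4},
}


def scaling(times):
    """Map thread count -> (speedup over 1 thread, efficiency = speedup / threads)."""
    base = times[1]
    return {n: (base / t, base / t / n) for n, t in sorted(times.items())}


if __name__ == "__main__":
    for version, times in TIMES_MS.items():
        for threads, (speedup, efficiency) in scaling(times).items():
            print("%s  threads=%d  speedup=%.2f  efficiency=%.2f"
                  % (version, threads, speedup, efficiency))
```

On these numbers 4.8.0 starts ahead with one thread but gains far less from additional threads than 4.5.5, which is consistent with the multi-core regression being reported.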
@ZJDATY Can you try this patch: https://github.com/opencv/opencv/pull/23952? It may fix this issue.
There is a difference in the build configuration related to the parallel framework used:
Parallel framework: Concurrency
vs
Parallel framework: TBB (ver 2020.2 interface 11102)
We need to fix build configuration first (to "compare apples to apples").
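One way to spot such mismatches before benchmarking is to diff the two cv::getBuildInformation() dumps on a few performance-relevant keys. The sketch below is an assumption-laden helper, not an OpenCV tool: it treats every `key: value` line as an entry, keeps the first occurrence of a repeated key, and the watched key list is my own choice.

```python
# Keys that plausibly affect CPU inference speed (an illustrative selection).
WATCHED_KEYS = ("Baseline", "Dispatched code generation", "Parallel framework", "Intel IPP")


def parse_build_info(text):
    """Turn 'key: value' lines of a build-information dump into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            info.setdefault(key.strip(), value.strip())  # keep the first occurrence
    return info


def diff_build_info(text_a, text_b, keys=WATCHED_KEYS):
    """Return {key: (value_a, value_b)} for watched keys whose values differ."""
    a, b = parse_build_info(text_a), parse_build_info(text_b)
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Excerpts from the two dumps posted earlier in this thread.
    build_455 = "  Baseline: SSE SSE2 SSE3\n  Parallel framework: Concurrency\n"
    build_480 = ("  Baseline: SSE SSE2 SSE3 SSSE3 SSE4_1 POPCNT SSE4_2\n"
                 "  Parallel framework: TBB (ver 2020.2 interface 11102)\n")
    for key, (old, new) in diff_build_info(build_455, build_480).items():
        print("%s: %s  vs  %s" % (key, old, new))
```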
@zihaomu @WanliZhong I discovered this yesterday, so I have recompiled and tested it, and the results are still the same.
Microsoft Windows [Version 10.0.22621.1992]
(c) Microsoft Corporation. All rights reserved.
C:\Users\ZHANG\Desktop\test>C:\Users\ZHANG\Desktop\test\dnnspeedtest.exe
4.5.5
General configuration for OpenCV 4.5.5 =====================================
Version control: unknown
Platform:
Timestamp: 2023-07-05T05:04:59Z
Host: Windows 10.0.22621 AMD64
CMake: 3.26.4
CMake generator: Visual Studio 16 2019
CMake build tool: D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/MSBuild/Current/Bin/MSBuild.exe
MSVC: 1929
Configuration: Debug Release
CPU/HW features:
Baseline: SSE SSE2 SSE3
requested: SSE3
Dispatched code generation: SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
requested: SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
SSE4_1 (13 files): + SSSE3 SSE4_1
SSE4_2 (1 files): + SSSE3 SSE4_1 POPCNT SSE4_2
FP16 (0 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
AVX (4 files): + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
AVX2 (26 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
AVX512_SKX (4 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX
C/C++:
Built as dynamic libs?: YES
C++ standard: 11
C++ Compiler: D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe (ver 19.29.30148.0)
C++ flags (Release): /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP /MD /O2 /Ob2 /DNDEBUG
C++ flags (Debug): /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP /MDd /Zi /Ob0 /Od /RTC1
C Compiler: D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe
C flags (Release): /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /MP /MD /O2 /Ob2 /DNDEBUG
C flags (Debug): /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /MP /MDd /Zi /Ob0 /Od /RTC1
Linker flags (Release): /machine:x64 /INCREMENTAL:NO
Linker flags (Debug): /machine:x64 /debug /INCREMENTAL
ccache: NO
Precompiled headers: YES
Extra dependencies:
3rdparty dependencies:
OpenCV modules:
To be built: core dnn flann highgui imgcodecs imgproc ml photo ts video videoio
Disabled: features2d java_bindings_generator js_bindings_generator python_bindings_generator python_tests world
Disabled by dependency: calib3d objdetect stitching
Unavailable: gapi java python2 python3
Applications: perf_tests
Documentation: NO
Non-free algorithms: NO
Windows RT support: NO
GUI: WIN32UI
Win32 UI: YES
Media I/O:
ZLib: build (ver 1.2.11)
JPEG: build-libjpeg-turbo (ver 2.1.2-62)
WEBP: build (ver encoder: 0x020f)
PNG: build (ver 1.6.37)
TIFF: build (ver 42 - 4.2.0)
JPEG 2000: build Jasper (ver 1.900.1)
HDR: YES
SUNRASTER: YES
PXM: YES
PFM: YES
Video I/O:
FFMPEG: YES (prebuilt binaries)
avcodec: YES (58.134.100)
avformat: YES (58.76.100)
avutil: YES (56.70.100)
swscale: YES (5.9.100)
avresample: YES (4.0.0)
DirectShow: YES
Media Foundation: YES
DXVA: YES
Parallel framework: Concurrency
Trace: YES (with Intel ITT)
Other third-party libraries:
Intel IPP: 2020.0.0 Gold [2020.0.0]
at: D:/opencv/opencv-4.5.5/build/3rdparty/ippicv/ippicv_win/icv
Intel IPP IW: sources (2020.0.0)
at: D:/opencv/opencv-4.5.5/build/3rdparty/ippicv/ippicv_win/iw
Eigen: NO
Custom HAL: NO
Protobuf: build (3.19.1)
OpenCL: YES (NVD3D11)
Include path: D:/opencv/opencv-4.5.5/3rdparty/include/opencl/1.2
Link libraries: Dynamic load
Python (for build): D:/ProgramData/Anaconda3/python.exe
Install to: D:/opencv/opencv-4.5.5/build/install
-----------------------------------------------------------------
forward time = 15.9708 ms
C:\Users\ZHANG\Desktop\test>C:\Users\ZHANG\Desktop\test\dnnspeedtest480.exe
4.8.0
General configuration for OpenCV 4.8.0 =====================================
Version control: unknown
Extra modules:
Location (extra): D:/opencv/opencv_contrib-4.8.0/modules
Version control (extra): unknown
Platform:
Timestamp: 2023-06-29T13:58:22Z
Host: Windows 10.0.22621 AMD64
CMake: 3.26.4
CMake generator: Visual Studio 16 2019
CMake build tool: D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/MSBuild/Current/Bin/MSBuild.exe
MSVC: 1929
Configuration: Debug Release
CPU/HW features:
Baseline: SSE SSE2 SSE3 SSSE3
requested: SSSE3
Dispatched code generation: SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
requested: SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
SSE4_1 (13 files): + SSE4_1
SSE4_2 (1 files): + SSE4_1 POPCNT SSE4_2
FP16 (0 files): + SSE4_1 POPCNT SSE4_2 FP16 AVX
AVX (7 files): + SSE4_1 POPCNT SSE4_2 AVX
AVX2 (30 files): + SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
AVX512_SKX (4 files): + SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX
C/C++:
Built as dynamic libs?: YES
C++ standard: 11
C++ Compiler: D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe (ver 19.29.30148.0)
C++ flags (Release): /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819 /MP /MD /O2 /Ob2 /DNDEBUG
C++ flags (Debug): /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819 /MP /MDd /Zi /Ob0 /Od /RTC1
C Compiler: D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe
C flags (Release): /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /MP /MD /O2 /Ob2 /DNDEBUG
C flags (Debug): /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /MP /MDd /Zi /Ob0 /Od /RTC1
Linker flags (Release): /machine:x64 /INCREMENTAL:NO
Linker flags (Debug): /machine:x64 /debug /INCREMENTAL
ccache: NO
Precompiled headers: YES
Extra dependencies:
3rdparty dependencies:
OpenCV modules:
To be built: core dnn flann highgui imgcodecs imgproc ml photo ts video videoio
Disabled: aruco bgsegm bioinspired ccalib datasets dnn_objdetect dnn_superres dpm face features2d fuzzy hfs img_hash intensity_transform java_bindings_generator js_bindings_generator line_descriptor mcc optflow phase_unwrapping plot python_bindings_generator python_tests quality rapid reg rgbd saliency shape stereo structured_light superres surface_matching text tracking videostab wechat_qrcode world xfeatures2d ximgproc xobjdetect xphoto
Disabled by dependency: calib3d objdetect stitching
Unavailable: alphamat cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev cvv freetype gapi hdf java julia matlab ovis python2 python3 sfm viz
Applications: perf_tests
Documentation: NO
Non-free algorithms: NO
Windows RT support: NO
GUI: WIN32UI
Win32 UI: YES
Media I/O:
ZLib: build (ver 1.2.13)
JPEG: build-libjpeg-turbo (ver 2.1.3-62)
SIMD Support Request: YES
SIMD Support: NO
WEBP: build (ver encoder: 0x020f)
PNG: build (ver 1.6.37)
TIFF: build (ver 42 - 4.2.0)
JPEG 2000: build Jasper (ver 1.900.1)
HDR: YES
SUNRASTER: YES
PXM: YES
PFM: YES
Video I/O:
FFMPEG: YES (prebuilt binaries)
avcodec: YES (58.134.100)
avformat: YES (58.76.100)
avutil: YES (56.70.100)
swscale: YES (5.9.100)
avresample: YES (4.0.0)
DirectShow: YES
Media Foundation: YES
DXVA: YES
Parallel framework: Concurrency
Trace: YES (with Intel ITT)
Other third-party libraries:
Intel IPP: 2021.8 [2021.8.0]
at: D:/opencv/opencv-4.8.0/build/3rdparty/ippicv/ippicv_win/icv
Intel IPP IW: sources (2021.8.0)
at: D:/opencv/opencv-4.8.0/build/3rdparty/ippicv/ippicv_win/iw
Eigen: NO
Custom HAL: NO
Protobuf: build (3.19.1)
OpenCL: YES (NVD3D11)
Include path: D:/opencv/opencv-4.8.0/3rdparty/include/opencl/1.2
Link libraries: Dynamic load
Python (for build): NO
Install to: D:/opencv/opencv-4.8.0/build/install
-----------------------------------------------------------------
forward time = 25.4416 ms
C:\Users\ZHANG\Desktop\test>
@ZJDATY Can you try this patch: #23952? It may fix this issue.
@zihaomu Hi, I see that the patch addresses a slowdown relative to 4.7. However, I have also compiled 4.7.0, and its test results are not significantly different from those of 4.8.0, so I don't think the patch can solve my problem. My results seem to point to multi-core scheduling instead.
Hi @zihaomu @asmorkalov @fengyuentau @WanliZhong, do you have a solution to this problem yet? I'm happy to assist with testing. If you need remote desktop access to my machine, please let me know.
Can you try this patch: https://github.com/opencv/opencv/pull/23952? It may fix this issue.
I still cannot reproduce this issue on my machine, in either single-threaded or multi-threaded mode.
Hi @zihaomu @fengyuentau @WanliZhong, I just compiled this patch. The test results are the same as before, and I saw the same effect on three computers with different CPUs. Can you test the binaries I built on your computer?
dnnspeedtest.cpp
#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    cv::TickMeter *t = new cv::TickMeter();
    t->reset();

    // Load YOLOv4-tiny and force the built-in OpenCV backend on the CPU.
    auto net = cv::dnn::readNetFromDarknet("yolo/yolov4-tiny.cfg", "yolo/yolov4-tiny.weights");
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);

    // The frame is allocated but never filled, so its pixel values are arbitrary.
    cv::Mat frame(416, 416, CV_32FC3), blob;
    //cv::setNumThreads(6);
    cv::dnn::blobFromImage(frame, blob, 1 / 255.0, cv::Size(416, 416), cv::Scalar(), true, false, CV_32F);
    net.setInput(blob);

    // Time 100 forward passes and keep the fastest one.
    double minT = 100000;
    for (int i = 0; i < 100; i++)
    {
        t->start();
        net.forward();
        t->stop();
        // One start/stop per reset, so the average equals the time of this pass.
        double timeN = t->getAvgTimeMilli();
        if (timeN < minT)
            minT = timeN;
        t->reset();
    }

    std::cout << cv::getVersionString() << std::endl;
    std::cout << cv::getBuildInformation() << std::endl;
    std::cout << "forward time = " << minT << " ms" << std::endl;
    delete t;
    return 0;
}
The compiled opencv_perf_dnn.exe is included as well.
Please test it; I really don't understand why you can't reproduce my results.
https://www.aliyundrive.com/s/aZ8XoLEfZcg
In my tests, OpenCV 4.5.5 takes 15-16 ms and OpenCV 4.8.0 takes 25-26 ms.
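The test program reports only the fastest of 100 passes, while the perf tests quoted earlier in the thread also report the median. A generic sketch of a timing harness that discards a few warm-up calls and returns both values is shown below. It times an arbitrary callable with the standard library, so the `benchmark` name, the warm-up count, and the stand-in workload are assumptions rather than anything taken from OpenCV.

```python
import statistics
import time


def benchmark(fn, repeats=100, warmup=5):
    """Call fn() `warmup` times untimed, then time `repeats` calls in milliseconds."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "samples": len(samples),
        "min": min(samples),
        "median": statistics.median(samples),
    }


if __name__ == "__main__":
    # Stand-in workload; with OpenCV's Python bindings one would pass net.forward instead.
    result = benchmark(lambda: sum(range(10000)), repeats=50)
    print("min = %.3f ms, median = %.3f ms over %d samples"
          % (result["min"], result["median"], result["samples"]))
```

Reporting the median next to the minimum makes it easier to tell a consistent slowdown from a few unusually fast or slow passes.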
Is it platform specific? Could you try the same tests on Ubuntu?
Hi @fengyuentau, although I have a computer with Ubuntu installed, I am not able to develop programs on Ubuntu. Can you help test these two programs on Windows and compare the results? My current result is that with multiple cores, OpenCV 4.8.0 is slower than OpenCV 4.5.5. Can you share DLLs of both versions compiled with VS2019? I would like to check whether there is a problem with my CMake build.
I just stumbled across an issue that might be very similar, or might be the cause. The same inference is sometimes slower on 4.8.0 than on 4.5.2. I traced it down to memory usage: 4.5.2 uses at most 553.6 MB of RAM, while 4.8.0 uses 1.89 GB. Could anybody who sees this problem be swapping? Is the very high RAM usage a known issue?
I created a new issue with the requested information: https://github.com/opencv/opencv/issues/24134
I just ran more tests and see an increase from 1.368 s for version 4.5.2 to 4.351 s for version 4.8.0.
This version is just collecting show stopper bugs.
Is it related to the reason why multi-threaded inference takes longer in OpenCV 4.8.0 than in OpenCV 4.5.5? I have been waiting for this problem to be resolved. https://github.com/opencv/opencv/issues/24134
@ZJDATY, I haven't observed that effect. I am comparing multi-threaded inference between 4.5.2 and 4.8.0. After a lengthy discussion here: https://github.com/opencv/opencv/issues/24134#issuecomment-1674667154, I suggest trying net.enableWinograd(false) before the inference. That recovered the 4.5.2 speed for me and some (not all) of the increased memory usage.
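For a script that has to run against both old and new builds, the suggestion can be wrapped in a guard. This is a minimal sketch assuming that older builds such as 4.5.x do not expose `enableWinograd` on the network object. `disable_winograd` is an illustrative helper name, and the OpenCV usage is left in comments so the block runs without the library.

```python
def disable_winograd(net):
    """Call net.enableWinograd(False) if the method exists; return whether it was called."""
    toggle = getattr(net, "enableWinograd", None)
    if toggle is None:
        return False  # this build does not expose the switch
    toggle(False)
    return True


# Intended use with OpenCV's Python bindings (not executed here):
#   import cv2 as cv
#   net = cv.dnn.readNetFromDarknet("yolo/yolov4-tiny.cfg", "yolo/yolov4-tiny.weights")
#   disable_winograd(net)   # call before net.forward()


if __name__ == "__main__":
    class OldNet:
        """Stand-in for a network object without the Winograd switch."""

    print("switch available:", disable_winograd(OldNet()))
```

In C++ the equivalent is the single call net.enableWinograd(false) quoted above, placed before net.forward().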
@ukoehler Can you help me test this program? The files I have compiled are all on the network drive, and you can also compile my test code yourself.
System Information
OpenCV version: 4.8.0
Operating System / Platform: win10
Compiler & compiler version: vs2019
Compared version: OpenCV 4.5.5-openvino, in the same environment
Detailed description
The build information and timing statistics for both versions are shown. The inference time has increased from 20 ms to 30 ms.
opencv4.8.0
Steps to reproduce
Related models and video downloads.
Download: https://pan.baidu.com/s/1wpXBbdtJMUrYULAelglIiw?pwd=x67k
Issue submission checklist