opencv / opencv

Open Source Computer Vision Library
https://opencv.org
Apache License 2.0

OpenCV4.8.0 DNN inference speed reduced by 50%. #23911

Open ZJDATY opened 10 months ago

ZJDATY commented 10 months ago

System Information

OpenCV version: 4.8.0
Operating System / Platform: win10
Compiler & compiler version: vs2019

Compared against: OpenCV 4.5.5-openvino, in the same environment

Detailed description

Below are the build information and the timing statistics for both versions. The mean time per frame increased from about 20 ms to about 30 ms (21.8 ms versus 32.5 ms in the logs below).

General configuration for OpenCV 4.5.5-openvino =====================================
  Version control:               c3d60a6cac5b5b3c452d766494d21b005221efe0

  Platform:
    Timestamp:                   2022-03-11T02:31:28Z
    Host:                        Windows 10.0.19044 AMD64
    CMake:                       3.14.5
    CMake generator:             Visual Studio 16 2019
    CMake build tool:            C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/MSBuild/Current/Bin/MSBuild.exe
    MSVC:                        1929
    Configuration:               Debug Release

  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3 SSSE3 SSE4_1 POPCNT SSE4_2
      requested:                 SSE4_2
    Dispatched code generation:  FP16 AVX AVX2 AVX512_SKX
      requested:                 SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
      FP16 (1 files):            + FP16 AVX
      AVX (5 files):             + AVX
      AVX2 (33 files):           + FP16 FMA3 AVX AVX2
      AVX512_SKX (8 files):      + FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX

  C/C++:
    Built as dynamic libs?:      YES
    C++ standard:                11
    C++ Compiler:                C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/VC/Tools/MSVC/14.29.30037/bin/Hostx64/x64/cl.exe  (ver 19.29.30040.0)
    C++ flags (Release):         /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise         /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /GS /sdl /guard:cf /w34018 /w34146 /w34244 /w34267 /w34302 /w34308 /w34509 /w34532 /w34533 /w34700 /w34789 /w34995 /w34996 /MP  /MD /O2 /Ob2 /DNDEBUG
    C++ flags (Debug):           /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise         /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /GS /sdl /guard:cf /w34018 /w34146 /w34244 /w34267 /w34302 /w34308 /w34509 /w34532 /w34533 /w34700 /w34789 /w34995 /w34996 /MP  /MDd /Zi /Ob0 /Od /RTC1
    C Compiler:                  C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/VC/Tools/MSVC/14.29.30037/bin/Hostx64/x64/cl.exe
    C flags (Release):           /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise         /GS /sdl /guard:cf /w34018 /w34146 /w34244 /w34267 /w34302 /w34308 /w34509 /w34532 /w34533 /w34700 /w34789 /w34995 /w34996 /MP   /MD /O2 /Ob2 /DNDEBUG
    C flags (Debug):             /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise         /GS /sdl /guard:cf /w34018 /w34146 /w34244 /w34267 /w34302 /w34308 /w34509 /w34532 /w34533 /w34700 /w34789 /w34995 /w34996 /MP /MDd /Zi /Ob0 /Od /RTC1
    Linker flags (Release):      /machine:x64   /guard:cf /dynamicbase /INCREMENTAL:NO
    Linker flags (Debug):        /machine:x64   /guard:cf /dynamicbase /debug /INCREMENTAL
    ccache:                      NO
    Precompiled headers:         NO
    Extra dependencies:
    3rdparty dependencies:

  OpenCV modules:
    To be built:                 calib3d core dnn features2d flann gapi highgui imgcodecs imgproc ml objdetect photo python3 stitching ts video videoio
    Disabled:                    world
    Disabled by dependency:      -
    Unavailable:                 java python2
    Applications:                tests perf_tests apps
    Documentation:               NO
    Non-free algorithms:         NO

  Windows RT support:            NO

  GUI:                           WIN32UI
    Win32 UI:                    YES

  Media I/O:
    ZLib:                        build (ver 1.2.11)
    JPEG:                        build-libjpeg-turbo (ver 2.1.2-62)
    PNG:                         build (ver 1.6.37)
    HDR:                         YES
    SUNRASTER:                   YES
    PXM:                         YES
    PFM:                         YES

  Video I/O:
    FFMPEG:                      YES (prebuilt binaries)
      avcodec:                   YES (58.134.100)
      avformat:                  YES (58.76.100)
      avutil:                    YES (56.70.100)
      swscale:                   YES (5.9.100)
      avresample:                YES (4.0.0)
    DirectShow:                  YES
    Media Foundation:            YES
      DXVA:                      YES
    Intel Media SDK:             YES (VPL::dispatcher VPL::api)

  Parallel framework:            Concurrency

  Trace:                         YES (with Intel ITT)

  Other third-party libraries:
    Intel IPP:                   2020.0.0 Gold [2020.0.0]
           at:                   C:/jenkins/workspace/windows9f26d2f1/build_release/3rdparty/ippicv/ippicv_win/icv
    Intel IPP IW:                sources (2020.0.0)
              at:                C:/jenkins/workspace/windows9f26d2f1/build_release/3rdparty/ippicv/ippicv_win/iw
    Inference Engine:            YES (2022010000 / 2022.1.0)
        * libs:                  C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/lib/intel64/Release/openvino.lib / C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/lib/intel64/Debug/openvinod.lib C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/bin/intel64/Release/openvino.dll / C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/bin/intel64/Debug/openvinod.dll C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/bin/intel64/Release/openvino.dll C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/bin/intel64/Debug/openvinod.dll
        * includes:              C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/include C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/include/ie
    nGraph:                      YES (2022.1.0)
        * libs:                  C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/lib/intel64/Release/openvino.lib / C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/lib/intel64/Debug/openvinod.lib C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/bin/intel64/Release/openvino.dll / C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/bin/intel64/Debug/openvinod.dll C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/bin/intel64/Release/openvino.dll C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/bin/intel64/Debug/openvinod.dll
        * includes:              C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/include C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/include/ie
    Custom HAL:                  NO
    Protobuf:                    build (3.19.1)

  OpenCL:                        YES (NVD3D11)
    Include path:                C:/jenkins/workspace/windows9f26d2f1/opencv/3rdparty/include/opencl/1.2
    Link libraries:              Dynamic load

  Python 3:
    Interpreter:                 C:/miniconda/envs/py3_env/python.exe (ver 3.4.5)
    Libraries:                   C:/miniconda/envs/py3_env/libs/python34.lib (ver 3.4.5)
    numpy:                       C:/miniconda/envs/py3_env/lib/site-packages/numpy/core/include (ver 1.11.3)
    install path:                python/cv2/python-3

  Python (for build):            C:/miniconda/envs/py3_env/python.exe

  Install to:                    C:/jenkins/workspace/windows9f26d2f1/build_release/install
-----------------------------------------------------------------

[yolov4]
        init >> 39.592ms
        inference >> min = 21.601ms, max = 32.751ms, mean = 21.846ms, stddev = 1.35271ms

D:\vcworkspaces\yolov4_tiny_dnn_demo\x64\Release\yolov4_tiny_dnn_demo.exe (process 7704) exited with code 0.
Press any key to close this window. . .

opencv4.8.0

General configuration for OpenCV 4.8.0 =====================================
  Version control:               4.8.0

  Platform:
    Timestamp:                   2023-06-28T12:35:18Z
    Host:                        Windows 10.0.19045 AMD64
    CMake:                       3.23.3
    CMake generator:             Visual Studio 16 2019
    CMake build tool:            C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/MSBuild/Current/Bin/MSBuild.exe
    MSVC:                        1928
    Configuration:               Debug Release

  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3
      requested:                 SSE3
    Dispatched code generation:  SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
      requested:                 SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
      SSE4_1 (16 files):         + SSSE3 SSE4_1
      SSE4_2 (1 files):          + SSSE3 SSE4_1 POPCNT SSE4_2
      FP16 (0 files):            + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
      AVX (7 files):             + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
      AVX2 (35 files):           + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
      AVX512_SKX (5 files):      + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX

  C/C++:
    Built as dynamic libs?:      YES
    C++ standard:                11
    C++ Compiler:                C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.28.29333/bin/Hostx64/x64/cl.exe  (ver 19.28.29334.0)
    C++ flags (Release):         /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819 /MP  /MD /O2 /Ob2 /DNDEBUG
    C++ flags (Debug):           /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819 /MP  /MDd /Zi /Ob0 /Od /RTC1
    C Compiler:                  C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.28.29333/bin/Hostx64/x64/cl.exe
    C flags (Release):           /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /MP   /MD /O2 /Ob2 /DNDEBUG
    C flags (Debug):             /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /MP /MDd /Zi /Ob0 /Od /RTC1
    Linker flags (Release):      /machine:x64  /INCREMENTAL:NO
    Linker flags (Debug):        /machine:x64  /debug /INCREMENTAL
    ccache:                      NO
    Precompiled headers:         NO
    Extra dependencies:
    3rdparty dependencies:

  OpenCV modules:
    To be built:                 calib3d core dnn features2d flann gapi highgui imgcodecs imgproc ml objdetect photo stitching video videoio world
    Disabled:                    python3
    Disabled by dependency:      -
    Unavailable:                 java python2 ts
    Applications:                apps
    Documentation:               NO
    Non-free algorithms:         NO

  Windows RT support:            NO

  GUI:
    Win32 UI:                    YES
    VTK support:                 NO

  Media I/O:
    ZLib:                        build (ver 1.2.13)
    JPEG:                        build-libjpeg-turbo (ver 2.1.3-62)
      SIMD Support Request:      YES
      SIMD Support:              NO
    WEBP:                        build (ver encoder: 0x020f)
    PNG:                         build (ver 1.6.37)
    TIFF:                        build (ver 42 - 4.2.0)
    JPEG 2000:                   build (ver 2.5.0)
    OpenEXR:                     build (ver 2.3.0)
    HDR:                         YES
    SUNRASTER:                   YES
    PXM:                         YES
    PFM:                         YES

  Video I/O:
    DC1394:                      NO
    FFMPEG:                      YES (prebuilt binaries)
      avcodec:                   YES (58.134.100)
      avformat:                  YES (58.76.100)
      avutil:                    YES (56.70.100)
      swscale:                   YES (5.9.100)
      avresample:                YES (4.0.0)
    GStreamer:                   NO
    DirectShow:                  YES
    Media Foundation:            YES
      DXVA:                      YES

  Parallel framework:            Concurrency

  Trace:                         YES (with Intel ITT)

  Other third-party libraries:
    Intel IPP:                   2021.8 [2021.8.0]
           at:                   C:/GHA-OCV-1/_work/ci-gha-workflow/ci-gha-workflow/build/3rdparty/ippicv/ippicv_win/icv
    Intel IPP IW:                sources (2021.8.0)
              at:                C:/GHA-OCV-1/_work/ci-gha-workflow/ci-gha-workflow/build/3rdparty/ippicv/ippicv_win/iw
    Eigen:                       NO
    Custom HAL:                  NO
    Protobuf:                    build (3.19.1)
    Flatbuffers:                 builtin/3rdparty (23.5.9)

  OpenCL:                        YES (NVD3D11)
    Include path:                C:/GHA-OCV-1/_work/ci-gha-workflow/ci-gha-workflow/opencv/3rdparty/include/opencl/1.2
    Link libraries:              Dynamic load

  Python (for build):            C:/Python-3.9/python.exe

  Java:
    ant:                         C:/apache-ant-1.9.15/bin/ant.bat (ver 1.9.15)
    Java:                        NO
    JNI:                         C:/Program Files/Java/jdk-11.0.9/include C:/Program Files/Java/jdk-11.0.9/include/win32 C:/Program Files/Java/jdk-11.0.9/include
    Java wrappers:               NO
    Java tests:                  NO

  Install to:                    C:/GHA-OCV-1/_work/ci-gha-workflow/ci-gha-workflow/install
-----------------------------------------------------------------

[yolov4]
        init >> 92.686ms
        inference >> min = 29.27ms, max = 38.598ms, mean = 32.5029ms, stddev = 1.07574ms

D:\vcworkspaces\yolov4_tiny_dnn_demo\x64\Release\yolov4_tiny_dnn_demo.exe (process 11968) exited with code 0.
Press any key to close this window. . .
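To put the two logs side by side, the slowdown can be expressed either as extra time per frame or as lost throughput, and the two percentages are not the same. The following is a minimal sketch using only the two mean values printed above; the helper name `percent_change` is made up for illustration.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Mean per-frame times (ms) copied from the two logs above.
mean_455_ms = 21.846
mean_480_ms = 32.5029

# Latency: how much longer one frame takes.
latency_change = percent_change(mean_455_ms, mean_480_ms)

# Throughput: frames per second is 1000 / (time per frame in ms).
fps_455 = 1000.0 / mean_455_ms
fps_480 = 1000.0 / mean_480_ms
throughput_change = percent_change(fps_455, fps_480)

print(f"latency change:    {latency_change:+.1f}%")
print(f"throughput change: {throughput_change:+.1f}%")
```

With these inputs the time per frame grows by roughly 49%, while the frame rate drops by roughly one third, so a "50%" figure fits the latency increase rather than the throughput loss.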

Steps to reproduce

#include <algorithm>
#include <chrono>
#include <cmath>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <numeric>
#include <queue>
#include <sstream>
#include <string>
#include <vector>

#include <opencv2/core.hpp>
#include <opencv2/dnn.hpp>
#include <opencv2/dnn/all_layers.hpp>
#include <opencv2/opencv.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>

constexpr float CONFIDENCE_THRESHOLD = 0;
constexpr float NMS_THRESHOLD = 0.4;
constexpr int NUM_CLASSES = 80;

// colors for bounding boxes
const cv::Scalar colors[] = {
    {0, 255, 255},
    {255, 255, 0},
    {0, 255, 0},
    {255, 0, 0}
};
const auto NUM_COLORS = sizeof(colors) / sizeof(colors[0]);

int main()
{
    std::cout << cv::getBuildInformation() << std::endl;
    std::vector<std::string> class_names;
    {
        std::ifstream class_file("yolo/coco.names");
        if (!class_file)
        {
            std::cerr << "failed to open yolo/coco.names\n";
            return 1;
        }

        std::string line;
        while (std::getline(class_file, line))
            class_names.push_back(line);
    }
    std::string b = "./yolo/yolo_test.mp4";
    cv::VideoCapture source(b);

    auto net = cv::dnn::readNetFromDarknet("yolo/yolov4-tiny.cfg", "yolo/yolov4-tiny.weights");
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);
    auto output_names = net.getUnconnectedOutLayersNames();

    cv::Mat frame(416, 416, CV_32FC3), blob; //Tiny
    std::vector<cv::Mat> detections;
    std::vector<float> runtimes;
    auto init_start = std::chrono::steady_clock::now();
    cv::dnn::blobFromImage(frame, blob, 1 / 255.0, cv::Size(416, 416), cv::Scalar(), true, false, CV_32F); //Tiny
    net.setInput(blob);
    net.forward(detections, output_names);
    auto init_end = std::chrono::steady_clock::now();
    while (cv::waitKey(1) < 1)
    {
        source >> frame;
        if (frame.empty())
            break;

        auto total_start = std::chrono::steady_clock::now();
        cv::dnn::blobFromImage(frame, blob, 1.0 / 255, cv::Size(416, 416), cv::Scalar(), true, false, CV_32F);  //Tiny
        net.setInput(blob);

        auto dnn_start = std::chrono::steady_clock::now();
        net.forward(detections, output_names);
        auto dnn_end = std::chrono::steady_clock::now();

        std::vector<int> indices[NUM_CLASSES];
        std::vector<cv::Rect> boxes[NUM_CLASSES];
        std::vector<float> scores[NUM_CLASSES];

        for (auto& output : detections)
        {
            const auto num_boxes = output.rows;
            for (int i = 0; i < num_boxes; i++)
            {
                auto x = output.at<float>(i, 0) * frame.cols;
                auto y = output.at<float>(i, 1) * frame.rows;
                auto width = output.at<float>(i, 2) * frame.cols;
                auto height = output.at<float>(i, 3) * frame.rows;
                cv::Rect rect(x - width / 2, y - height / 2, width, height);

                for (int c = 0; c < NUM_CLASSES; c++)
                {
                    auto confidence = *output.ptr<float>(i, 5 + c);
                    if (confidence >= CONFIDENCE_THRESHOLD)
                    {
                        boxes[c].push_back(rect);
                        scores[c].push_back(confidence);
                    }
                }
            }
        }

        for (int c = 0; c < NUM_CLASSES; c++)
            cv::dnn::NMSBoxes(boxes[c], scores[c], 0.0, NMS_THRESHOLD, indices[c]);

        for (int c = 0; c < NUM_CLASSES; c++)
        {
            for (size_t i = 0; i < indices[c].size(); ++i)
            {
                const auto color = colors[c % NUM_COLORS];

                auto idx = indices[c][i];
                const auto& rect = boxes[c][idx];
                cv::rectangle(frame, cv::Point(rect.x, rect.y), cv::Point(rect.x + rect.width, rect.y + rect.height), color, 3);

                std::ostringstream label_ss;
                label_ss << class_names[c] << ": " << std::fixed << std::setprecision(2) << scores[c][idx];
                auto label = label_ss.str();

                int baseline;
                auto label_bg_sz = cv::getTextSize(label.c_str(), cv::FONT_HERSHEY_COMPLEX_SMALL, 1, 1, &baseline);
                cv::rectangle(frame, cv::Point(rect.x, rect.y - label_bg_sz.height - baseline - 10), cv::Point(rect.x + label_bg_sz.width, rect.y), color, cv::FILLED);
                cv::putText(frame, label.c_str(), cv::Point(rect.x, rect.y - baseline - 5), cv::FONT_HERSHEY_COMPLEX_SMALL, 1, cv::Scalar(0, 0, 0));
            }
        }

        auto total_end = std::chrono::steady_clock::now();

        // Despite their names, inference_fps and total_fps hold durations in milliseconds.
        float inference_fps = std::chrono::duration_cast<std::chrono::microseconds>(dnn_end - dnn_start).count() / 1000.0;
        //std::cout << "Model inference time: " << inference_fps << " ms" << std::endl;
        float total_fps = std::chrono::duration_cast<std::chrono::microseconds>(total_end - total_start).count() / 1000.0;
        //std::cout << "Total time per frame: " << total_fps << " ms" << std::endl;
        std::ostringstream stats_ss;
        stats_ss << std::fixed << std::setprecision(2);
        stats_ss << "Inference FPS: " << 1000.0 / inference_fps << ", Total FPS: " << 1000.0 / total_fps;
        runtimes.push_back(total_fps);
        auto stats = stats_ss.str();
        int baseline;
        auto stats_bg_sz = cv::getTextSize(stats.c_str(), cv::FONT_HERSHEY_COMPLEX_SMALL, 1, 1, &baseline);
        cv::rectangle(frame, cv::Point(0, 0), cv::Point(stats_bg_sz.width, stats_bg_sz.height + 10), cv::Scalar(0, 0, 0), cv::FILLED);
        cv::putText(frame, stats.c_str(), cv::Point(0, stats_bg_sz.height + 5), cv::FONT_HERSHEY_COMPLEX_SMALL, 1, cv::Scalar(255, 255, 255));
        //cv::namedWindow("output", cv::WindowFlags::WINDOW_AUTOSIZE);
        //cv::imshow("output", frame);
    }
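    if (runtimes.empty())
    {
        // No frame was read (e.g. the video failed to open), so there is nothing to summarize.
        std::cerr << "no frames were processed\n";
        return 1;
    }
    auto sum = std::accumulate(std::begin(runtimes), std::end(runtimes), 0.0f);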
    auto squared_sum = std::inner_product(std::begin(runtimes), std::end(runtimes), std::begin(runtimes), 0.0f);

    auto min = *std::min_element(std::begin(runtimes), std::end(runtimes));
    auto max = *std::max_element(std::begin(runtimes), std::end(runtimes));
    auto mean = sum / runtimes.size();
    auto stddev = std::sqrt(squared_sum / runtimes.size() - mean * mean);

    std::cout << '[' << "yolov4-tiny" << "]" << '\n'
        << "\tinit >> " << std::chrono::duration_cast<std::chrono::microseconds>(init_end - init_start).count() / 1000.0 << "ms" << '\n'
        << "\tinference >> " << "min = " << min << "ms, max = " << max << "ms, mean = " << mean << "ms, stddev = " << stddev << "ms" << std::endl;

    //cv::destroyAllWindows();
    return 0;
}
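Two details of the program above affect how its numbers should be read. The values pushed into `runtimes` are the total per-frame times (blob creation, forward pass, and NMS), even though the final line is labelled "inference". The standard deviation is the population form, sqrt(mean of squares minus squared mean). The sketch below reproduces that summary in Python and returns `None` for an empty sample list; the function name `summarize` and the sample values are illustrative assumptions, not taken from the issue.

```python
import math


def summarize(samples_ms):
    """Return min, max, mean and population stddev of timing samples."""
    if not samples_ms:
        # Nothing was measured (for example, the video failed to open).
        return None
    n = len(samples_ms)
    mean = sum(samples_ms) / n
    mean_of_squares = sum(x * x for x in samples_ms) / n
    # Rounding can push the difference slightly below zero, so clamp it.
    variance = max(mean_of_squares - mean * mean, 0.0)
    return {
        "min": min(samples_ms),
        "max": max(samples_ms),
        "mean": mean,
        "stddev": math.sqrt(variance),
    }


# Hypothetical per-frame timings in milliseconds, for illustration only.
print(summarize([21.6, 21.8, 22.1, 21.7]))
print(summarize([]))
```

Reporting the forward-pass time and the total frame time as separate sample lists would make it easier to tell whether the regression sits in the network itself or in the surrounding pre- and post-processing.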

Related models and video downloads.

Download: https://pan.baidu.com/s/1wpXBbdtJMUrYULAelglIiw?pwd=x67k

opencv-alalek commented 10 months ago

Please avoid posting text information via screenshots. It is ridiculous and blocks comparison of the output.

As a first step of the investigation, we should align the build flags (CPU_BASELINE at least). opencv-openvino and opencv-python are different distribution channels, so we need to compare packages from the same distribution channel (e.g. compare opencv-python==4.5.5 vs opencv-python==4.8.0).
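One way to check the CPU_BASELINE point is to pull the `Baseline:` line out of each `cv::getBuildInformation()` dump and compare the feature sets. The sketch below does this with plain string handling on excerpts copied from the two dumps in this issue; the helper name `baseline_features` is an assumption made for illustration.

```python
import re


def baseline_features(build_info):
    """Extract the CPU baseline feature set from a build-information dump."""
    match = re.search(r"^\s*Baseline:\s*(.*)$", build_info, flags=re.MULTILINE)
    if match is None:
        return set()
    return set(match.group(1).split())


# Excerpts copied from the two dumps in this issue.
info_455 = """
  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3 SSSE3 SSE4_1 POPCNT SSE4_2
      requested:                 SSE4_2
"""
info_480 = """
  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3
      requested:                 SSE3
"""

only_in_455 = baseline_features(info_455) - baseline_features(info_480)
only_in_480 = baseline_features(info_480) - baseline_features(info_455)
print("baseline only in 4.5.5-openvino:", sorted(only_in_455))
print("baseline only in 4.8.0:", sorted(only_in_480))
```

A difference in the baseline does not prove that it causes the slowdown, since both builds list the SSE4 and AVX families under dispatched code generation, but it is a difference that has to be ruled out before the versions can be compared.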

ZJDATY commented 10 months ago

I will upload the output log again, and I hope you can remove this tag. My output log contains the build information (including CPU_BASELINE). OpenCV 4.8.0 and OpenCV 4.5.5-openvino are both official builds, so I think they can be compared. If the older version has a better and faster build method, why not use it?

zihaomu commented 10 months ago

Hi @ZJDATY, I have tested your model with two different opencv-python versions on my Mac with an Intel i9 CPU.

The results are as follows (Python version 3.8):

For opencv-python 4.5.5.64, the minimum forward time is 21.917058 ms.
For opencv-python 4.8.0.74, the minimum forward time is 21.500704 ms.

The two versions take almost the same time.

The test script is the following:

import cv2 as cv
import numpy as np

net = cv.dnn.readNet("yolov4/yolov4-tiny.cfg", "yolov4/yolov4-tiny.weights")

t = cv.TickMeter()
t.reset()

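# Random NCHW input blob, cast to float32 (the usual blob type for the DNN module).
blob = np.random.rand(1, 3, 416, 416).astype(np.float32)
net.setInput(blob)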

minT = 100000
for i in range(100):
    t.start()
    net.forward()
    t.stop()
    timeN = t.getTimeMilli()
    if timeN < minT:
        minT = timeN
    t.reset()

print("forward time = ", minT, "ms")

Can you run the same test on your side?
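The measurement pattern in the script above (repeat the call many times and keep the fastest run) can be written as a small reusable helper. The following is a sketch using only the Python standard library, with a dummy workload standing in for `net.forward()`. The names `min_time_ms` and `dummy_workload` and the warm-up count are assumptions for illustration.

```python
import time


def min_time_ms(fn, runs=100, warmup=3):
    """Call fn repeatedly and return the fastest run in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best


def dummy_workload():
    # Stand-in for net.forward(); replace with the real call when benchmarking.
    return sum(i * i for i in range(10_000))


print(f"fastest run: {min_time_ms(dummy_workload):.3f} ms")
```

Taking the minimum filters out scheduling noise, and the untimed warm-up calls keep one-time initialization cost (visible as the "init" figure in the logs earlier in this issue) out of the result.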

ZJDATY commented 10 months ago

Hi @zihaomu, I am happy to help with testing. I don't use Python very much, so I wrote a new test in C++. My environment is Win11, and the CPU is an i7-9700. Here are my test results.

#include <iostream>
#include <opencv2/opencv.hpp>

int main()
{
    cv::TickMeter t;
    auto net = cv::dnn::readNetFromDarknet("yolo/yolov4-tiny.cfg", "yolo/yolov4-tiny.weights");
    cv::Mat frame(416, 416, CV_32FC3), blob;
    cv::dnn::blobFromImage(frame, blob, 1 / 255.0, cv::Size(416, 416), cv::Scalar(), true, false, CV_32F);
    net.setInput(blob);
    double minT = 100000;
    for (int i = 0; i < 100; i++)
    {
        t.reset();
        t.start();
        net.forward();
        t.stop();
        // One start/stop pair per reset, so the average equals the time of this run.
        double timeN = t.getAvgTimeMilli();
        if (timeN < minT)
            minT = timeN;
    }
    std::cout << "forward time = " << minT << std::endl;
    return 0;
}

For opencv4.8.0 the min time is forward time = 30.2679 ms. 1st 24.5836 2st 25.1356 3st 25.9139 4st 25.0378 5st

For opencv4.5.5-openvino the min time is forward time = 10.6107 ms. 1st 10.4434 2st 10.3903 3st 10.3375 4st 10.4401 5st

ZJDATY commented 10 months ago

@zihaomu How crazy! The inference time has increased by 150%. If you need it, I can provide my remote desktop for you to test.

zihaomu commented 10 months ago

@ZJDATY Can you test it with opencv-python?

ZJDATY commented 10 months ago

Hi @zihaomu, I will give it a try, although my understanding of Python is limited. But I would like to know why testing in Python is necessary. The actual deployment still requires C++. Shouldn't we test in the production environment? If the times measured in Python are similar, what would that tell us? Was the official OpenCV 4.5.5 build with OpenVINO compiled with any special optimizations?

zihaomu commented 10 months ago

Hi @ZJDATY, I just tested it on my Windows 10 machine with an i7-9750, and indeed I cannot reproduce your result. opencv-python is compiled from the OpenCV C++ source code.

For opencv-python 4.5.5.64, the minimum forward time is 36.2916 ms.
For opencv-python 4.6.0.66, the minimum forward time is 34.8292 ms.
For opencv-python 4.8.0.74, the minimum forward time is 34.4077 ms.

ZJDATY commented 10 months ago

So does opencv455-openvino include some powerful optimizations? Can you run the test with the C++ program I wrote?

zihaomu commented 10 months ago

Please make sure you are using the OpenCV DNN CPU backend, not OpenVINO or any other backend. Comparing different backends is not fair.

ZJDATY commented 10 months ago

I understand what you mean. If the two statements below are added, are the results comparable? If you build OpenCV 4.8.0 with OpenVINO yourself, its results should then be similar to those of opencv-4.5.5-openvino.

net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);

ZJDATY commented 10 months ago

Hi @zihaomu, I have modified the program.

#include <opencv2/opencv.hpp>

int main()
{
    cv::TickMeter *t = new cv::TickMeter();
    t->reset();
    auto net = cv::dnn::readNetFromDarknet("yolo/yolov4-tiny.cfg", "yolo/yolov4-tiny.weights");
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);
    cv::Mat frame(416, 416, CV_32FC3), blob;
    cv::dnn::blobFromImage(frame, blob, 1 / 255.0, cv::Size(416, 416), cv::Scalar(), true, false, CV_32F);
    net.setInput(blob);
    double minT = 100000;
    for (int i = 0; i < 100; i++)
    {
        t->start();
        net.forward();
        t->stop();
        double timeN = t->getAvgTimeMilli();
        if (timeN < minT)
            minT = timeN;
        t->reset();
    }
    std::cout << "forward time = " << minT << std::endl;
    delete t;
    return 0;
}

OpenCV 4.5.5-openvino takes 16.2936 ms. OpenCV 4.8.0 takes 26.1167 ms. There is still a gap of nearly 10 ms.

ZJDATY commented 10 months ago

For comparison, I also built OpenCV 4.8.0 with OpenVINO myself. Using the same code as above, its output is as follows:

forward time = 24.7357
forward time = 24.7381

This is OpenCV 4.8.0 with OpenVINO.

zihaomu commented 10 months ago

No idea

zihaomu commented 10 months ago

Have you compared OpenCV 4.5.5 without OpenVINO against OpenCV 4.8.0?

zihaomu commented 10 months ago

Can you disable the Inference Engine, rebuild, and test again? I just rechecked the CMake build information in your issue and found that 4.5 was compiled with the Inference Engine and 4.8 without it.
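Differences like this one are easy to miss when reading two long dumps by eye. Below is a rough sketch that turns each dump into a key/value map and lists the fields that differ. The helper names `parse_build_info` and `diff_build_info` are hypothetical, and the inputs are short excerpts from the two dumps in this issue rather than the full text.

```python
def parse_build_info(text):
    """Map 'key: value' lines of a build-information dump to a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            # Repeated keys (such as "at") overwrite earlier ones in this sketch.
            fields[key.strip()] = value.strip()
    return fields


def diff_build_info(old_text, new_text):
    """Return {key: (old_value, new_value)} for fields that differ."""
    old, new = parse_build_info(old_text), parse_build_info(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }


# Short excerpts from the two dumps in this issue.
info_455 = """
  Parallel framework:            Concurrency
    Inference Engine:            YES (2022010000 / 2022.1.0)
    nGraph:                      YES (2022.1.0)
    Protobuf:                    build (3.19.1)
"""
info_480 = """
  Parallel framework:            Concurrency
    Eigen:                       NO
    Protobuf:                    build (3.19.1)
"""

for key, (old, new) in diff_build_info(info_455, info_480).items():
    print(f"{key}: {old!r} -> {new!r}")
```

A value of `None` on one side means the field is absent from that dump, which is how the missing Inference Engine entry in the 4.8.0 build shows up.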

ZJDATY commented 10 months ago

Have you compared OpenCV 4.5.5 without OpenVINO against OpenCV 4.8.0?

I will.

ZJDATY commented 10 months ago

Can you disable the Inference Engine, rebuild, and test again? I just rechecked the CMake build information in your issue and found that 4.5 was compiled with the Inference Engine and 4.8 without it.

What I showed are the officially built opencv455-openvino and opencv480 packages, because I was afraid you might say there was a problem with my own build. In fact, I have also built several versions with OpenVINO myself. Below is the build information for my opencv480-openvino build. Its time is similar to that of the build without OpenVINO 2022.1, but the difference between it and opencv455-openvino is still significant.

General configuration for OpenCV 4.8.0 =====================================
  Version control:               unknown

  Extra modules:
    Location (extra):            D:/opencv/opencv_contrib-4.8.0/modules
    Version control (extra):     unknown

  Platform:
    Timestamp:                   2023-06-29T13:58:22Z
    Host:                        Windows 10.0.22621 AMD64
    CMake:                       3.26.4
    CMake generator:             Visual Studio 16 2019
    CMake build tool:            D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/MSBuild/Current/Bin/MSBuild.exe
    MSVC:                        1929
    Configuration:               Debug Release

  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3
      requested:                 SSE3
    Dispatched code generation:  SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
      requested:                 SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
      SSE4_1 (16 files):         + SSSE3 SSE4_1
      SSE4_2 (1 files):          + SSSE3 SSE4_1 POPCNT SSE4_2
      FP16 (0 files):            + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
      AVX (7 files):             + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
      AVX2 (35 files):           + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
      AVX512_SKX (5 files):      + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX

  C/C++:
    Built as dynamic libs?:      YES
    C++ standard:                11
    C++ Compiler:                D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe  (ver 19.29.30148.0)
    C++ flags (Release):         /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:fast     /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819 /MP -openmp  /MD /O2 /Ob2 /DNDEBUG
    C++ flags (Debug):           /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:fast     /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819 /MP -openmp  /MDd /Zi /Ob0 /Od /RTC1
    C Compiler:                  D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe
    C flags (Release):           /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:fast     /MP -openmp   /MD /O2 /Ob2 /DNDEBUG
    C flags (Debug):             /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:fast     /MP -openmp /MDd /Zi /Ob0 /Od /RTC1
    Linker flags (Release):      /machine:x64  delayimp.lib /DELAYLOAD:cublas64_11.dll /DELAYLOAD:cublasLt64_11.dll /DELAYLOAD:cudnn64_8.dll /DELAYLOAD:cudnn_adv_infer64_8.dll /DELAYLOAD:cudnn_adv_train64_8.dll /DELAYLOAD:cudnn_cnn_infer64_8.dll /DELAYLOAD:cudnn_cnn_train64_8.dll /DELAYLOAD:cudnn_ops_infer64_8.dll /DELAYLOAD:cudnn_ops_train64_8.dll /DELAYLOAD:cufft64_10.dll /DELAYLOAD:cufftw64_10.dll /DELAYLOAD:cuinj64_117.dll /DELAYLOAD:curand64_10.dll /DELAYLOAD:cusolver64_11.dll /DELAYLOAD:cusolverMg64_11.dll /DELAYLOAD:cusparse64_11.dll /DELAYLOAD:nppc64_11.dll /DELAYLOAD:nppial64_11.dll /DELAYLOAD:nppicc64_11.dll /DELAYLOAD:nppidei64_11.dll /DELAYLOAD:nppif64_11.dll /DELAYLOAD:nppig64_11.dll /DELAYLOAD:nppim64_11.dll /DELAYLOAD:nppist64_11.dll /DELAYLOAD:nppisu64_11.dll /DELAYLOAD:nppitc64_11.dll /DELAYLOAD:npps64_11.dll /DELAYLOAD:nvblas64_11.dll /DELAYLOAD:nvjpeg64_11.dll /DELAYLOAD:nvrtc-builtins64_117.dll /DELAYLOAD:nvrtc64_112_0.dll /DELAYLOAD:zlibwapi.dll /DELAYLOAD:nvcuda.dll /DELAYLOAD:nvml.dll /IGNORE:4199  /DELAYLOAD:opencv_cudev480.dll /DELAYLOAD:opencv_cudaarithm480.dll /DELAYLOAD:opencv_flann480.dll /DELAYLOAD:opencv_imgproc480.dll /DELAYLOAD:opencv_intensity_transform480.dll /DELAYLOAD:opencv_ml480.dll /DELAYLOAD:opencv_phase_unwrapping480.dll /DELAYLOAD:opencv_plot480.dll /DELAYLOAD:opencv_quality480.dll /DELAYLOAD:opencv_reg480.dll /DELAYLOAD:opencv_surface_matching480.dll /DELAYLOAD:opencv_cudafilters480.dll /DELAYLOAD:opencv_cudaimgproc480.dll /DELAYLOAD:opencv_cudawarping480.dll /DELAYLOAD:opencv_dnn480.dll /DELAYLOAD:opencv_dnn_superres480.dll /DELAYLOAD:opencv_features2d480.dll /DELAYLOAD:opencv_fuzzy480.dll /DELAYLOAD:opencv_hfs480.dll /DELAYLOAD:opencv_img_hash480.dll /DELAYLOAD:opencv_imgcodecs480.dll /DELAYLOAD:opencv_line_descriptor480.dll /DELAYLOAD:opencv_photo480.dll /DELAYLOAD:opencv_saliency480.dll /DELAYLOAD:opencv_text480.dll /DELAYLOAD:opencv_videoio480.dll /DELAYLOAD:opencv_xphoto480.dll /DELAYLOAD:opencv_calib3d480.dll 
/DELAYLOAD:opencv_cudacodec480.dll /DELAYLOAD:opencv_cudafeatures2d480.dll /DELAYLOAD:opencv_cudastereo480.dll /DELAYLOAD:opencv_datasets480.dll /DELAYLOAD:opencv_highgui480.dll /DELAYLOAD:opencv_mcc480.dll /DELAYLOAD:opencv_objdetect480.dll /DELAYLOAD:opencv_rapid480.dll /DELAYLOAD:opencv_rgbd480.dll /DELAYLOAD:opencv_shape480.dll /DELAYLOAD:opencv_structured_light480.dll /DELAYLOAD:opencv_ts480.dll /DELAYLOAD:opencv_video480.dll /DELAYLOAD:opencv_wechat_qrcode480.dll /DELAYLOAD:opencv_xfeatures2d480.dll /DELAYLOAD:opencv_ximgproc480.dll /DELAYLOAD:opencv_xobjdetect480.dll /DELAYLOAD:opencv_aruco480.dll /DELAYLOAD:opencv_bgsegm480.dll /DELAYLOAD:opencv_bioinspired480.dll /DELAYLOAD:opencv_ccalib480.dll /DELAYLOAD:opencv_cudabgsegm480.dll /DELAYLOAD:opencv_cudalegacy480.dll /DELAYLOAD:opencv_cudaobjdetect480.dll /DELAYLOAD:opencv_dnn_objdetect480.dll /DELAYLOAD:opencv_dpm480.dll /DELAYLOAD:opencv_face480.dll /DELAYLOAD:opencv_gapi480.dll /DELAYLOAD:opencv_optflow480.dll /DELAYLOAD:opencv_stitching480.dll /DELAYLOAD:opencv_tracking480.dll /DELAYLOAD:opencv_cudaoptflow480.dll /DELAYLOAD:opencv_stereo480.dll /DELAYLOAD:opencv_superres480.dll /DELAYLOAD:opencv_videostab480.dll /INCREMENTAL:NO
    Linker flags (Debug):        /machine:x64  delayimp.lib /DELAYLOAD:cublas64_11.dll /DELAYLOAD:cublasLt64_11.dll /DELAYLOAD:cudnn64_8.dll /DELAYLOAD:cudnn_adv_infer64_8.dll /DELAYLOAD:cudnn_adv_train64_8.dll /DELAYLOAD:cudnn_cnn_infer64_8.dll /DELAYLOAD:cudnn_cnn_train64_8.dll /DELAYLOAD:cudnn_ops_infer64_8.dll /DELAYLOAD:cudnn_ops_train64_8.dll /DELAYLOAD:cufft64_10.dll /DELAYLOAD:cufftw64_10.dll /DELAYLOAD:cuinj64_117.dll /DELAYLOAD:curand64_10.dll /DELAYLOAD:cusolver64_11.dll /DELAYLOAD:cusolverMg64_11.dll /DELAYLOAD:cusparse64_11.dll /DELAYLOAD:nppc64_11.dll /DELAYLOAD:nppial64_11.dll /DELAYLOAD:nppicc64_11.dll /DELAYLOAD:nppidei64_11.dll /DELAYLOAD:nppif64_11.dll /DELAYLOAD:nppig64_11.dll /DELAYLOAD:nppim64_11.dll /DELAYLOAD:nppist64_11.dll /DELAYLOAD:nppisu64_11.dll /DELAYLOAD:nppitc64_11.dll /DELAYLOAD:npps64_11.dll /DELAYLOAD:nvblas64_11.dll /DELAYLOAD:nvjpeg64_11.dll /DELAYLOAD:nvrtc-builtins64_117.dll /DELAYLOAD:nvrtc64_112_0.dll /DELAYLOAD:zlibwapi.dll /DELAYLOAD:nvcuda.dll /DELAYLOAD:nvml.dll /IGNORE:4199  /DELAYLOAD:opencv_cudev480.dll /DELAYLOAD:opencv_cudaarithm480.dll /DELAYLOAD:opencv_flann480.dll /DELAYLOAD:opencv_imgproc480.dll /DELAYLOAD:opencv_intensity_transform480.dll /DELAYLOAD:opencv_ml480.dll /DELAYLOAD:opencv_phase_unwrapping480.dll /DELAYLOAD:opencv_plot480.dll /DELAYLOAD:opencv_quality480.dll /DELAYLOAD:opencv_reg480.dll /DELAYLOAD:opencv_surface_matching480.dll /DELAYLOAD:opencv_cudafilters480.dll /DELAYLOAD:opencv_cudaimgproc480.dll /DELAYLOAD:opencv_cudawarping480.dll /DELAYLOAD:opencv_dnn480.dll /DELAYLOAD:opencv_dnn_superres480.dll /DELAYLOAD:opencv_features2d480.dll /DELAYLOAD:opencv_fuzzy480.dll /DELAYLOAD:opencv_hfs480.dll /DELAYLOAD:opencv_img_hash480.dll /DELAYLOAD:opencv_imgcodecs480.dll /DELAYLOAD:opencv_line_descriptor480.dll /DELAYLOAD:opencv_photo480.dll /DELAYLOAD:opencv_saliency480.dll /DELAYLOAD:opencv_text480.dll /DELAYLOAD:opencv_videoio480.dll /DELAYLOAD:opencv_xphoto480.dll /DELAYLOAD:opencv_calib3d480.dll 
/DELAYLOAD:opencv_cudacodec480.dll /DELAYLOAD:opencv_cudafeatures2d480.dll /DELAYLOAD:opencv_cudastereo480.dll /DELAYLOAD:opencv_datasets480.dll /DELAYLOAD:opencv_highgui480.dll /DELAYLOAD:opencv_mcc480.dll /DELAYLOAD:opencv_objdetect480.dll /DELAYLOAD:opencv_rapid480.dll /DELAYLOAD:opencv_rgbd480.dll /DELAYLOAD:opencv_shape480.dll /DELAYLOAD:opencv_structured_light480.dll /DELAYLOAD:opencv_ts480.dll /DELAYLOAD:opencv_video480.dll /DELAYLOAD:opencv_wechat_qrcode480.dll /DELAYLOAD:opencv_xfeatures2d480.dll /DELAYLOAD:opencv_ximgproc480.dll /DELAYLOAD:opencv_xobjdetect480.dll /DELAYLOAD:opencv_aruco480.dll /DELAYLOAD:opencv_bgsegm480.dll /DELAYLOAD:opencv_bioinspired480.dll /DELAYLOAD:opencv_ccalib480.dll /DELAYLOAD:opencv_cudabgsegm480.dll /DELAYLOAD:opencv_cudalegacy480.dll /DELAYLOAD:opencv_cudaobjdetect480.dll /DELAYLOAD:opencv_dnn_objdetect480.dll /DELAYLOAD:opencv_dpm480.dll /DELAYLOAD:opencv_face480.dll /DELAYLOAD:opencv_gapi480.dll /DELAYLOAD:opencv_optflow480.dll /DELAYLOAD:opencv_stitching480.dll /DELAYLOAD:opencv_tracking480.dll /DELAYLOAD:opencv_cudaoptflow480.dll /DELAYLOAD:opencv_stereo480.dll /DELAYLOAD:opencv_superres480.dll /DELAYLOAD:opencv_videostab480.dll /debug /INCREMENTAL
    ccache:                      NO
    Precompiled headers:         YES
    Extra dependencies:          cudart_static.lib nppc.lib nppial.lib nppicc.lib nppidei.lib nppif.lib nppig.lib nppim.lib nppist.lib nppisu.lib nppitc.lib npps.lib cublas.lib cudnn.lib cufft.lib -LIBPATH:C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.7/lib/x64
    3rdparty dependencies:

  OpenCV modules:
    To be built:                 aruco bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres dpm face features2d flann fuzzy gapi hfs highgui img_hash imgcodecs imgproc intensity_transform line_descriptor mcc ml objdetect optflow phase_unwrapping photo plot quality rapid reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab wechat_qrcode xfeatures2d ximgproc xobjdetect xphoto
    Disabled:                    java_bindings_generator js_bindings_generator python_bindings_generator python_tests world
    Disabled by dependency:      -
    Unavailable:                 alphamat cvv freetype hdf java julia matlab ovis python2 python3 sfm viz
    Applications:                apps
    Documentation:               NO
    Non-free algorithms:         YES

  Windows RT support:            NO

  GUI:                           WIN32UI
    Win32 UI:                    YES
    OpenGL support:              YES (opengl32 glu32)

  Media I/O:
    ZLib:                        build (ver 1.2.13)
    JPEG:                        build-libjpeg-turbo (ver 2.1.3-62)
      SIMD Support Request:      YES
      SIMD Support:              NO
    WEBP:                        build (ver encoder: 0x020f)
    PNG:                         build (ver 1.6.37)
    TIFF:                        build (ver 42 - 4.2.0)
    JPEG 2000:                   build (ver 2.5.0)
    OpenEXR:                     build (ver 2.3.0)
    HDR:                         YES
    SUNRASTER:                   YES
    PXM:                         YES
    PFM:                         YES

  Video I/O:
    FFMPEG:                      YES (prebuilt binaries)
      avcodec:                   YES (58.134.100)
      avformat:                  YES (58.76.100)
      avutil:                    YES (56.70.100)
      swscale:                   YES (5.9.100)
      avresample:                YES (4.0.0)
    DirectShow:                  YES
    Media Foundation:            YES
      DXVA:                      YES

  Parallel framework:            TBB (ver 2020.2 interface 11102)

  Trace:                         YES (with Intel ITT)

  Other third-party libraries:
    Intel IPP:                   2021.8 [2021.8.0]
           at:                   D:/opencv/opencv-4.8.0/build/3rdparty/ippicv/ippicv_win/icv
    Intel IPP IW:                sources (2021.8.0)
              at:                D:/opencv/opencv-4.8.0/build/3rdparty/ippicv/ippicv_win/iw
    Lapack:                      YES (D:/opencv/OpenBLAS/lib/libopenblas.lib)
    OpenVINO:                    YES (2022.1.0)
    Custom HAL:                  NO
    Protobuf:                    build (3.19.1)
    Flatbuffers:                 builtin/3rdparty (23.5.9)

  NVIDIA CUDA:                   YES (ver 11.7, CUFFT CUBLAS FAST_MATH)
    NVIDIA GPU arch:             61 70 75 86
    NVIDIA PTX archs:

  cuDNN:                         YES (ver 8.6.0)

  OpenCL:                        YES (NVD3D11)
    Include path:                D:/opencv/opencv-4.8.0/3rdparty/include/opencl/1.2
    Link libraries:              Dynamic load

  ONNX:                          YES
    Include path:                D:/opencv/onnxruntime-1.15.1/include/onnxruntime/core/session
    Link libraries:              D:/env/Download/onnxruntime/onnxruntime-win-x64-gpu-1.15.1/lib/onnxruntime.lib

  Python (for build):            D:/ProgramData/Anaconda3/python.exe

  Install to:                    D:/opencv/opencv-4.8.0/build/install
-----------------------------------------------------------------

forward time = 24.4056

D:\vcworkspaces\yolov4_tiny_dnn_demo\x64\Release\dnnspeedtest.exe (process 16252) exited with code 0.
Press any key to close this window. . .

I compiled more than one opencv480 openvino build. I set different CMake configurations and compiled 5 versions. The inference times of these builds do not differ significantly from each other, but none of them reaches the speed of opencv455 openvino; the gap is always more than ten milliseconds.

zihaomu commented 10 months ago

Please disable OpenVINO so that we can be sure DNN is running on the CPU rather than through OpenVINO. Neither the original 4.5 nor 4.8 can run the yolov4-tiny model in about 10 ms on the CPU alone, so that value is weird. OpenVINO has special optimizations for Intel CPUs, and 10 ms with OpenVINO is a reasonable value.

zihaomu commented 10 months ago

To investigate the speed issue further, it would be helpful if you filled in the following table.

| Speed of yolov4-tiny | OpenCV 4.5 | OpenCV 4.8 |
| --- | --- | --- |
| net.setPreferableBackend(DNN_BACKEND_INFERENCE_ENGINE); | | |
| net.setPreferableBackend(0); | | |

Looking forward to your reply.

ZJDATY commented 10 months ago

@zihaomu I just finished compiling opencv455 with VC16. Here are my latest test results. I now have four builds on my end: opencv455 and opencv480, each with and without openvino.

| Speed of yolov4-tiny (ms) | OpenCV 4.5 | opencv 4.8 | OpenCV 4.5-openvino | opencv 4.8-openvino |
| --- | --- | --- | --- | --- |
| net.setPreferableBackend(cv::dnn::DNN_BACKEND_INFERENCE_ENGINE); | - | - | 10.4434 | 9.501 |
| net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV); | 15.8422 | 25.1356 | 16.2936 | 24.7036 |

zihaomu commented 10 months ago

Thanks for your work. I will take a look at the details and do a layer-by-layer speed comparison.

zihaomu commented 10 months ago

BTW, what's your cpu details info?

ZJDATY commented 10 months ago

BTW, what's your cpu details info?

I mentioned it earlier. It's I7-9700.

zihaomu commented 10 months ago

I still cannot reproduce the result on my Intel i7-9750 CPU laptop, windows 10, VS2019. The speed is the following:

  • opencv-4.8 takes about 50 ms.
  • opencv-4.5.5 takes about 45 ms.

I have a bit of doubt about how you are able to get 20ms on your i7-9700 with just the CPU.

ZJDATY commented 10 months ago

I still cannot reproduce the result on my Intel i7-9750 CPU laptop, windows 10, VS2019. The speed is the following:

  • opencv-4.8 takes about 50 ms.
  • opencv-4.5.5 takes about 45 ms.

I have a bit of doubt about how you are able to get 20ms on your i7-9700 with just the CPU.

@zihaomu There are a few differences: mine is a desktop I7-9700, not a laptop, and it runs WIN11 with the latest updates installed.

zihaomu commented 10 months ago

I have also tested it on my AMD 5600X desktop, which should be faster than the i7-9700, and it takes about 40 ms. Maybe I am missing something. Can you test the single-core performance on your side?

Another test is the following:

while(1)
{
net.forward(detections, output_names);
}

This is to check whether GPU usage goes up. I'm concerned that it is actually using the GPU, through OpenCL or CUDA.

zihaomu commented 10 months ago

@WanliZhong Please take a look.

opencv-alalek commented 10 months ago

@ZJDATY Consider using the OpenCV performance tests. There is a test case for yolov4-tiny here: https://github.com/opencv/opencv/blob/4.8.0/modules/dnn/perf/perf_net.cpp#L242

Unfortunately, the models used in 4.8.0 and 4.5.5 are not the same (details in #23008). The test (perf_net.cpp) from 4.5.5 needs to be restored for a correct comparison (however, timings are similar on my machine; perhaps only the weights were changed).

There could be many reasons for performance changes. Consider using --perf_threads=1 to disable multi-threading during the test.

I don't see degradation on Linux (GCC 12) with i7-12700K:

  • 1 thread:

| Name of Test | 455-1th | 480-1th-sametestdata | 480-1th-sametestdata vs 455-1th (x-factor) |
| --- | --- | --- | --- |
| YOLOv4_tiny::DNNTestNetwork::OCV/CPU | 65.478 | 40.288 | 1.63 |

  • N threads (default=20):

| Name of Test | 455-Nth | 480-Nth-sametestdata | 480-Nth-sametestdata vs 455-Nth (x-factor) |
| --- | --- | --- | --- |
| YOLOv4_tiny::DNNTestNetwork::OCV/CPU | 12.280 | 9.111 | 1.35 |

Used commands:

$ ./bin/opencv_perf_dnn --gtest_filter=*YOLOv4_tiny* --gtest_output=xml:../perf/dnn_23911/455-Nth.xml
...
$ ./bin/opencv_perf_dnn --gtest_filter=*YOLOv4_tiny* --perf_threads=1 --gtest_output=xml:../perf/dnn_23911/480-1th-sametestdata.xml

$ python $OPENCV_SRC_DIR/modules/ts/misc/summary.py -m median ../perf/dnn_23911/{455-1th,480-1th-sametestdata}.xml -o markdown
$ python $OPENCV_SRC_DIR/modules/ts/misc/summary.py -m median ../perf/dnn_23911/{455-Nth,480-Nth-sametestdata}.xml -o markdown

BTW, you need to set the OPENCV_TEST_DATA_PATH="<opencv_extra>/testdata" environment variable and run the download script (pass the YOLOv4-tiny parameter to download that one model only).
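
For readers unfamiliar with the x-factor column: judging from the numbers in the tables above, it appears to be the baseline (4.5.5) time divided by the new (4.8.0) time, so a value above 1 means 4.8.0 is faster. Below is a minimal sketch of that arithmetic. The function names are made up for illustration and are not taken from summary.py.

```cpp
#include <cstdio>

// x-factor as it appears in the tables above (an inference from the numbers,
// not from the summary.py source): baseline time divided by candidate time.
// Values above 1.0 mean the candidate build is faster than the baseline.
double x_factor(double baseline_ms, double candidate_ms) {
    return baseline_ms / candidate_ms;
}

// Relative latency change in percent; positive means the candidate is slower.
double latency_change_percent(double baseline_ms, double candidate_ms) {
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0;
}

// Prints both views of the same pair of timings.
void print_comparison(const char* label, double baseline_ms, double candidate_ms) {
    std::printf("%s: x-factor=%.2f, latency change=%+.1f%%\n",
                label,
                x_factor(baseline_ms, candidate_ms),
                latency_change_percent(baseline_ms, candidate_ms));
}
```

For example, `print_comparison("1 thread", 65.478, 40.288)` uses the single-thread medians from the table above.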

ZJDATY commented 10 months ago

I still cannot reproduce the result on my Intel i7-9750 CPU laptop, windows 10, VS2019. The speed is the following:

  • opencv-4.8 takes about 50 ms.
  • opencv-4.5.5 takes about 45 ms.

I have a bit of doubt about how you are able to get 20ms on your i7-9700 with just the CPU.

@zihaomu @asmorkalov Did you use this program to test the time?

#include <iostream>
#include <opencv2/opencv.hpp>

int main()
{
    // Load yolov4-tiny and force the built-in OpenCV backend on the CPU.
    auto net = cv::dnn::readNetFromDarknet("yolo/yolov4-tiny.cfg", "yolo/yolov4-tiny.weights");
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);

    // Dummy 416x416 input (pixel contents are left uninitialized).
    cv::Mat frame(416, 416, CV_32FC3), blob;
    cv::dnn::blobFromImage(frame, blob, 1 / 255.0, cv::Size(416, 416), cv::Scalar(), true, false, CV_32F);
    net.setInput(blob);

    // Time 100 forward passes one by one and keep the fastest.
    cv::TickMeter t;
    double minT = 100000;
    for (int i = 0; i < 100; i++)
    {
        t.reset();
        t.start();
        net.forward();
        t.stop();
        double timeN = t.getAvgTimeMilli();
        if (timeN < minT)
            minT = timeN;
    }
    std::cout << cv::getBuildInformation() << std::endl;
    std::cout << "forward time = " << minT << std::endl;
    return 0;
}

Here, I recorded a video.

link: https://pan.baidu.com/s/1pA6P2v28IxZtz6PIC446gw?pwd=jaq8
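
The loop above keeps only the minimum of 100 runs, while the perf-test commands earlier in the thread summarize by median, so the two numbers are not directly comparable. Below is a minimal, OpenCV-independent sketch that reports both from the same run and adds untimed warm-up iterations. `TimingSummary` and `time_repeated` are hypothetical names, and `std::chrono::steady_clock` stands in for `cv::TickMeter`.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Summary of repeated timings, in milliseconds.
struct TimingSummary {
    double min_ms;
    double median_ms;
    double mean_ms;
};

// Calls `work` `warmup` times without timing it, then `iterations` times with
// timing, and summarizes the timed runs. `iterations` must be at least 1.
template <typename Fn>
TimingSummary time_repeated(Fn&& work, int warmup, int iterations) {
    for (int i = 0; i < warmup; ++i)
        work();

    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(iterations));
    for (int i = 0; i < iterations; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        work();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }

    // Sort once so that min and median can be read off directly.
    std::sort(samples.begin(), samples.end());
    double sum = 0.0;
    for (double s : samples)
        sum += s;

    const std::size_t n = samples.size();
    const double median = (n % 2 == 1)
        ? samples[n / 2]
        : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);
    return {samples.front(), median, sum / static_cast<double>(n)};
}
```

Passing a lambda that calls `net.forward()` as `work` would repeat the measurement above with a warm-up phase added, for example `time_repeated([&] { net.forward(); }, 5, 100)`.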

ZJDATY commented 10 months ago

And I have tested it with my AMD 5600X desktop, it should be faster than i7-9700. And it takes about 40 ms. Maybe I miss something. Can you test the single-core performance on your site?

Another test is the following:

while(1)
{
net.forward(detections, output_names);
}

To test if GPU usage goes up. I'm concerned it actually using the GPU, by opencl or cuda.

I tested this and found that GPU usage did not increase.

ZJDATY commented 10 months ago

@ZJDATY Consider using the OpenCV performance tests. There is a test case for yolov4-tiny here: https://github.com/opencv/opencv/blob/4.8.0/modules/dnn/perf/perf_net.cpp#L242

Unfortunately, the models used in 4.8.0 and 4.5.5 are not the same (details in #23008). The test (perf_net.cpp) from 4.5.5 needs to be restored for a correct comparison (however, timings are similar on my machine; perhaps only the weights were changed).

There could be many reasons for performance changes. Consider using --perf_threads=1 to disable multi-threading during the test.

I don't see degradation on Linux (GCC 12) with i7-12700K:

  • 1 thread:

| Name of Test | 455-1th | 480-1th-sametestdata | 480-1th-sametestdata vs 455-1th (x-factor) |
| --- | --- | --- | --- |
| YOLOv4_tiny::DNNTestNetwork::OCV/CPU | 65.478 | 40.288 | 1.63 |

  • N threads (default=20):

| Name of Test | 455-Nth | 480-Nth-sametestdata | 480-Nth-sametestdata vs 455-Nth (x-factor) |
| --- | --- | --- | --- |
| YOLOv4_tiny::DNNTestNetwork::OCV/CPU | 12.280 | 9.111 | 1.35 |

Used commands:

$ ./bin/opencv_perf_dnn --gtest_filter=*YOLOv4_tiny* --gtest_output=xml:../perf/dnn_23911/455-Nth.xml
...
$ ./bin/opencv_perf_dnn --gtest_filter=*YOLOv4_tiny* --perf_threads=1 --gtest_output=xml:../perf/dnn_23911/480-1th-sametestdata.xml

$ python $OPENCV_SRC_DIR/modules/ts/misc/summary.py -m median ../perf/dnn_23911/{455-1th,480-1th-sametestdata}.xml -o markdown
$ python $OPENCV_SRC_DIR/modules/ts/misc/summary.py -m median ../perf/dnn_23911/{455-Nth,480-Nth-sametestdata}.xml -o markdown

BTW, you need to set the OPENCV_TEST_DATA_PATH="<opencv_extra>/testdata" environment variable and run the download script (pass the YOLOv4-tiny parameter to download that one model only).

@zihaomu Please forgive my ignorance. I don't know how to set the parameters, and this command seems to produce no results. I placed the yolo folder in the same directory.

opencv_perf_dnn --gtest_filter=./yolo/yolov4-tiny.weights --gtest_output=xml:result.xml

Time compensation is 0
TEST: Skip tests with tags: 'mem_6gb', 'verylong'
CTEST_FULL_OUTPUT
OpenCV version: 4.5.5
OpenCV VCS version: unknown
Build type: Debug Release
WARNING: build value differs from runtime: Release
Compiler: D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe  (ver 19.29.30148.0)
Parallel framework: tbb (nthreads=8)
CPU features: SSE SSE2 SSE3 *SSE4.1 *SSE4.2 *FP16 *AVX *AVX2 *AVX512-SKX?
Intel(R) IPP version: ippIP AVX2 (l9) 2020.0.0 Gold (-) Oct 21 2019
Intel(R) IPP features code: 0x8000
OpenCL Platforms:
    NVIDIA CUDA
        dGPU: NVIDIA TITAN V (OpenCL 3.0 CUDA)
    Intel(R) OpenCL HD Graphics
        iGPU: Intel(R) UHD Graphics 630 (OpenCL 3.0 NEO )
Current OpenCL device:
    Type = dGPU
    Name = NVIDIA TITAN V
    Version = OpenCL 3.0 CUDA
    Driver version = 536.40
    Address bits = 64
    Compute units = 80
    Max work group size = 1024
    Local memory size = 48 KB
    Max memory allocation size = 2 GB 1023 MB 880 KB
    Double support = Yes
    Half support = No
    Host unified memory = No
    Device extensions:
        cl_khr_global_int32_base_atomics
        cl_khr_global_int32_extended_atomics
        cl_khr_local_int32_base_atomics
        cl_khr_local_int32_extended_atomics
        cl_khr_fp64
        cl_khr_3d_image_writes
        cl_khr_byte_addressable_store
        cl_khr_icd
        cl_khr_gl_sharing
        cl_nv_compiler_options
        cl_nv_device_attribute_query
        cl_nv_pragma_unroll
        cl_nv_d3d10_sharing
        cl_khr_d3d10_sharing
        cl_nv_d3d11_sharing
        cl_nv_copy_opts
        cl_nv_create_buffer
        cl_khr_int64_base_atomics
        cl_khr_int64_extended_atomics
        cl_khr_device_uuid
        cl_khr_pci_bus_info
        cl_khr_external_semaphore
        cl_khr_external_memory
        cl_khr_external_semaphore_win32
        cl_khr_external_memory_win32
    Has AMD Blas = No
    Has AMD Fft = No
    Preferred vector width char = 1
    Preferred vector width short = 1
    Preferred vector width int = 1
    Preferred vector width long = 1
    Preferred vector width float = 1
    Preferred vector width double = 1
    Preferred vector width half = 0
Note: Google Test filter = ./yolo/yolov4-tiny.weights
[==========] Running 0 tests from 0 test cases.
[==========] 0 tests from 0 test cases ran. (0 ms total)
[  PASSED  ] 0 tests.

opencv_perf_dnn --gtest_filter=*YOLOv4_tiny* --gtest_output=xml:result.xml

Time compensation is 0
TEST: Skip tests with tags: 'mem_6gb', 'verylong'
CTEST_FULL_OUTPUT
OpenCV version: 4.5.5
OpenCV VCS version: unknown
Build type: Debug Release
WARNING: build value differs from runtime: Release
Compiler: D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe  (ver 19.29.30148.0)
Parallel framework: tbb (nthreads=8)
CPU features: SSE SSE2 SSE3 *SSE4.1 *SSE4.2 *FP16 *AVX *AVX2 *AVX512-SKX?
Intel(R) IPP version: ippIP AVX2 (l9) 2020.0.0 Gold (-) Oct 21 2019
Intel(R) IPP features code: 0x8000
OpenCL Platforms:
    NVIDIA CUDA
        dGPU: NVIDIA TITAN V (OpenCL 3.0 CUDA)
    Intel(R) OpenCL HD Graphics
        iGPU: Intel(R) UHD Graphics 630 (OpenCL 3.0 NEO )
Current OpenCL device:
    Type = dGPU
    Name = NVIDIA TITAN V
    Version = OpenCL 3.0 CUDA
    Driver version = 536.40
    Address bits = 64
    Compute units = 80
    Max work group size = 1024
    Local memory size = 48 KB
    Max memory allocation size = 2 GB 1023 MB 880 KB
    Double support = Yes
    Half support = No
    Host unified memory = No
    Device extensions:
        cl_khr_global_int32_base_atomics
        cl_khr_global_int32_extended_atomics
        cl_khr_local_int32_base_atomics
        cl_khr_local_int32_extended_atomics
        cl_khr_fp64
        cl_khr_3d_image_writes
        cl_khr_byte_addressable_store
        cl_khr_icd
        cl_khr_gl_sharing
        cl_nv_compiler_options
        cl_nv_device_attribute_query
        cl_nv_pragma_unroll
        cl_nv_d3d10_sharing
        cl_khr_d3d10_sharing
        cl_nv_d3d11_sharing
        cl_nv_copy_opts
        cl_nv_create_buffer
        cl_khr_int64_base_atomics
        cl_khr_int64_extended_atomics
        cl_khr_device_uuid
        cl_khr_pci_bus_info
        cl_khr_external_semaphore
        cl_khr_external_memory
        cl_khr_external_semaphore_win32
        cl_khr_external_memory_win32
    Has AMD Blas = No
    Has AMD Fft = No
    Preferred vector width char = 1
    Preferred vector width short = 1
    Preferred vector width int = 1
    Preferred vector width long = 1
    Preferred vector width float = 1
    Preferred vector width double = 1
    Preferred vector width half = 0
Note: Google Test filter = *YOLOv4_tiny*
[==========] Running 12 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 9 tests from Layer_Slice
[ RUN      ] Layer_Slice.YOLOv4_tiny_1/0, where GetParam() = OCV/OCL
[ PERFSTAT ]    (samples=100   mean=0.30   median=0.29   min=0.28   stddev=0.02 (7.1%))
[       OK ] Layer_Slice.YOLOv4_tiny_1/0 (44 ms)
[ RUN      ] Layer_Slice.YOLOv4_tiny_1/1, where GetParam() = OCV/OCL_FP16
[ PERFSTAT ]    (samples=13   mean=0.29   median=0.29   min=0.29   stddev=0.01 (2.4%))
[       OK ] Layer_Slice.YOLOv4_tiny_1/1 (10 ms)
[ RUN      ] Layer_Slice.YOLOv4_tiny_1/2, where GetParam() = OCV/CPU
[ PERFSTAT ]    (samples=13   mean=0.04   median=0.04   min=0.04   stddev=0.00 (1.4%))
[       OK ] Layer_Slice.YOLOv4_tiny_1/2 (4 ms)
[ RUN      ] Layer_Slice.YOLOv4_tiny_2/0, where GetParam() = OCV/OCL
[ PERFSTAT ]    (samples=10   mean=0.19   median=0.18   min=0.18   stddev=0.00 (2.7%))
[       OK ] Layer_Slice.YOLOv4_tiny_2/0 (5 ms)
[ RUN      ] Layer_Slice.YOLOv4_tiny_2/1, where GetParam() = OCV/OCL_FP16
[ PERFSTAT ]    (samples=79   mean=0.18   median=0.18   min=0.17   stddev=0.01 (3.0%))
[       OK ] Layer_Slice.YOLOv4_tiny_2/1 (18 ms)
[ RUN      ] Layer_Slice.YOLOv4_tiny_2/2, where GetParam() = OCV/CPU
[ PERFSTAT ]    (samples=10   mean=0.02   median=0.02   min=0.02   stddev=0.00 (2.6%))
[       OK ] Layer_Slice.YOLOv4_tiny_2/2 (3 ms)
[ RUN      ] Layer_Slice.YOLOv4_tiny_3/0, where GetParam() = OCV/OCL
[ PERFSTAT ]    (samples=13   mean=0.32   median=0.32   min=0.31   stddev=0.01 (2.1%))
[       OK ] Layer_Slice.YOLOv4_tiny_3/0 (6 ms)
[ RUN      ] Layer_Slice.YOLOv4_tiny_3/1, where GetParam() = OCV/OCL_FP16
[ PERFSTAT ]    (samples=10   mean=0.31   median=0.31   min=0.31   stddev=0.00 (0.6%))
[       OK ] Layer_Slice.YOLOv4_tiny_3/1 (5 ms)
[ RUN      ] Layer_Slice.YOLOv4_tiny_3/2, where GetParam() = OCV/CPU
[ PERFSTAT ]    (samples=13   mean=0.01   median=0.01   min=0.01   stddev=0.00 (1.5%))
[       OK ] Layer_Slice.YOLOv4_tiny_3/2 (1 ms)
[----------] 9 tests from Layer_Slice (100 ms total)

[----------] 3 tests from DNNTestNetwork
[ RUN      ] DNNTestNetwork.YOLOv4_tiny/0, where GetParam() = OCV/OCL
D:\opencv\opencv-4.5.5\modules\ts\src\ts_perf.cpp(2028): error: Failed
Expected: PerfTestBody() doesn't throw an exception.
  Actual: it throws cv::Exception:
  OpenCV(4.5.5) D:\opencv\opencv-4.5.5\modules\ts\src\ts.cpp:1064: error: (-2:Unspecified error) OpenCV tests: Can't find required data file: dnn/dog416.png in function 'cvtest::findData'

params    =     OCV/OCL
termination reason:  unhandled exception
bytesIn   =          0
bytesOut  =          0
samples   =          0 of 100
outliers  =          0
frequency =          0
[  FAILED  ] DNNTestNetwork.YOLOv4_tiny/0, where GetParam() = OCV/OCL (1 ms)
[ RUN      ] DNNTestNetwork.YOLOv4_tiny/1, where GetParam() = OCV/OCL_FP16
D:\opencv\opencv-4.5.5\modules\ts\src\ts_perf.cpp(2028): error: Failed
Expected: PerfTestBody() doesn't throw an exception.
  Actual: it throws cv::Exception:
  OpenCV(4.5.5) D:\opencv\opencv-4.5.5\modules\ts\src\ts.cpp:1064: error: (-2:Unspecified error) OpenCV tests: Can't find required data file: dnn/dog416.png in function 'cvtest::findData'

params    = OCV/OCL_FP16
termination reason:  unhandled exception
bytesIn   =          0
bytesOut  =          0
samples   =          0 of 100
outliers  =          0
frequency =          0
[  FAILED  ] DNNTestNetwork.YOLOv4_tiny/1, where GetParam() = OCV/OCL_FP16 (0 ms)
[ RUN      ] DNNTestNetwork.YOLOv4_tiny/2, where GetParam() = OCV/CPU
D:\opencv\opencv-4.5.5\modules\ts\src\ts_perf.cpp(2028): error: Failed
Expected: PerfTestBody() doesn't throw an exception.
  Actual: it throws cv::Exception:
  OpenCV(4.5.5) D:\opencv\opencv-4.5.5\modules\ts\src\ts.cpp:1064: error: (-2:Unspecified error) OpenCV tests: Can't find required data file: dnn/dog416.png in function 'cvtest::findData'

params    =     OCV/CPU
termination reason:  unhandled exception
bytesIn   =          0
bytesOut  =          0
samples   =          0 of 100
outliers  =          0
frequency =          0
[  FAILED  ] DNNTestNetwork.YOLOv4_tiny/2, where GetParam() = OCV/CPU (1 ms)
[----------] 3 tests from DNNTestNetwork (2 ms total)

[----------] Global test environment tear-down
[==========] 12 tests from 2 test cases ran. (102 ms total)
[  PASSED  ] 9 tests.
[  FAILED  ] 3 tests, listed below:
[  FAILED  ] DNNTestNetwork.YOLOv4_tiny/0, where GetParam() = OCV/OCL
[  FAILED  ] DNNTestNetwork.YOLOv4_tiny/1, where GetParam() = OCV/OCL_FP16
[  FAILED  ] DNNTestNetwork.YOLOv4_tiny/2, where GetParam() = OCV/CPU

 3 FAILED TESTS

zihaomu commented 10 months ago

Hi @ZJDATY. Please set the environment variable OPENCV_TEST_DATA_PATH, and the program will find and load the model automatically. Reference: https://github.com/opencv/opencv/wiki/How_to_contribute
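
Before rerunning the perf test, it may help to confirm that the variable is visible to the process and that the file named in the error (dnn/dog416.png) is actually there. The sketch below is a simplified check and not OpenCV's real cvtest::findData lookup, which may search additional locations. Both helper names are hypothetical.

```cpp
#include <cstdlib>
#include <fstream>
#include <string>

// Joins a test-data root and a relative file name with a single separator.
std::string join_test_data_path(const std::string& root, const std::string& relative) {
    if (root.empty())
        return relative;
    const char last = root[root.size() - 1];
    if (last == '/' || last == '\\')
        return root + relative;
    return root + "/" + relative;
}

// Returns true if `relative` can be opened under the directory named by
// OPENCV_TEST_DATA_PATH. Returns false if the variable is unset or the file
// cannot be opened. Simplified check only, not OpenCV's own lookup logic.
bool test_data_file_exists(const std::string& relative) {
    const char* root = std::getenv("OPENCV_TEST_DATA_PATH");
    if (root == nullptr)
        return false;
    std::ifstream file(join_test_data_path(root, relative).c_str(), std::ios::binary);
    return file.good();
}
```

Calling `test_data_file_exists("dnn/dog416.png")` from a small program started in the same console as opencv_perf_dnn shows whether the test would see the file.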

WanliZhong commented 10 months ago

@ZJDATY Hi, in my test on Windows, the results show that 4.8.0 is faster (using the median value):

1 thread:

| Name of Test | 455-1th | 480-1th | 480-1th vs 455-1th (x-factor) |
| --- | --- | --- | --- |
| YOLOv4_tiny::DNNTestNetwork::OCV/CPU | 940.120 | 424.721 | 2.21 |

N threads:

| Name of Test | 455-Nth | 480-Nth | 480-Nth vs 455-Nth (x-factor) |
| --- | --- | --- | --- |
| YOLOv4_tiny::DNNTestNetwork::OCV/CPU | 358.424 | 164.512 | 2.18 |

Can you print the OpenCV version with cv::getVersionString() to make sure the two versions were not swapped in your performance test?

ZJDATY commented 10 months ago

Hi, @zihaomu @WanliZhong @asmorkalov @fengyuentau I have completed the test.

Opencv4.8 single-core
D:\opencv\opencv-4.8.0\build\bin\Release>opencv_perf_dnn --gtest_filter=*YOLOv4_tiny* --perf_threads=1 --gtest_output=xml:result.xml
Result [ PERFSTAT ] (samples=10 mean=69.02 median=68.82 min=68.08 stddev=0.70 (1.0%))

Opencv4.8 multi-core
D:\opencv\opencv-4.8.0\build\bin\Release>opencv_perf_dnn --gtest_filter=*YOLOv4_tiny* --gtest_output=xml:result.xml
Result [ PERFSTAT ] (samples=15 mean=25.29 median=25.15 min=24.25 stddev=0.73 (2.9%))

Opencv4.5.5 single-core
D:\opencv\opencv-4.5.5\build\bin\Release>opencv_perf_dnn --gtest_filter=*YOLOv4_tiny* --perf_threads=1 --gtest_output=xml:result.xml
Result [ PERFSTAT ] (samples=10 mean=81.12 median=80.85 min=80.38 stddev=0.73 (0.9%))

Opencv4.5 multi-core
D:\opencv\opencv-4.5.5\build\bin\Release>opencv_perf_dnn --gtest_filter=*YOLOv4_tiny* --gtest_output=xml:result.xml
Result [ PERFSTAT ] (samples=100 mean=17.11 median=16.79 min=16.23 stddev=0.81 (4.7%))
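
When many such runs need comparing, the PERFSTAT lines can be read programmatically instead of by eye. Below is a minimal sketch, assuming the line layout shown in the logs in this thread. `PerfStat` and `parse_perfstat` are hypothetical names and not part of the OpenCV test framework.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Values carried by one "[ PERFSTAT ]" line, times in milliseconds.
struct PerfStat {
    int samples;
    double mean_ms;
    double median_ms;
    double min_ms;
    double stddev_ms;
};

// Parses a line such as
//   [ PERFSTAT ]    (samples=10   mean=69.02   median=68.82   min=68.08   stddev=0.70 (1.0%))
// Returns false if the line does not contain all five fields in that order.
bool parse_perfstat(const std::string& line, PerfStat& out) {
    const std::size_t pos = line.find("samples=");
    if (pos == std::string::npos)
        return false;
    // Whitespace in the format matches any run of spaces between fields.
    const int fields = std::sscanf(line.c_str() + pos,
                                   "samples=%d mean=%lf median=%lf min=%lf stddev=%lf",
                                   &out.samples, &out.mean_ms, &out.median_ms,
                                   &out.min_ms, &out.stddev_ms);
    return fields == 5;
}
```

The trailing percentage in parentheses is ignored, and lines without a `samples=` field are rejected.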

The results show that in single-core inference 4.8 is faster than 4.5, but in multi-core inference the results above, 25 ms and 16 ms, are consistent with my earlier measurements. I don't understand why my multi-core results differ from yours. I can provide remote desktop access so that you can troubleshoot on my computer.
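
One way to read these four medians is as thread scaling: how much faster each build gets when it goes from one thread to all threads. The sketch below does that arithmetic. It assumes 8 worker threads for both builds, based on the "tbb (nthreads=8)" line in the 4.5.5 perf log; the count for the 4.8.0 run is not shown in the thread. The names are hypothetical.

```cpp
// Speedup from multi-threading and the fraction of ideal linear scaling.
struct Scaling {
    double speedup;     // single-thread time / multi-thread time
    double efficiency;  // speedup / thread count, 1.0 would be ideal
};

// `threads` is an assumption supplied by the caller; it is not measured here.
Scaling thread_scaling(double single_thread_ms, double multi_thread_ms, int threads) {
    Scaling s;
    s.speedup = single_thread_ms / multi_thread_ms;
    s.efficiency = s.speedup / static_cast<double>(threads);
    return s;
}
```

With the medians reported above, `thread_scaling(68.82, 25.15, 8)` gives a speedup of about 2.7 for 4.8.0, and `thread_scaling(80.85, 16.79, 8)` gives about 4.8 for 4.5.5. On this machine the 4.8.0 build therefore appears to gain less from multi-threading, even though it is faster on a single core.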

I also measured the inference time with a C++ program, which prints the build information as well.

#include <iostream>
#include <opencv2/opencv.hpp>

int main()
{
    cv::TickMeter t;  // stack-allocated timer, so no manual delete is needed
    auto net = cv::dnn::readNetFromDarknet("yolo/yolov4-tiny.cfg", "yolo/yolov4-tiny.weights");
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);
    // Dummy 416x416 input; the pixel values are left uninitialized because only timing is measured.
    cv::Mat frame(416, 416, CV_32FC3), blob;
    cv::dnn::blobFromImage(frame, blob, 1 / 255.0, cv::Size(416, 416), cv::Scalar(), true, false, CV_32F);
    net.setInput(blob);
    double minT = 100000;  // best (minimum) forward time in ms
    for (int i = 0; i < 100; i++)
    {
        t.start();
        net.forward();
        t.stop();
        double timeN = t.getAvgTimeMilli();  // one start/stop pair, so this is the time of this run
        if (timeN < minT)
            minT = timeN;
        t.reset();
    }
    std::cout << cv::getVersionString() << std::endl;
    std::cout << cv::getBuildInformation() << std::endl;
    std::cout << "forward time = " << minT << " ms" << std::endl;
    return 0;
}
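
The program above reports only the minimum over 100 runs, while opencv_perf_dnn prints mean, median and min. As a rough sketch of how comparable statistics could be collected, the helper below times any callable with std::chrono and reduces the samples. The names `TimingStats`, `summarize` and `benchmark` are hypothetical and not part of OpenCV, and the warm-up count is an arbitrary assumption.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Summary of per-run timings in milliseconds (hypothetical helper type).
struct TimingStats {
    double min_ms;
    double median_ms;
    double mean_ms;
};

// Reduce a non-empty list of per-run times to min / median / mean.
inline TimingStats summarize(std::vector<double> samples_ms) {
    std::sort(samples_ms.begin(), samples_ms.end());
    const std::size_t n = samples_ms.size();
    const double median = (n % 2 == 1)
        ? samples_ms[n / 2]
        : 0.5 * (samples_ms[n / 2 - 1] + samples_ms[n / 2]);
    const double mean =
        std::accumulate(samples_ms.begin(), samples_ms.end(), 0.0) / static_cast<double>(n);
    return {samples_ms.front(), median, mean};
}

// Call `fn` `warmup` times untimed, then time `runs` calls (runs must be >= 1).
template <typename Fn>
TimingStats benchmark(Fn&& fn, int runs, int warmup = 5) {
    for (int i = 0; i < warmup; ++i) fn();
    std::vector<double> samples_ms;
    samples_ms.reserve(static_cast<std::size_t>(runs));
    for (int i = 0; i < runs; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        samples_ms.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return summarize(std::move(samples_ms));
}
```

With the network from the program above, a call such as `benchmark([&] { net.forward(); }, 100)` would give numbers that can be set side by side with the PERFSTAT lines.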

OpenCV 4.5.5 result


D:\opencv\opencv-4.5.5\build\bin\Release>D:\opencv\opencv-4.5.5\build\bin\Release\dnnspeedtest.exe
4.5.5

General configuration for OpenCV 4.5.5 =====================================
  Version control:               unknown

  Platform:
    Timestamp:                   2023-07-05T05:04:59Z
    Host:                        Windows 10.0.22621 AMD64
    CMake:                       3.26.4
    CMake generator:             Visual Studio 16 2019
    CMake build tool:            D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/MSBuild/Current/Bin/MSBuild.exe
    MSVC:                        1929
    Configuration:               Debug Release

  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3
      requested:                 SSE3
    Dispatched code generation:  SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
      requested:                 SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
      SSE4_1 (13 files):         + SSSE3 SSE4_1
      SSE4_2 (1 files):          + SSSE3 SSE4_1 POPCNT SSE4_2
      FP16 (0 files):            + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
      AVX (4 files):             + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
      AVX2 (26 files):           + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
      AVX512_SKX (4 files):      + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX

  C/C++:
    Built as dynamic libs?:      YES
    C++ standard:                11
    C++ Compiler:                D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe  (ver 19.29.30148.0)
    C++ flags (Release):         /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP  /MD /O2 /Ob2 /DNDEBUG
    C++ flags (Debug):           /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP  /MDd /Zi /Ob0 /Od /RTC1
    C Compiler:                  D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe
    C flags (Release):           /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /MP   /MD /O2 /Ob2 /DNDEBUG
    C flags (Debug):             /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /MP /MDd /Zi /Ob0 /Od /RTC1
    Linker flags (Release):      /machine:x64  /INCREMENTAL:NO
    Linker flags (Debug):        /machine:x64  /debug /INCREMENTAL
    ccache:                      NO
    Precompiled headers:         YES
    Extra dependencies:
    3rdparty dependencies:

  OpenCV modules:
    To be built:                 core dnn flann highgui imgcodecs imgproc ml photo ts video videoio
    Disabled:                    features2d java_bindings_generator js_bindings_generator python_bindings_generator python_tests world
    Disabled by dependency:      calib3d objdetect stitching
    Unavailable:                 gapi java python2 python3
    Applications:                perf_tests
    Documentation:               NO
    Non-free algorithms:         NO

  Windows RT support:            NO

  GUI:                           WIN32UI
    Win32 UI:                    YES

  Media I/O:
    ZLib:                        build (ver 1.2.11)
    JPEG:                        build-libjpeg-turbo (ver 2.1.2-62)
    WEBP:                        build (ver encoder: 0x020f)
    PNG:                         build (ver 1.6.37)
    TIFF:                        build (ver 42 - 4.2.0)
    JPEG 2000:                   build Jasper (ver 1.900.1)
    HDR:                         YES
    SUNRASTER:                   YES
    PXM:                         YES
    PFM:                         YES

  Video I/O:
    FFMPEG:                      YES (prebuilt binaries)
      avcodec:                   YES (58.134.100)
      avformat:                  YES (58.76.100)
      avutil:                    YES (56.70.100)
      swscale:                   YES (5.9.100)
      avresample:                YES (4.0.0)
    DirectShow:                  YES
    Media Foundation:            YES
      DXVA:                      YES

  Parallel framework:            Concurrency

  Trace:                         YES (with Intel ITT)

  Other third-party libraries:
    Intel IPP:                   2020.0.0 Gold [2020.0.0]
           at:                   D:/opencv/opencv-4.5.5/build/3rdparty/ippicv/ippicv_win/icv
    Intel IPP IW:                sources (2020.0.0)
              at:                D:/opencv/opencv-4.5.5/build/3rdparty/ippicv/ippicv_win/iw
    Eigen:                       NO
    Custom HAL:                  NO
    Protobuf:                    build (3.19.1)

  OpenCL:                        YES (NVD3D11)
    Include path:                D:/opencv/opencv-4.5.5/3rdparty/include/opencl/1.2
    Link libraries:              Dynamic load

  Python (for build):            D:/ProgramData/Anaconda3/python.exe

  Install to:                    D:/opencv/opencv-4.5.5/build/install
-----------------------------------------------------------------

forward time = 16.1467 ms

OpenCV 4.8.0 result

D:\opencv\opencv-4.8.0\build\bin\Release>D:\opencv\opencv-4.8.0\build\bin\Release\dnnspeedtest.exe
4.8.0

General configuration for OpenCV 4.8.0 =====================================
  Version control:               unknown

  Extra modules:
    Location (extra):            D:/opencv/opencv_contrib-4.8.0/modules
    Version control (extra):     unknown

  Platform:
    Timestamp:                   2023-06-29T13:58:22Z
    Host:                        Windows 10.0.22621 AMD64
    CMake:                       3.26.4
    CMake generator:             Visual Studio 16 2019
    CMake build tool:            D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/MSBuild/Current/Bin/MSBuild.exe
    MSVC:                        1929
    Configuration:               Debug Release

  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3 SSSE3 SSE4_1 POPCNT SSE4_2
      requested:                 SSE4_2
    Dispatched code generation:  FP16 AVX AVX2 AVX512_SKX
      requested:                 SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
      FP16 (0 files):            + FP16 AVX
      AVX (7 files):             + AVX
      AVX2 (33 files):           + FP16 FMA3 AVX AVX2
      AVX512_SKX (5 files):      + FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX

  C/C++:
    Built as dynamic libs?:      YES
    C++ standard:                11
    C++ Compiler:                D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe  (ver 19.29.30148.0)
    C++ flags (Release):         /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise         /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819 /MP  /MD /O2 /Ob2 /DNDEBUG
    C++ flags (Debug):           /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise         /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819 /MP  /MDd /Zi /Ob0 /Od /RTC1
    C Compiler:                  D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe
    C flags (Release):           /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise         /MP   /MD /O2 /Ob2 /DNDEBUG
    C flags (Debug):             /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise         /MP /MDd /Zi /Ob0 /Od /RTC1
    Linker flags (Release):      /machine:x64  /INCREMENTAL:NO
    Linker flags (Debug):        /machine:x64  /debug /INCREMENTAL
    ccache:                      NO
    Precompiled headers:         YES
    Extra dependencies:
    3rdparty dependencies:

  OpenCV modules:
    To be built:                 calib3d core dnn features2d flann highgui imgcodecs imgproc ml objdetect photo stitching ts video videoio
    Disabled:                    aruco bgsegm bioinspired ccalib datasets dnn_objdetect dnn_superres dpm face fuzzy hfs img_hash intensity_transform java_bindings_generator js_bindings_generator line_descriptor mcc objc_bindings_generator optflow phase_unwrapping plot python_bindings_generator python_tests quality rapid reg rgbd saliency shape stereo structured_light superres surface_matching text tracking videostab wechat_qrcode world xfeatures2d ximgproc xobjdetect xphoto
    Disabled by dependency:      -
    Unavailable:                 alphamat cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev cvv freetype gapi hdf java julia matlab ovis python2 python3 sfm viz
    Applications:                perf_tests
    Documentation:               NO
    Non-free algorithms:         NO

  Windows RT support:            NO

  GUI:                           WIN32UI
    Win32 UI:                    YES

  Media I/O:
    ZLib:                        build (ver 1.2.13)
    JPEG:                        build-libjpeg-turbo (ver 2.1.3-62)
      SIMD Support Request:      YES
      SIMD Support:              NO
    PNG:                         build (ver 1.6.37)
    JPEG 2000:                   build Jasper (ver 1.900.1)
    HDR:                         YES
    SUNRASTER:                   YES
    PXM:                         YES
    PFM:                         YES

  Video I/O:
    FFMPEG:                      YES (prebuilt binaries)
      avcodec:                   YES (58.134.100)
      avformat:                  YES (58.76.100)
      avutil:                    YES (56.70.100)
      swscale:                   YES (5.9.100)
      avresample:                YES (4.0.0)
    DirectShow:                  YES
    Media Foundation:            YES
      DXVA:                      YES

  Parallel framework:            TBB (ver 2020.2 interface 11102)

  Trace:                         YES (with Intel ITT)

  Other third-party libraries:
    Intel IPP:                   2021.8 [2021.8.0]
           at:                   D:/opencv/opencv-4.8.0/build/3rdparty/ippicv/ippicv_win/icv
    Intel IPP IW:                sources (2021.8.0)
              at:                D:/opencv/opencv-4.8.0/build/3rdparty/ippicv/ippicv_win/iw
    Custom HAL:                  NO
    Protobuf:                    build (3.19.1)

  OpenCL:                        YES (no extra features)
    Include path:                D:/opencv/opencv-4.8.0/3rdparty/include/opencl/1.2
    Link libraries:              Dynamic load

  Python (for build):            NO

  Install to:                    D:/opencv/opencv-4.8.0/build/install
-----------------------------------------------------------------

forward time = 24.8977 ms

Looking forward to your reply.

opencv_perf_dnn --gtest_filter=*YOLOv4_tiny* --perf_threads=N --gtest_output=xml:result.xml

OpenCV version    Perf_threads    Time (ms)
4.5.5             1               80.8
4.5.5             2               42.1
4.5.5             4               25.3
4.5.5             8               16.7
4.8.0             1               69.1
4.8.0             2               40.8
4.8.0             4               28.9
4.8.0             8               25.4
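
One way to read the thread-count table above is to convert the times into speedup and parallel efficiency. The sketch below does that with the table's own numbers. The helper names (`ScalingSample`, `speedup`, `efficiency`, `opencv455`, `opencv480`) are hypothetical and introduced only for illustration.

```cpp
#include <cstddef>
#include <vector>

// One row of the thread-scaling table: thread count and measured time in ms.
struct ScalingSample {
    int threads;
    double time_ms;
};

// Speedup relative to the single-thread time: S(N) = T(1) / T(N).
inline double speedup(double t1_ms, double tn_ms) { return t1_ms / tn_ms; }

// Parallel efficiency: E(N) = S(N) / N (1.0 would be perfect scaling).
inline double efficiency(double t1_ms, const ScalingSample& s) {
    return speedup(t1_ms, s.time_ms) / s.threads;
}

// Values copied from the table above (OpenCV 4.5.5 and 4.8.0).
inline std::vector<ScalingSample> opencv455() {
    return {{1, 80.8}, {2, 42.1}, {4, 25.3}, {8, 16.7}};
}
inline std::vector<ScalingSample> opencv480() {
    return {{1, 69.1}, {2, 40.8}, {4, 28.9}, {8, 25.4}};
}
```

With these figures, 4.5.5 reaches roughly a 4.8x speedup at 8 threads, while 4.8.0 reaches roughly 2.7x, even though 4.8.0 is the faster of the two on a single thread. That is consistent with the regression being tied to how the work is spread across threads rather than to the per-thread kernels, although the table alone does not prove it.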
zihaomu commented 10 months ago

@ZJDATY Can you try this patch: https://github.com/opencv/opencv/pull/23952? It may fix this issue.

opencv-alalek commented 10 months ago

There is a difference in the build configurations: the parallel framework used.

Parallel framework: Concurrency

vs

Parallel framework: TBB (ver 2020.2 interface 11102)


We need to fix the build configuration first (to "compare apples to apples").
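
A mismatch like this can also be caught programmatically before any timing is done. The sketch below pulls a single field out of a build-information dump. `buildInfoField` is a hypothetical helper, not an OpenCV function, and it assumes the "Key: value" line layout seen in the logs in this thread.

```cpp
#include <cstddef>
#include <string>

// Return the value of a "Key:   value" line from a build-information dump,
// or an empty string if the key is missing or has no value on its line.
inline std::string buildInfoField(const std::string& info, const std::string& key) {
    const std::string needle = key + ":";
    const std::size_t pos = info.find(needle);
    if (pos == std::string::npos) return "";

    // The value runs from just after the colon to the end of that line.
    const std::size_t begin = pos + needle.size();
    const std::size_t eol = info.find('\n', begin);
    const std::string value =
        info.substr(begin, eol == std::string::npos ? std::string::npos : eol - begin);

    // Trim the padding that aligns the values in the dump.
    const std::size_t first = value.find_first_not_of(" \t\r");
    if (first == std::string::npos) return "";
    const std::size_t last = value.find_last_not_of(" \t\r");
    return value.substr(first, last - first + 1);
}
```

In a test program, `buildInfoField(cv::getBuildInformation(), "Parallel framework")` could be printed next to the timing, so that two runs are compared only when the reported frameworks match.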

ZJDATY commented 10 months ago

There is a difference in the build configurations: the parallel framework used.

Parallel framework: Concurrency

vs

Parallel framework: TBB (ver 2020.2 interface 11102)

We need to fix the build configuration first (to "compare apples to apples").

@zihaomu @WanliZhong I discovered this yesterday, so I have recompiled and tested it, and the results are still the same.

Microsoft Windows [Version 10.0.22621.1992]
(c) Microsoft Corporation. All rights reserved.

C:\Users\ZHANG\Desktop\test>C:\Users\ZHANG\Desktop\test\dnnspeedtest.exe
4.5.5

General configuration for OpenCV 4.5.5 =====================================
  Version control:               unknown

  Platform:
    Timestamp:                   2023-07-05T05:04:59Z
    Host:                        Windows 10.0.22621 AMD64
    CMake:                       3.26.4
    CMake generator:             Visual Studio 16 2019
    CMake build tool:            D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/MSBuild/Current/Bin/MSBuild.exe
    MSVC:                        1929
    Configuration:               Debug Release

  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3
      requested:                 SSE3
    Dispatched code generation:  SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
      requested:                 SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
      SSE4_1 (13 files):         + SSSE3 SSE4_1
      SSE4_2 (1 files):          + SSSE3 SSE4_1 POPCNT SSE4_2
      FP16 (0 files):            + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
      AVX (4 files):             + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
      AVX2 (26 files):           + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
      AVX512_SKX (4 files):      + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX

  C/C++:
    Built as dynamic libs?:      YES
    C++ standard:                11
    C++ Compiler:                D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe  (ver 19.29.30148.0)
    C++ flags (Release):         /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP  /MD /O2 /Ob2 /DNDEBUG
    C++ flags (Debug):           /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP  /MDd /Zi /Ob0 /Od /RTC1
    C Compiler:                  D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe
    C flags (Release):           /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /MP   /MD /O2 /Ob2 /DNDEBUG
    C flags (Debug):             /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /MP /MDd /Zi /Ob0 /Od /RTC1
    Linker flags (Release):      /machine:x64  /INCREMENTAL:NO
    Linker flags (Debug):        /machine:x64  /debug /INCREMENTAL
    ccache:                      NO
    Precompiled headers:         YES
    Extra dependencies:
    3rdparty dependencies:

  OpenCV modules:
    To be built:                 core dnn flann highgui imgcodecs imgproc ml photo ts video videoio
    Disabled:                    features2d java_bindings_generator js_bindings_generator python_bindings_generator python_tests world
    Disabled by dependency:      calib3d objdetect stitching
    Unavailable:                 gapi java python2 python3
    Applications:                perf_tests
    Documentation:               NO
    Non-free algorithms:         NO

  Windows RT support:            NO

  GUI:                           WIN32UI
    Win32 UI:                    YES

  Media I/O:
    ZLib:                        build (ver 1.2.11)
    JPEG:                        build-libjpeg-turbo (ver 2.1.2-62)
    WEBP:                        build (ver encoder: 0x020f)
    PNG:                         build (ver 1.6.37)
    TIFF:                        build (ver 42 - 4.2.0)
    JPEG 2000:                   build Jasper (ver 1.900.1)
    HDR:                         YES
    SUNRASTER:                   YES
    PXM:                         YES
    PFM:                         YES

  Video I/O:
    FFMPEG:                      YES (prebuilt binaries)
      avcodec:                   YES (58.134.100)
      avformat:                  YES (58.76.100)
      avutil:                    YES (56.70.100)
      swscale:                   YES (5.9.100)
      avresample:                YES (4.0.0)
    DirectShow:                  YES
    Media Foundation:            YES
      DXVA:                      YES

  Parallel framework:            Concurrency

  Trace:                         YES (with Intel ITT)

  Other third-party libraries:
    Intel IPP:                   2020.0.0 Gold [2020.0.0]
           at:                   D:/opencv/opencv-4.5.5/build/3rdparty/ippicv/ippicv_win/icv
    Intel IPP IW:                sources (2020.0.0)
              at:                D:/opencv/opencv-4.5.5/build/3rdparty/ippicv/ippicv_win/iw
    Eigen:                       NO
    Custom HAL:                  NO
    Protobuf:                    build (3.19.1)

  OpenCL:                        YES (NVD3D11)
    Include path:                D:/opencv/opencv-4.5.5/3rdparty/include/opencl/1.2
    Link libraries:              Dynamic load

  Python (for build):            D:/ProgramData/Anaconda3/python.exe

  Install to:                    D:/opencv/opencv-4.5.5/build/install
-----------------------------------------------------------------

forward time = 15.9708 ms
C:\Users\ZHANG\Desktop\test>C:\Users\ZHANG\Desktop\test\dnnspeedtest480.exe
4.8.0

General configuration for OpenCV 4.8.0 =====================================
  Version control:               unknown

  Extra modules:
    Location (extra):            D:/opencv/opencv_contrib-4.8.0/modules
    Version control (extra):     unknown

  Platform:
    Timestamp:                   2023-06-29T13:58:22Z
    Host:                        Windows 10.0.22621 AMD64
    CMake:                       3.26.4
    CMake generator:             Visual Studio 16 2019
    CMake build tool:            D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/MSBuild/Current/Bin/MSBuild.exe
    MSVC:                        1929
    Configuration:               Debug Release

  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3 SSSE3
      requested:                 SSSE3
    Dispatched code generation:  SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
      requested:                 SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
      SSE4_1 (13 files):         + SSE4_1
      SSE4_2 (1 files):          + SSE4_1 POPCNT SSE4_2
      FP16 (0 files):            + SSE4_1 POPCNT SSE4_2 FP16 AVX
      AVX (7 files):             + SSE4_1 POPCNT SSE4_2 AVX
      AVX2 (30 files):           + SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
      AVX512_SKX (4 files):      + SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX

  C/C++:
    Built as dynamic libs?:      YES
    C++ standard:                11
    C++ Compiler:                D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe  (ver 19.29.30148.0)
    C++ flags (Release):         /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise      /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819 /MP  /MD /O2 /Ob2 /DNDEBUG
    C++ flags (Debug):           /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise      /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819 /MP  /MDd /Zi /Ob0 /Od /RTC1
    C Compiler:                  D:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe
    C flags (Release):           /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise      /MP   /MD /O2 /Ob2 /DNDEBUG
    C flags (Debug):             /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise      /MP /MDd /Zi /Ob0 /Od /RTC1
    Linker flags (Release):      /machine:x64  /INCREMENTAL:NO
    Linker flags (Debug):        /machine:x64  /debug /INCREMENTAL
    ccache:                      NO
    Precompiled headers:         YES
    Extra dependencies:
    3rdparty dependencies:

  OpenCV modules:
    To be built:                 core dnn flann highgui imgcodecs imgproc ml photo ts video videoio
    Disabled:                    aruco bgsegm bioinspired ccalib datasets dnn_objdetect dnn_superres dpm face features2d fuzzy hfs img_hash intensity_transform java_bindings_generator js_bindings_generator line_descriptor mcc optflow phase_unwrapping plot python_bindings_generator python_tests quality rapid reg rgbd saliency shape stereo structured_light superres surface_matching text tracking videostab wechat_qrcode world xfeatures2d ximgproc xobjdetect xphoto
    Disabled by dependency:      calib3d objdetect stitching
    Unavailable:                 alphamat cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev cvv freetype gapi hdf java julia matlab ovis python2 python3 sfm viz
    Applications:                perf_tests
    Documentation:               NO
    Non-free algorithms:         NO

  Windows RT support:            NO

  GUI:                           WIN32UI
    Win32 UI:                    YES

  Media I/O:
    ZLib:                        build (ver 1.2.13)
    JPEG:                        build-libjpeg-turbo (ver 2.1.3-62)
      SIMD Support Request:      YES
      SIMD Support:              NO
    WEBP:                        build (ver encoder: 0x020f)
    PNG:                         build (ver 1.6.37)
    TIFF:                        build (ver 42 - 4.2.0)
    JPEG 2000:                   build Jasper (ver 1.900.1)
    HDR:                         YES
    SUNRASTER:                   YES
    PXM:                         YES
    PFM:                         YES

  Video I/O:
    FFMPEG:                      YES (prebuilt binaries)
      avcodec:                   YES (58.134.100)
      avformat:                  YES (58.76.100)
      avutil:                    YES (56.70.100)
      swscale:                   YES (5.9.100)
      avresample:                YES (4.0.0)
    DirectShow:                  YES
    Media Foundation:            YES
      DXVA:                      YES

  Parallel framework:            Concurrency

  Trace:                         YES (with Intel ITT)

  Other third-party libraries:
    Intel IPP:                   2021.8 [2021.8.0]
           at:                   D:/opencv/opencv-4.8.0/build/3rdparty/ippicv/ippicv_win/icv
    Intel IPP IW:                sources (2021.8.0)
              at:                D:/opencv/opencv-4.8.0/build/3rdparty/ippicv/ippicv_win/iw
    Eigen:                       NO
    Custom HAL:                  NO
    Protobuf:                    build (3.19.1)

  OpenCL:                        YES (NVD3D11)
    Include path:                D:/opencv/opencv-4.8.0/3rdparty/include/opencl/1.2
    Link libraries:              Dynamic load

  Python (for build):            NO

  Install to:                    D:/opencv/opencv-4.8.0/build/install
-----------------------------------------------------------------

forward time = 25.4416 ms

C:\Users\ZHANG\Desktop\test>
ZJDATY commented 10 months ago

@ZJDATY Can you try this patch: #23952? It may fix this issue.

@zihaomu Hi, I saw that the patch addresses a slowdown relative to 4.7. However, I have also compiled 4.7.0, and its test results are not significantly different from those of 4.8.0, so I don't think the patch will solve my problem. My results seem to point to multi-core scheduling.

ZJDATY commented 10 months ago

Hi @zihaomu @asmorkalov @fengyuentau @WanliZhong, do you have a solution to this problem yet? I'm happy to assist with testing. If you need access to my machine over remote desktop, please let me know.

zihaomu commented 10 months ago

Can you try this patch: https://github.com/opencv/opencv/pull/23952? It may fix this issue.

I still cannot reproduce this issue on my machine, either single-threaded or multi-threaded.

ZJDATY commented 10 months ago

Can you try this patch: #23952? It may fix this issue.

I still cannot reproduce this issue on my machine, either single-threaded or multi-threaded.

Hi @zihaomu @fengyuentau @WanliZhong, I just compiled this patch. The test results are the same as before, and I see the same effect on three computers with different CPUs. Can you test the binaries I built on your computer?

dnnspeedtest.cpp

#include <iostream>
#include <opencv2/opencv.hpp>

int main()
{
    cv::TickMeter t;  // stack-allocated timer, so no manual delete is needed
    auto net = cv::dnn::readNetFromDarknet("yolo/yolov4-tiny.cfg", "yolo/yolov4-tiny.weights");
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);
    // Dummy 416x416 input; the pixel values are left uninitialized because only timing is measured.
    cv::Mat frame(416, 416, CV_32FC3), blob;
    //cv::setNumThreads(6);
    cv::dnn::blobFromImage(frame, blob, 1 / 255.0, cv::Size(416, 416), cv::Scalar(), true, false, CV_32F);
    net.setInput(blob);
    double minT = 100000;  // best (minimum) forward time in ms
    for (int i = 0; i < 100; i++)
    {
        t.start();
        net.forward();
        t.stop();
        double timeN = t.getAvgTimeMilli();  // one start/stop pair, so this is the time of this run
        if (timeN < minT)
            minT = timeN;
        t.reset();
    }
    std::cout << cv::getVersionString() << std::endl;
    std::cout << cv::getBuildInformation() << std::endl;
    std::cout << "forward time = " << minT << " ms" << std::endl;
    return 0;
}

image

And the compiled opencv_perf_dnn.exe: (image)

Please test it. I really don't understand why you can't reproduce my results.

https://www.aliyundrive.com/s/aZ8XoLEfZcg

In my tests, OpenCV 4.5.5 takes 15-16 ms and OpenCV 4.8.0 takes 25-26 ms.

fengyuentau commented 9 months ago

Is it platform specific? Could you try the same tests on Ubuntu?

ZJDATY commented 9 months ago

Is it platform specific? Could you try the same tests on Ubuntu?

Hi @fengyuentau, although I have a computer with Ubuntu installed, I am not able to develop on Ubuntu yet. Can you help test these two programs under Windows and compare the results? My current result is that, with multiple cores, OpenCV 4.8.0 is slower than OpenCV 4.5.5. Could you also share DLLs of both versions built with VS2019? I would like to check whether there is a problem with my CMake or compiler setup.

ukoehler commented 9 months ago

I just stumbled across an issue that might be very similar, or even the cause. The same inference is sometimes slower on 4.8.0 than on 4.5.2. I traced it to memory usage: 4.5.2 uses at most 553.6 MB of RAM, while 4.8.0 uses 1.89 GB. Could anybody seeing the problem be swapping? Is the very high RAM usage a known issue?

ukoehler commented 9 months ago

I created a new issue with the requested information: https://github.com/opencv/opencv/issues/24134

ukoehler commented 9 months ago

I just ran more tests and see an increase from 1.368 s for version 4.5.2 to 4.351 s for version 4.8.0.

This version is just collecting show stopper bugs.

ZJDATY commented 9 months ago

Is that related to why multi-threaded inference in OpenCV 4.8.0 takes longer than in OpenCV 4.5.5? I have been waiting for this problem to be resolved. https://github.com/opencv/opencv/issues/24134

ukoehler commented 9 months ago

@ZJDATY, I haven't observed that effect. I am comparing multi-threaded inference between 4.5.2 and 4.8.0. After a lengthy discussion here: https://github.com/opencv/opencv/issues/24134#issuecomment-1674667154, I suggest trying net.enableWinograd(false) before running inference. That recovered the 4.5.2 speed for me and some (not all) of the increased memory usage.
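
To check whether the Winograd path matters on a given machine, the same network can be timed with the switch on and off. The sketch below is one possible way to do that. `bestForwardMs` is a hypothetical helper, and it is templated on the network type only so that it compiles without OpenCV headers. With a `cv::dnn::Net` it relies on `enableWinograd(bool)`, the call suggested in the comment above, and on `forward()`. Whether `enableWinograd` exists depends on the OpenCV version in use, so treat that as an assumption to verify.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Best (minimum) time in ms over `runs` forward passes, with Winograd
// convolution switched on or off beforehand. NetT only needs to provide
// enableWinograd(bool) and forward(); cv::dnn::Net is the intended type.
template <typename NetT>
double bestForwardMs(NetT& net, bool useWinograd, int runs) {
    net.enableWinograd(useWinograd);
    double best = std::numeric_limits<double>::max();
    for (int i = 0; i < runs; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        net.forward();  // result is discarded; only the elapsed time is kept
        const auto t1 = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return best;
}
```

Calling `bestForwardMs(net, true, 100)` and then `bestForwardMs(net, false, 100)` on the YOLOv4-tiny network from this thread would show directly whether disabling Winograd brings the 4.8.0 time back toward the 4.5.5 figure.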

ZJDATY commented 9 months ago

@ukoehler Can you help me test the dnnspeedtest program from my earlier comment? All the files I compiled are on the network drive linked there, and you can also compile the test code yourself.