opencv / opencv

OpenCV4.8.0 DNN inference speed reduced by 50%. #23911

Open ZJDATY opened 11 months ago

ZJDATY commented 11 months ago

System Information

OpenCV version: 4.8.0
Operating System / Platform: Windows 10
Compiler & compiler version: Visual Studio 2019

Compared against: OpenCV 4.5.5-openvino in the same environment.

Detailed description

The build information and timing statistics for both versions are shown below. The mean time per frame increased from about 20 ms to about 30 ms.

General configuration for OpenCV 4.5.5-openvino =====================================
  Version control:               c3d60a6cac5b5b3c452d766494d21b005221efe0

  Platform:
    Timestamp:                   2022-03-11T02:31:28Z
    Host:                        Windows 10.0.19044 AMD64
    CMake:                       3.14.5
    CMake generator:             Visual Studio 16 2019
    CMake build tool:            C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/MSBuild/Current/Bin/MSBuild.exe
    MSVC:                        1929
    Configuration:               Debug Release

  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3 SSSE3 SSE4_1 POPCNT SSE4_2
      requested:                 SSE4_2
    Dispatched code generation:  FP16 AVX AVX2 AVX512_SKX
      requested:                 SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
      FP16 (1 files):            + FP16 AVX
      AVX (5 files):             + AVX
      AVX2 (33 files):           + FP16 FMA3 AVX AVX2
      AVX512_SKX (8 files):      + FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX

  C/C++:
    Built as dynamic libs?:      YES
    C++ standard:                11
    C++ Compiler:                C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/VC/Tools/MSVC/14.29.30037/bin/Hostx64/x64/cl.exe  (ver 19.29.30040.0)
    C++ flags (Release):         /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise         /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /GS /sdl /guard:cf /w34018 /w34146 /w34244 /w34267 /w34302 /w34308 /w34509 /w34532 /w34533 /w34700 /w34789 /w34995 /w34996 /MP  /MD /O2 /Ob2 /DNDEBUG
    C++ flags (Debug):           /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise         /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /GS /sdl /guard:cf /w34018 /w34146 /w34244 /w34267 /w34302 /w34308 /w34509 /w34532 /w34533 /w34700 /w34789 /w34995 /w34996 /MP  /MDd /Zi /Ob0 /Od /RTC1
    C Compiler:                  C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/VC/Tools/MSVC/14.29.30037/bin/Hostx64/x64/cl.exe
    C flags (Release):           /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise         /GS /sdl /guard:cf /w34018 /w34146 /w34244 /w34267 /w34302 /w34308 /w34509 /w34532 /w34533 /w34700 /w34789 /w34995 /w34996 /MP   /MD /O2 /Ob2 /DNDEBUG
    C flags (Debug):             /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise         /GS /sdl /guard:cf /w34018 /w34146 /w34244 /w34267 /w34302 /w34308 /w34509 /w34532 /w34533 /w34700 /w34789 /w34995 /w34996 /MP /MDd /Zi /Ob0 /Od /RTC1
    Linker flags (Release):      /machine:x64   /guard:cf /dynamicbase /INCREMENTAL:NO
    Linker flags (Debug):        /machine:x64   /guard:cf /dynamicbase /debug /INCREMENTAL
    ccache:                      NO
    Precompiled headers:         NO
    Extra dependencies:
    3rdparty dependencies:

  OpenCV modules:
    To be built:                 calib3d core dnn features2d flann gapi highgui imgcodecs imgproc ml objdetect photo python3 stitching ts video videoio
    Disabled:                    world
    Disabled by dependency:      -
    Unavailable:                 java python2
    Applications:                tests perf_tests apps
    Documentation:               NO
    Non-free algorithms:         NO

  Windows RT support:            NO

  GUI:                           WIN32UI
    Win32 UI:                    YES

  Media I/O:
    ZLib:                        build (ver 1.2.11)
    JPEG:                        build-libjpeg-turbo (ver 2.1.2-62)
    PNG:                         build (ver 1.6.37)
    HDR:                         YES
    SUNRASTER:                   YES
    PXM:                         YES
    PFM:                         YES

  Video I/O:
    FFMPEG:                      YES (prebuilt binaries)
      avcodec:                   YES (58.134.100)
      avformat:                  YES (58.76.100)
      avutil:                    YES (56.70.100)
      swscale:                   YES (5.9.100)
      avresample:                YES (4.0.0)
    DirectShow:                  YES
    Media Foundation:            YES
      DXVA:                      YES
    Intel Media SDK:             YES (VPL::dispatcher VPL::api)

  Parallel framework:            Concurrency

  Trace:                         YES (with Intel ITT)

  Other third-party libraries:
    Intel IPP:                   2020.0.0 Gold [2020.0.0]
           at:                   C:/jenkins/workspace/windows9f26d2f1/build_release/3rdparty/ippicv/ippicv_win/icv
    Intel IPP IW:                sources (2020.0.0)
              at:                C:/jenkins/workspace/windows9f26d2f1/build_release/3rdparty/ippicv/ippicv_win/iw
    Inference Engine:            YES (2022010000 / 2022.1.0)
        * libs:                  C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/lib/intel64/Release/openvino.lib / C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/lib/intel64/Debug/openvinod.lib C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/bin/intel64/Release/openvino.dll / C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/bin/intel64/Debug/openvinod.dll C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/bin/intel64/Release/openvino.dll C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/bin/intel64/Debug/openvinod.dll
        * includes:              C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/include C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/include/ie
    nGraph:                      YES (2022.1.0)
        * libs:                  C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/lib/intel64/Release/openvino.lib / C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/lib/intel64/Debug/openvinod.lib C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/bin/intel64/Release/openvino.dll / C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/bin/intel64/Debug/openvinod.dll C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/bin/intel64/Release/openvino.dll C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/bin/intel64/Debug/openvinod.dll
        * includes:              C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/include C:/jenkins/workspace/windows9f26d2f1/deployment_tools/runtime/include/ie
    Custom HAL:                  NO
    Protobuf:                    build (3.19.1)

  OpenCL:                        YES (NVD3D11)
    Include path:                C:/jenkins/workspace/windows9f26d2f1/opencv/3rdparty/include/opencl/1.2
    Link libraries:              Dynamic load

  Python 3:
    Interpreter:                 C:/miniconda/envs/py3_env/python.exe (ver 3.4.5)
    Libraries:                   C:/miniconda/envs/py3_env/libs/python34.lib (ver 3.4.5)
    numpy:                       C:/miniconda/envs/py3_env/lib/site-packages/numpy/core/include (ver 1.11.3)
    install path:                python/cv2/python-3

  Python (for build):            C:/miniconda/envs/py3_env/python.exe

  Install to:                    C:/jenkins/workspace/windows9f26d2f1/build_release/install
-----------------------------------------------------------------

[yolov4]
        init >> 39.592ms
        inference >> min = 21.601ms, max = 32.751ms, mean = 21.846ms, stddev = 1.35271ms

D:\vcworkspaces\yolov4_tiny_dnn_demo\x64\Release\yolov4_tiny_dnn_demo.exe (process 7704) exited with code 0.
Press any key to close this window. . .

OpenCV 4.8.0

General configuration for OpenCV 4.8.0 =====================================
  Version control:               4.8.0

  Platform:
    Timestamp:                   2023-06-28T12:35:18Z
    Host:                        Windows 10.0.19045 AMD64
    CMake:                       3.23.3
    CMake generator:             Visual Studio 16 2019
    CMake build tool:            C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/MSBuild/Current/Bin/MSBuild.exe
    MSVC:                        1928
    Configuration:               Debug Release

  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3
      requested:                 SSE3
    Dispatched code generation:  SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
      requested:                 SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
      SSE4_1 (16 files):         + SSSE3 SSE4_1
      SSE4_2 (1 files):          + SSSE3 SSE4_1 POPCNT SSE4_2
      FP16 (0 files):            + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
      AVX (7 files):             + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
      AVX2 (35 files):           + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
      AVX512_SKX (5 files):      + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX

  C/C++:
    Built as dynamic libs?:      YES
    C++ standard:                11
    C++ Compiler:                C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.28.29333/bin/Hostx64/x64/cl.exe  (ver 19.28.29334.0)
    C++ flags (Release):         /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819 /MP  /MD /O2 /Ob2 /DNDEBUG
    C++ flags (Debug):           /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819 /MP  /MDd /Zi /Ob0 /Od /RTC1
    C Compiler:                  C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.28.29333/bin/Hostx64/x64/cl.exe
    C flags (Release):           /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /MP   /MD /O2 /Ob2 /DNDEBUG
    C flags (Debug):             /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /MP /MDd /Zi /Ob0 /Od /RTC1
    Linker flags (Release):      /machine:x64  /INCREMENTAL:NO
    Linker flags (Debug):        /machine:x64  /debug /INCREMENTAL
    ccache:                      NO
    Precompiled headers:         NO
    Extra dependencies:
    3rdparty dependencies:

  OpenCV modules:
    To be built:                 calib3d core dnn features2d flann gapi highgui imgcodecs imgproc ml objdetect photo stitching video videoio world
    Disabled:                    python3
    Disabled by dependency:      -
    Unavailable:                 java python2 ts
    Applications:                apps
    Documentation:               NO
    Non-free algorithms:         NO

  Windows RT support:            NO

  GUI:
    Win32 UI:                    YES
    VTK support:                 NO

  Media I/O:
    ZLib:                        build (ver 1.2.13)
    JPEG:                        build-libjpeg-turbo (ver 2.1.3-62)
      SIMD Support Request:      YES
      SIMD Support:              NO
    WEBP:                        build (ver encoder: 0x020f)
    PNG:                         build (ver 1.6.37)
    TIFF:                        build (ver 42 - 4.2.0)
    JPEG 2000:                   build (ver 2.5.0)
    OpenEXR:                     build (ver 2.3.0)
    HDR:                         YES
    SUNRASTER:                   YES
    PXM:                         YES
    PFM:                         YES

  Video I/O:
    DC1394:                      NO
    FFMPEG:                      YES (prebuilt binaries)
      avcodec:                   YES (58.134.100)
      avformat:                  YES (58.76.100)
      avutil:                    YES (56.70.100)
      swscale:                   YES (5.9.100)
      avresample:                YES (4.0.0)
    GStreamer:                   NO
    DirectShow:                  YES
    Media Foundation:            YES
      DXVA:                      YES

  Parallel framework:            Concurrency

  Trace:                         YES (with Intel ITT)

  Other third-party libraries:
    Intel IPP:                   2021.8 [2021.8.0]
           at:                   C:/GHA-OCV-1/_work/ci-gha-workflow/ci-gha-workflow/build/3rdparty/ippicv/ippicv_win/icv
    Intel IPP IW:                sources (2021.8.0)
              at:                C:/GHA-OCV-1/_work/ci-gha-workflow/ci-gha-workflow/build/3rdparty/ippicv/ippicv_win/iw
    Eigen:                       NO
    Custom HAL:                  NO
    Protobuf:                    build (3.19.1)
    Flatbuffers:                 builtin/3rdparty (23.5.9)

  OpenCL:                        YES (NVD3D11)
    Include path:                C:/GHA-OCV-1/_work/ci-gha-workflow/ci-gha-workflow/opencv/3rdparty/include/opencl/1.2
    Link libraries:              Dynamic load

  Python (for build):            C:/Python-3.9/python.exe

  Java:
    ant:                         C:/apache-ant-1.9.15/bin/ant.bat (ver 1.9.15)
    Java:                        NO
    JNI:                         C:/Program Files/Java/jdk-11.0.9/include C:/Program Files/Java/jdk-11.0.9/include/win32 C:/Program Files/Java/jdk-11.0.9/include
    Java wrappers:               NO
    Java tests:                  NO

  Install to:                    C:/GHA-OCV-1/_work/ci-gha-workflow/ci-gha-workflow/install
-----------------------------------------------------------------

[yolov4]
        init >> 92.686ms
        inference >> min = 29.27ms, max = 38.598ms, mean = 32.5029ms, stddev = 1.07574ms

D:\vcworkspaces\yolov4_tiny_dnn_demo\x64\Release\yolov4_tiny_dnn_demo.exe (process 11968) exited with code 0.
Press any key to close this window. . .
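
For reference, the two mean values reported above (21.846 ms and 32.5029 ms) can be turned into a relative slowdown. The sketch below is only illustrative arithmetic; `Slowdown` and `compare_latency` are hypothetical names and are not part of the reproducer or of OpenCV. It shows that a time increase of roughly one half corresponds to a smaller relative drop in throughput, so which of the two numbers "50%" refers to matters.

```cpp
// Hypothetical helper, not part of the original reproducer.
struct Slowdown {
    double time_increase_pct;   // how much longer one frame takes, in percent
    double throughput_drop_pct; // how much fewer frames per second, in percent
};

// Compare two mean latencies given in milliseconds.
// Precondition: both values are strictly positive.
Slowdown compare_latency(double old_ms, double new_ms)
{
    Slowdown s;
    s.time_increase_pct = (new_ms / old_ms - 1.0) * 100.0;
    s.throughput_drop_pct = (1.0 - old_ms / new_ms) * 100.0;
    return s;
}
```

Calling `compare_latency(21.846, 32.5029)` gives a time increase a little under 50% and a throughput drop of roughly one third.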

Steps to reproduce

#include <iostream>
#include <queue>
#include <iterator>
#include <sstream>
#include <fstream>
#include <iomanip>
#include <chrono>

#include <opencv2/core.hpp>
#include <opencv2/dnn.hpp>
#include <opencv2/dnn/all_layers.hpp>
#include <opencv2/opencv.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
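#include <algorithm>  // std::min_element, std::max_element
#include <cmath>      // std::sqrt
#include <numeric>
#include <string>
#include <vector>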

constexpr float CONFIDENCE_THRESHOLD = 0;
constexpr float NMS_THRESHOLD = 0.4;
constexpr int NUM_CLASSES = 80;

// colors for bounding boxes
const cv::Scalar colors[] = {
    {0, 255, 255},
    {255, 255, 0},
    {0, 255, 0},
    {255, 0, 0}
};
const auto NUM_COLORS = sizeof(colors) / sizeof(colors[0]);

int main()
{
    std::cout << cv::getBuildInformation() << std::endl;
    std::vector<std::string> class_names;
    {
        std::ifstream class_file("yolo/coco.names");
        if (!class_file)
        {
            std::cerr << "failed to open yolo/coco.names\n";
            return 1;
        }

        std::string line;
        while (std::getline(class_file, line))
            class_names.push_back(line);
    }
    std::string b = "./yolo/yolo_test.mp4";
    cv::VideoCapture source(b);

    auto net = cv::dnn::readNetFromDarknet("yolo/yolov4-tiny.cfg", "yolo/yolov4-tiny.weights");
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);
    auto output_names = net.getUnconnectedOutLayersNames();

    cv::Mat frame(416, 416, CV_32FC3), blob; //Tiny
    std::vector<cv::Mat> detections;
    std::vector<float> runtimes;
    auto init_start = std::chrono::steady_clock::now();
    cv::dnn::blobFromImage(frame, blob, 1 / 255.0, cv::Size(416, 416), cv::Scalar(), true, false, CV_32F); //Tiny
    net.setInput(blob);
    net.forward(detections, output_names);
    auto init_end = std::chrono::steady_clock::now();
    while (cv::waitKey(1) < 1)
    {
        source >> frame;
        if (frame.empty())
            break;

        auto total_start = std::chrono::steady_clock::now();
        cv::dnn::blobFromImage(frame, blob, 1.0 / 255, cv::Size(416, 416), cv::Scalar(), true, false, CV_32F);  //Tiny
        net.setInput(blob);

        auto dnn_start = std::chrono::steady_clock::now();
        net.forward(detections, output_names);
        auto dnn_end = std::chrono::steady_clock::now();

        std::vector<int> indices[NUM_CLASSES];
        std::vector<cv::Rect> boxes[NUM_CLASSES];
        std::vector<float> scores[NUM_CLASSES];

        for (auto& output : detections)
        {
            const auto num_boxes = output.rows;
            for (int i = 0; i < num_boxes; i++)
            {
                auto x = output.at<float>(i, 0) * frame.cols;
                auto y = output.at<float>(i, 1) * frame.rows;
                auto width = output.at<float>(i, 2) * frame.cols;
                auto height = output.at<float>(i, 3) * frame.rows;
                cv::Rect rect(x - width / 2, y - height / 2, width, height);

                for (int c = 0; c < NUM_CLASSES; c++)
                {
                    auto confidence = *output.ptr<float>(i, 5 + c);
                    if (confidence >= CONFIDENCE_THRESHOLD)
                    {
                        boxes[c].push_back(rect);
                        scores[c].push_back(confidence);
                    }
                }
            }
        }

        for (int c = 0; c < NUM_CLASSES; c++)
            cv::dnn::NMSBoxes(boxes[c], scores[c], 0.0, NMS_THRESHOLD, indices[c]);

        for (int c = 0; c < NUM_CLASSES; c++)
        {
            for (size_t i = 0; i < indices[c].size(); ++i)
            {
                const auto color = colors[c % NUM_COLORS];

                auto idx = indices[c][i];
                const auto& rect = boxes[c][idx];
                cv::rectangle(frame, cv::Point(rect.x, rect.y), cv::Point(rect.x + rect.width, rect.y + rect.height), color, 3);

                std::ostringstream label_ss;
                label_ss << class_names[c] << ": " << std::fixed << std::setprecision(2) << scores[c][idx];
                auto label = label_ss.str();

                int baseline;
                auto label_bg_sz = cv::getTextSize(label.c_str(), cv::FONT_HERSHEY_COMPLEX_SMALL, 1, 1, &baseline);
                cv::rectangle(frame, cv::Point(rect.x, rect.y - label_bg_sz.height - baseline - 10), cv::Point(rect.x + label_bg_sz.width, rect.y), color, cv::FILLED);
                cv::putText(frame, label.c_str(), cv::Point(rect.x, rect.y - baseline - 5), cv::FONT_HERSHEY_COMPLEX_SMALL, 1, cv::Scalar(0, 0, 0));
            }
        }

        auto total_end = std::chrono::steady_clock::now();

        // Durations are in milliseconds; FPS is derived from them below.
        float inference_ms = std::chrono::duration_cast<std::chrono::microseconds>(dnn_end - dnn_start).count() / 1000.0;
        //std::cout << "Model inference time: " << inference_ms << " ms" << std::endl;
        float total_ms = std::chrono::duration_cast<std::chrono::microseconds>(total_end - total_start).count() / 1000.0;
        //std::cout << "Total time per frame: " << total_ms << " ms" << std::endl;
        std::ostringstream stats_ss;
        stats_ss << std::fixed << std::setprecision(2);
        stats_ss << "Inference FPS: " << 1000.0 / inference_ms << ", Total FPS: " << 1000.0 / total_ms;
        // Note: this records the total per-frame time (blob creation, forward pass, NMS, drawing),
        // not the forward pass alone.
        runtimes.push_back(total_ms);
        auto stats = stats_ss.str();
        int baseline;
        auto stats_bg_sz = cv::getTextSize(stats.c_str(), cv::FONT_HERSHEY_COMPLEX_SMALL, 1, 1, &baseline);
        cv::rectangle(frame, cv::Point(0, 0), cv::Point(stats_bg_sz.width, stats_bg_sz.height + 10), cv::Scalar(0, 0, 0), cv::FILLED);
        cv::putText(frame, stats.c_str(), cv::Point(0, stats_bg_sz.height + 5), cv::FONT_HERSHEY_COMPLEX_SMALL, 1, cv::Scalar(255, 255, 255));
        //cv::namedWindow("output", cv::WindowFlags::WINDOW_AUTOSIZE);
        //cv::imshow("output", frame);
    }
    auto sum = std::accumulate(std::begin(runtimes), std::end(runtimes), 0.0f);
    auto squared_sum = std::inner_product(std::begin(runtimes), std::end(runtimes), std::begin(runtimes), 0.0f);

    auto min = *std::min_element(std::begin(runtimes), std::end(runtimes));
    auto max = *std::max_element(std::begin(runtimes), std::end(runtimes));
    auto mean = sum / runtimes.size();
    auto stddev = std::sqrt(squared_sum / runtimes.size() - mean * mean);

    std::cout << '[' << "yolov4-tiny" << "]" << '\n'
        << "\tinit >> " << std::chrono::duration_cast<std::chrono::microseconds>(init_end - init_start).count() / 1000.0 << "ms" << '\n'
        << "\tinference >> " << "min = " << min << "ms, max = " << max << "ms, mean = " << mean << "ms, stddev = " << stddev << "ms" << std::endl;

    //cv::destroyAllWindows();
    return 0;
}
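
Note that the statistics printed under "inference" in the program above are computed from the total per-frame time (blob creation, forward pass, NMS with a confidence threshold of 0, and drawing), not from the forward pass alone. To tell whether the regression is in `net.forward` or elsewhere, the two durations could be collected and summarized separately. The following is a minimal sketch of such a summary helper using only the standard library; `TimingStats` and `summarize` are hypothetical names introduced here for illustration. It computes the population standard deviation, like the original, but from squared deviations about the mean, which avoids a slightly negative variance from rounding, and it handles an empty sample set.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical summary type, not part of the original reproducer.
struct TimingStats {
    std::size_t count = 0;
    double min = 0.0;
    double max = 0.0;
    double mean = 0.0;
    double stddev = 0.0; // population standard deviation
};

// Summarize a list of durations given in milliseconds.
// Returns all-zero statistics when no samples were recorded.
TimingStats summarize(const std::vector<double>& samples_ms)
{
    TimingStats s;
    s.count = samples_ms.size();
    if (samples_ms.empty())
        return s;

    const auto mm = std::minmax_element(samples_ms.begin(), samples_ms.end());
    s.min = *mm.first;
    s.max = *mm.second;
    s.mean = std::accumulate(samples_ms.begin(), samples_ms.end(), 0.0) / s.count;

    // Sum of squared deviations about the mean.
    double sq = 0.0;
    for (double x : samples_ms)
        sq += (x - s.mean) * (x - s.mean);
    s.stddev = std::sqrt(sq / s.count);
    return s;
}
```

In the loop, the `dnn_end - dnn_start` duration could be pushed into one vector and the `total_end - total_start` duration into another, with `summarize` called on each after the loop, so that the two versions can be compared on forward-pass time only.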

Related models and video downloads.

Download: https://pan.baidu.com/s/1wpXBbdtJMUrYULAelglIiw?pwd=x67k

ukoehler commented 9 months ago

@ZJDATY Sorry, not easily, since the computer is not connected to the internet. However, all you have to do is add net.enableWinograd(false); after net.setInput(blob); for version 4.8.0

ZJDATY commented 9 months ago

Thank you. I tested it, and the results are the same as before: OpenCV 4.8.0 is still about 10 ms slower.

ukoehler commented 9 months ago

Well, it was worth a try.

ZJDATY commented 5 months ago

I am excited to see that version 4.9 has been released, but unfortunately the performance regression has not been resolved in 4.9 yet.

ZJDATY commented 5 months ago

My computer's CPU is now an i7-10700. Inference takes 13 ms with OpenCV 4.5.5 and 23 ms with OpenCV 4.9.0.