opencv / opencv

Open Source Computer Vision Library
https://opencv.org
Apache License 2.0

highgui: wayland: fix to pass highgui test #25551

Closed Kumataro closed 1 month ago

Kumataro commented 1 month ago

Close #25550

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

Kumataro commented 1 month ago

This patch contains performance tuning.

Result

I compared the instruction reference (Ir) counts reported by callgrind.

| Environment | Ir of write_mat_to_xrgb8888() |
|---|---|
| OpenCV 4.9 (Before) | 5,911,406 |
| Without SIMD | 2,453,868 |
| SSE3 + SIMD128 | 1,150,789 |
| SSE4.1 + SIMD128 | 325,695 |
| AVX2 + SIMD256 | 260,544 |

The difference between SSE3 and SSE4.1 comes from the intrinsic implementation.

https://github.com/opencv/opencv/blob/dad8af6b17f8e60d7b95a1203a1b4d22f56574cf/modules/core/include/opencv2/core/hal/intrin_sse.hpp#L2422-L2453

Test

The test source code is below.

// g++ main.cpp -o a.out -I /usr/local/include/opencv4 -lopencv_core -lopencv_highgui -lopencv_imgcodecs
#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgcodecs.hpp>
#include <iostream>
#include <string>

int main(int argc, char *argv[])
{
  std::cout << "cv::currentUIFramework() returns " << cv::currentUIFramework() << std::endl;

  cv::Mat src;
  src = cv::imread("opencv-logo.png");

  cv::namedWindow("src");
  cv::imshow("src", src);
  (void)cv::waitKey(1000);

  return 0;
}

The commands are below.

valgrind --tool=callgrind ./a.out
callgrind_annotate callgrind.out.[PID] | grep 8888 | head -1
Kumataro commented 1 month ago

Proposed todo: Add RISC-V RVV and other scalable vector intrinsics support. Need to use CV_SIMD_SCALABLE macro and run-time value step in loops.

Thank you for your proposal! I'll try it this weekend. I don't have an ARM SVE or RISC-V Vector Extension environment, but it looks like it can be tested in an AVX environment.

I think the current implementation can be refactored along the lines of the split function. https://github.com/opencv/opencv/blob/4.x/modules/core/src/split.simd.hpp

For example (this is only a rough idea):

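template<typename T, typename VecT> static void
vecwrite_T_to_xrgb8888( const T* src, T* dst, int len, int scn )
{
    const int VECSZ = VTraits<VecT>::vlanes(); // lanes per vector, known only at run time for scalable ISAs
    const int dcn = 4; // XRGB
    // ... (omitted: setup of i, i0 and mode, and the preceding scn cases)
    else if( scn == 3 )
    {
        for( i = 0; i < len; i += VECSZ )
        {
            // Last, partial block: step back so a full vector still fits.
            if( i > len - VECSZ )
            {
                i = len - VECSZ;
                mode = hal::STORE_UNALIGNED;
            }
            VecT b,g,r;
            v_load_deinterleave(src + i*scn, b, g, r);
            // The X channel is unused, so r is stored again as filler.
            v_store_interleave (dst + i*dcn, b, g, r, r, mode);
            // After the unaligned head, continue from the aligned offset i0.
            if( i < i0 )
            {
                i = i0 - VECSZ;
                mode = hal::STORE_ALIGNED_NOCACHE;
            }
        }
    }
    // ... (omitted)
}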
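The tail handling in the loop above (stepping `i` back to `len - VECSZ` so that the last iteration still processes a full vector) can be illustrated without any SIMD at all. The following is only a minimal scalar sketch: `kBlock`, `convert_block`, `bgr_to_bgrx_blocks` and `bgr_to_bgrx` are hypothetical names that are not part of OpenCV, and a fixed block of 4 pixels stands in for one vector register.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the number of pixels one SIMD register holds.
constexpr int kBlock = 4;

// Converts exactly kBlock BGR pixels to 4-byte pixels. This plays the role of
// v_load_deinterleave + v_store_interleave in the sketch above.
void convert_block(const std::uint8_t* src, std::uint8_t* dst)
{
    for (int p = 0; p < kBlock; ++p)
    {
        dst[p * 4 + 0] = src[p * 3 + 0]; // B
        dst[p * 4 + 1] = src[p * 3 + 1]; // G
        dst[p * 4 + 2] = src[p * 3 + 2]; // R
        dst[p * 4 + 3] = src[p * 3 + 2]; // X: unused filler (R repeated)
    }
}

// Converts len pixels using whole blocks only. When fewer than kBlock pixels
// remain, the index steps back so the final block ends exactly at len; a few
// pixels are converted twice, which is harmless because src and dst are
// separate buffers.
void bgr_to_bgrx_blocks(const std::uint8_t* src, std::uint8_t* dst, int len)
{
    assert(len >= kBlock); // a shorter row would need a scalar fallback
    for (int i = 0; i < len; i += kBlock)
    {
        if (i > len - kBlock)
            i = len - kBlock;
        convert_block(src + i * 3, dst + i * 4);
    }
}

// Convenience wrapper for trying the loop on a std::vector.
std::vector<std::uint8_t> bgr_to_bgrx(const std::vector<std::uint8_t>& bgr)
{
    assert(bgr.size() % 3 == 0);
    const int len = static_cast<int>(bgr.size() / 3);
    std::vector<std::uint8_t> out(static_cast<std::size_t>(len) * 4);
    bgr_to_bgrx_blocks(bgr.data(), out.data(), len);
    return out;
}
```

With a 6-pixel row, the loop converts pixels 0 to 3 and then pixels 2 to 5, so no scalar remainder loop is needed. The alignment handling (`i0`, `mode`) from the sketch above is deliberately left out here.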
Kumataro commented 1 month ago

I updated the code to support CV_SIMD_SCALABLE and tested it on VMware (AVX2) and a Raspberry Pi 4 (NEON), both running Ubuntu 24.04. opencv_test_highgui passes, and it calls the vector implementation.

The logic is simple. I also added AVX512_SKX, LASX and RVV to the dispatch list because I expect them to be effective.

ocv_add_dispatched_file(write_mat_to_xrgb8888 SSE4_1 AVX2 AVX512_SKX NEON LASX RVV)

Kumataro commented 1 month ago

I looked at the cvtColor implementation, and I have a second idea: use cvtColor() with cv::COLOR_BGR2BGRA or cv::COLOR_GRAY2BGRA instead of this SIMD implementation. I'll try it.

Wayland requests [B8:G8:R8:X8], not [B8:G8:R8:A8]. I thought it would be hard to extend cvtColor() to support BGRX for this backend only.

But I noticed that the X channel is not used, which means there is no problem even if it holds an opaque alpha value. So we can use the COLOR_BGR2BGRA option for this purpose.
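To make the argument concrete, here is a small sketch of why the value in the fourth byte does not matter. It assumes that XRGB8888 is read as a little-endian 32-bit word whose top 8 bits are ignored, so the bytes in memory are B, G, R, X. `pack_le` and `visible_rgb` are hypothetical helper names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Builds the 32-bit value that a little-endian load would produce from four
// bytes stored in memory order B, G, R, X.
std::uint32_t pack_le(std::uint8_t b, std::uint8_t g, std::uint8_t r, std::uint8_t x)
{
    return  std::uint32_t(b)
         | (std::uint32_t(g) << 8)
         | (std::uint32_t(r) << 16)
         | (std::uint32_t(x) << 24);
}

// The visible colour of an XRGB8888 pixel: the top 8 bits (X) are masked off.
std::uint32_t visible_rgb(std::uint32_t xrgb)
{
    return xrgb & 0x00FFFFFFu;
}
```

Under that assumption, a pixel written as BGRA with an alpha of 0xFF and the same pixel written with X set to 0x00 give the same `visible_rgb()` value, which is why a BGRA buffer can be handed over as BGRX.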

Via cvtColor() we also get the performance improvements it already provides (OpenCL, IPP, multithreading). Furthermore, the maintainability of the code is improved.