highgui: wayland: fix to pass highgui test

Kumataro commented 1 month ago

Close #25550

optimize Mat to XRGB8888 conversion with OpenCV functions
- extend to support CV_8S/16U/16S/32F/64F
- extend to support 1/4 channels
Fix the timing of value updates
- Initialize the slider_ value if value is not nullptr.
- Update the user-ptr value and call the on_change() function if cv_wl_trackbar::draw() is not called.
Update usage of the WAYLAND/XDG macros to avoid referencing undefined macros.
Update documents
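
As a rough illustration of the value-update fix described above, here is a minimal sketch in plain C++. It is not the actual cv_wl_trackbar code: the type TrackbarSketch and all of its members are hypothetical names chosen for this example. It only shows the two behaviours listed: seeding the slider from the user pointer when it is not nullptr, and writing the value back and firing the callback directly from the setter instead of waiting for a draw call.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for a trackbar; not the real cv_wl_trackbar class.
struct TrackbarSketch {
    int* user_value;                     // optional pointer supplied by the caller
    int max_pos;                         // upper bound of the slider
    std::function<void(int)> on_change;  // optional change callback
    int slider;                          // current slider position

    TrackbarSketch(int* value, int count, std::function<void(int)> cb)
        : user_value(value), max_pos(count), on_change(std::move(cb)), slider(0) {
        assert(count >= 0);
        // Seed the slider from the caller's variable only when a pointer was given.
        if (user_value) slider = clamp(*user_value);
    }

    // Update immediately; nothing here depends on a later redraw.
    void set_pos(int pos) {
        slider = clamp(pos);
        if (user_value) *user_value = slider;  // write back to the caller's variable
        if (on_change) on_change(slider);      // notify without waiting for a draw
    }

    int clamp(int v) const { return std::min(std::max(v, 0), max_pos); }
};
```

With this shape, a caller that sets a position sees its own variable and callback updated even when the window is never redrawn.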

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

[x] I agree to contribute to the project under Apache 2 License.
[x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
[x] The PR is proposed to the proper branch
[x] There is a reference to the original bug report and related work
[x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name.
[x] The feature is well documented and sample code can be built with the project CMake

Kumataro commented 1 month ago

This patch contains performance tuning.

Result

I compared the number of instruction references (Ir).

Environment	Ir of write_mat_to_xrgb8888()
OpenCV 4.9(Before)	5,911,406
Without SIMD	2,453,868
SSE3 + SIMD128	1,150,789
SSE4.1 + SIMD128	325,695
AVX2 + SIMD256	260,544

The difference between SSE3 and SSE4.1 comes from the intrinsic implementation.

https://github.com/opencv/opencv/blob/dad8af6b17f8e60d7b95a1203a1b4d22f56574cf/modules/core/include/opencv2/core/hal/intrin_sse.hpp#L2422-L2453

Test

Source code is here.

// g++ main.cpp -o a.out -I /usr/local/include/opencv4 -lopencv_core -lopencv_highgui -lopencv_imgcodecs
#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgcodecs.hpp>
#include <iostream>
#include <string>

int main(int argc, char *argv[])
{
  std::cout << "cv::currentUIFramework() returns " << cv::currentUIFramework() << std::endl;

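  cv::Mat src = cv::imread("opencv-logo.png");
  if (src.empty())
  {
    // imshow() cannot display an empty image, so stop here if loading failed.
    std::cerr << "Failed to read opencv-logo.png" << std::endl;
    return 1;
  }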

  cv::namedWindow("src");
  cv::imshow("src", src);
  (void)cv::waitKey(1000);

  return 0;
}

Command is here.

valgrind --tool=callgrind ./a.out
callgrind_annotate callgrind.out.[PID] | grep 8888 | head -1

Kumataro commented 1 month ago

Proposed todo: Add RISC-V RVV and other scalable vector intrinsics support. Need to use CV_SIMD_SCALABLE macro and run-time value step in loops.

Thank you for your proposal! I'll try it this weekend. I have no ARM SVE or RISC-V Vector Extension environment, but it looks like it can be tested in an AVX environment.

I think the current implementation will be refactored along the lines of the split function. https://github.com/opencv/opencv/blob/4.x/modules/core/src/split.simd.hpp

For example (this is only my imagination):

template<typename T, typename VecT> static void
vecwrite_T_to_xrgb8888( const T* src, T* dst, int len, int scn )
{
    const int VECSZ = VTraits<VecT>::vlanes();
    const int dcn = 4; // XRGB
:
:
    else if( scn == 3 )
    {
        for( i = 0; i < len; i += VECSZ )
        {
            if( i > len - VECSZ )
            {
                i = len - VECSZ;
                mode = hal::STORE_UNALIGNED;
            }
            VecT b,g,r;
            v_load_deinterleave(src + i*scn, b, g, r);
            v_store_interleave (dst + i*dcn, b, g, r, r, mode);
            if( i < i0 )
            {
                i = i0 - VECSZ;
                mode = hal::STORE_ALIGNED_NOCACHE;
            }
        }
    }
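
The tail handling in the loop above (stepping i back to len - VECSZ so the last block ends exactly at len) can be tried without any SIMD. The following is a minimal scalar sketch under stated assumptions: convert_block and bgr_to_bgrx_blocks are hypothetical names, block plays the role of VECSZ, the X byte is filled with 0xFF, and the aligned/no-cache store modes of the original are left out. Re-converting a few pixels in the overlapping last block is harmless because source and destination are separate buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one vector iteration: converts `block` BGR pixels to BGRX.
static void convert_block(const std::uint8_t* src, std::uint8_t* dst, int block) {
    for (int k = 0; k < block; ++k) {
        dst[4 * k + 0] = src[3 * k + 0];  // B
        dst[4 * k + 1] = src[3 * k + 1];  // G
        dst[4 * k + 2] = src[3 * k + 2];  // R
        dst[4 * k + 3] = 0xFF;            // X (ignored by the consumer)
    }
}

// Processes the row in fixed-size blocks; the last block overlaps the previous one
// instead of falling back to a scalar tail loop. Requires len >= block.
std::vector<std::uint8_t> bgr_to_bgrx_blocks(const std::vector<std::uint8_t>& bgr, int block) {
    const int len = static_cast<int>(bgr.size() / 3);
    assert(block > 0 && len >= block);
    std::vector<std::uint8_t> out(static_cast<std::size_t>(len) * 4);
    for (int i = 0; i < len; i += block) {
        if (i > len - block) {
            i = len - block;  // step back so this block ends exactly at len
        }
        convert_block(bgr.data() + i * 3, out.data() + i * 4, block);
    }
    return out;
}
```

For example, with 10 pixels and a block of 4, the blocks start at pixels 0, 4, and 6, so pixels 6 and 7 are written twice with identical values.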

Kumataro commented 1 month ago

I updated the code to support CV_SIMD_SCALABLE and tested it on VMware (AVX2) and a Raspberry Pi 4 (NEON) with Ubuntu 24.04. opencv_test_highgui passes, and it calls the vector implementation.

The logic is simple. I added AVX512_SKX, LASX, and RVV because I expect them to be effective.

ocv_add_dispatched_file(write_mat_to_xrgb8888 SSE4_1 AVX2 AVX512_SKX NEON LASX RVV)

Kumataro commented 1 month ago

I looked at the cvtColor implementation, and I have a second idea: use cvtColor() with cv::COLOR_BGR2BGRA or cv::COLOR_GRAY2BGRA instead of this SIMD implementation. I'll try it.

Wayland requests [B8:G8:R8:X8], not [B8:G8:R8:A8]. I thought it would be hard to extend cvtColor() to support RGBX for this backend only.

But I noticed that the X channel is not used, which means there is no problem even if it stores a non-transparent alpha value. So we can use the COLOR_BGR2BGRA option for this purpose.

We can get many performance improvements, provided by OpenCL, IPP, and multithreading, via cvtColor(). Furthermore, the maintainability of the code is also improved.
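
The point about the unused X channel can be checked with a small sketch in plain C++ (no OpenCV dependency). The names pack_xrgb8888 and visible_rgb are hypothetical helpers for this example. It assumes the usual little-endian XRGB8888 layout, where each 32-bit word holds X:R:G:B from the high byte to the low byte, which is B, G, R, X in memory order. Whatever is stored in the X byte, the visible colour bits are the same, which is why an alpha value written by a BGR-to-BGRA conversion should do no harm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs interleaved B,G,R bytes into 32-bit XRGB8888 words.
// x_fill is whatever ends up in the unused top byte (e.g. an opaque alpha of 0xFF).
std::vector<std::uint32_t> pack_xrgb8888(const std::vector<std::uint8_t>& bgr,
                                         std::uint8_t x_fill) {
    assert(bgr.size() % 3 == 0);
    std::vector<std::uint32_t> out;
    out.reserve(bgr.size() / 3);
    for (std::size_t i = 0; i + 2 < bgr.size(); i += 3) {
        const std::uint32_t b = bgr[i + 0];
        const std::uint32_t g = bgr[i + 1];
        const std::uint32_t r = bgr[i + 2];
        out.push_back((static_cast<std::uint32_t>(x_fill) << 24) | (r << 16) | (g << 8) | b);
    }
    return out;
}

// Masks off the X byte, leaving only the bits that affect the displayed colour.
std::uint32_t visible_rgb(std::uint32_t pixel) {
    return pixel & 0x00FFFFFFu;
}
```

Packing the same pixel with x_fill set to 0xFF and to 0x00 yields two different words whose visible_rgb() values are equal.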
