Add Loongson Advanced SIMD Extension support: -DCPU_BASELINE=LASX

gititgo commented 2 years ago

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

[x] I agree to contribute to the project under Apache 2 License.
[x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
[x] The PR is proposed to the proper branch
[ ] There is a reference to the original bug report and related work
[x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable. Patch to opencv_extra has the same branch name.
[ ] The feature is well documented and sample code can be built with the project CMake

asmorkalov commented 2 years ago

Could you point to a cross-compiler and maybe add some notes on how to build the code? QEMU or some other emulator would be very useful too. I also recommend taking a look at the cvRound implementation. It's used everywhere, and efficient rounding affects performance a lot: https://github.com/opencv/opencv/blob/9aa647068b2eba4a34462927b1878353dfd3df69/modules/core/include/opencv2/core/fast_math.hpp#L200

gititgo commented 2 years ago

> Could you point to a cross-compiler and maybe add some notes on how to build the code? QEMU or some other emulator would be very useful too.

We are preparing a QEMU build, but it is not finished yet. We'll provide it as soon as it's available.

gititgo commented 2 years ago

The cross-compiler (builds LoongArch on x86) and the cmake config file are here: git clone https://gitee.com/wenux/cross-compiler-la-on-x86.git

How to use it:
(1) tar -xvf toolchain-loongarch64-linux-gnu-cross-830-rc1.0-2022-04-22a.tar.xz
(2) Set "tools" in the cmake config file (la64_linux_setup.cmake) to your real path.
(3) Run cmake with the config file: cmake -DCMAKE_TOOLCHAIN_FILE=path/to/your/la64_linux_setup.cmake -DCPU_BASELINE=LASX -DBUILD_OPENJPEG=ON ../
(4) make

gititgo commented 2 years ago

Is QEMU necessary for this PR?

asmorkalov commented 2 years ago

It'll be great to have QEMU to run tests.

gititgo commented 1 year ago

> It'll be great to have QEMU to run tests.

Would a 3A5000 (LoongArch64) environment be OK? QEMU may take a long time.

fengyuentau commented 1 year ago

How long is it going to take to run all unit tests in QEMU? It is recommended to have a CI pipeline to test code automatically. Please at least provide something that we can run tests on.

gititgo commented 1 year ago

> How long is it going to take to run all unit tests in QEMU? It is recommended to have a CI pipeline to test code automatically. Please at least provide something that we can run tests on.

QEMU is under development, but I don't have an exact timeline. We can provide a remote LoongArch environment. Can this be used for automated testing?

fengyuentau commented 1 year ago

Yes, but we also need an environment that we can still use for testing once our access to your remote LoongArch environment expires. So it is recommended to have QEMU to run tests on LoongArch.

asmorkalov commented 1 year ago

The provided git link (git clone https://gitee.com/wenux/cross-compiler-la-on-x86.git) is protected by username and password.

gititgo commented 1 year ago

> The provided git link (git clone https://gitee.com/wenux/cross-compiler-la-on-x86.git) is protected by username and password.

Sorry, it's OK now.

asmorkalov commented 1 year ago

opencv/modules/core/src/parallel_impl.cpp:63:5: warning: #warning "Can't detect 'pause' (CPU-yield) instruction on the target platform. Specify CV_PAUSE() definition via compiler flags." [-Wcpp]
 #   warning "Can't detect 'pause' (CPU-yield) instruction on the target platform. Specify CV_PAUSE() definition via compiler flags."
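For context, the warning above asks for a CPU-yield hint to use inside spin-wait loops. The pattern can be sketched roughly as follows. `DEMO_PAUSE` and `spin_until_set` are hypothetical names for illustration, not OpenCV identifiers, and falling back to `std::this_thread::yield()` on non-x86 targets is an assumption, not what this PR defines for LoongArch.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for a CV_PAUSE-style macro (not the OpenCV definition).
#if defined(__x86_64__)
#  include <immintrin.h>
#  define DEMO_PAUSE() _mm_pause()                 // x86 spin-loop hint
#else
#  define DEMO_PAUSE() std::this_thread::yield()   // assumed portable fallback
#endif

// Spin for at most max_spins iterations, waiting for the flag to become true.
// Returns the final observed state of the flag.
bool spin_until_set(const std::atomic<bool>& flag, int max_spins) {
    assert(max_spins >= 0);
    for (int i = 0; i < max_spins; ++i) {
        if (flag.load(std::memory_order_acquire)) return true;
        DEMO_PAUSE();  // tell the CPU (or scheduler) that we are busy-waiting
    }
    return flag.load(std::memory_order_acquire);
}
```

Without such a hint the loop still works, but it may waste power or starve a sibling hardware thread, which is why the build warns when no definition is detected.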

asmorkalov commented 1 year ago

Also, the best way to enable OpenCV cross-compilation for a new architecture is to place the toolchain file in opencv/platforms/linux/loongson. I propose replacing the local cmake variable "tools" with a CACHE variable with a meaningful name, so that it can be set from the command line. See opencv/platforms/linux/riscv64-clang.toolchain.cmake as an example.

asmorkalov commented 1 year ago

@gititgo Thanks a lot for the toolchain. I was able to build the source code.

vpisarev commented 1 year ago

@gititgo, thank you for the contribution! Please, mark the proper items in the checklist:

  * I agree to contribute to the project under Apache 2 License.
  * To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV

without this confirmation we cannot merge your code into OpenCV

gititgo commented 1 year ago

> without this confirmation we cannot merge your code into OpenCV

OK, marked.

gititgo commented 1 year ago

@fengyuentau We prepared a remote LoongArch PC for testing. The IP and password have been emailed to you.

fengyuentau commented 1 year ago

I ran the accuracy tests (./bin/opencv_test_*) on your LoongArch PC, and only 5 modules (core, flann, highgui, ml and videoio) did not crash with a segmentation fault. The CMake command I used is cmake -B build -D CPU_BASELINE=LASX opencv. Did I miss something?

gititgo commented 1 year ago

Please build with -DBUILD_PNG=ON, as there may be a bug in the libpng on our system.

fengyuentau commented 1 year ago

Rebuilt with CMake option -DBUILD_PNG=ON and no more segmentation faults. But Calib3d_StereoBM.regression failed as discussed above.

gititgo commented 1 year ago

Yes, there are two known issues:
1. Calib3d_StereoBM.regression: this may be a compiler optimization issue, as the DEBUG build is OK.
2. videoio/videocapture_acceleration.read (ffmpeg): the ffmpeg on our system (optimized for LoongArch) has a bug; the test is OK when we use the open-source ffmpeg.

fengyuentau commented 1 year ago

> Yes, there are two known issues:
> 1. Calib3d_StereoBM.regression: this may be a compiler optimization issue, as the DEBUG build is OK.
> 2. videoio/videocapture_acceleration.read (ffmpeg): the ffmpeg on our system (optimized for LoongArch) has a bug; the test is OK when we use the open-source ffmpeg.

Tested again, those known issues are still there:

# calib
[  FAILED  ] Calib3d_StereoBM.regression
# videoio
[  FAILED  ] videoio_ffmpeg.parallel
[  FAILED  ] videoio/videocapture_acceleration.read/32, where GetParam() = (sample_322x242_15frames.yuv420p.mpeg2video.mp4, FFMPEG, NONE, false)
[  FAILED  ] videoio/videocapture_acceleration.read/33, where GetParam() = (sample_322x242_15frames.yuv420p.mpeg2video.mp4, FFMPEG, NONE, true)
[  FAILED  ] videoio/videocapture_acceleration.read/34, where GetParam() = (sample_322x242_15frames.yuv420p.mpeg2video.mp4, FFMPEG, ANY, false)
[  FAILED  ] videoio/videocapture_acceleration.read/35, where GetParam() = (sample_322x242_15frames.yuv420p.mpeg2video.mp4, FFMPEG, ANY, true)
[  FAILED  ] videoio/videocapture_acceleration.read/64, where GetParam() = (sample_322x242_15frames.yuv420p.libx265.mp4, FFMPEG, NONE, false)
[  FAILED  ] videoio/videocapture_acceleration.read/65, where GetParam() = (sample_322x242_15frames.yuv420p.libx265.mp4, FFMPEG, NONE, true)
[  FAILED  ] videoio/videocapture_acceleration.read/66, where GetParam() = (sample_322x242_15frames.yuv420p.libx265.mp4, FFMPEG, ANY, false)
[  FAILED  ] videoio/videocapture_acceleration.read/67, where GetParam() = (sample_322x242_15frames.yuv420p.libx265.mp4, FFMPEG, ANY, true)
[  FAILED  ] videoio/videocapture_acceleration.read/80, where GetParam() = (sample_322x242_15frames.yuv420p.libvpx-vp9.mp4, FFMPEG, NONE, false)
[  FAILED  ] videoio/videocapture_acceleration.read/81, where GetParam() = (sample_322x242_15frames.yuv420p.libvpx-vp9.mp4, FFMPEG, NONE, true)
[  FAILED  ] videoio/videocapture_acceleration.read/82, where GetParam() = (sample_322x242_15frames.yuv420p.libvpx-vp9.mp4, FFMPEG, ANY, false)
[  FAILED  ] videoio/videocapture_acceleration.read/83, where GetParam() = (sample_322x242_15frames.yuv420p.libvpx-vp9.mp4, FFMPEG, ANY, true)
[  FAILED  ] videoio/videocapture_acceleration.read/96, where GetParam() = (sample_322x242_15frames.yuv420p.libaom-av1.mp4, FFMPEG, NONE, false)
[  FAILED  ] videoio/videocapture_acceleration.read/97, where GetParam() = (sample_322x242_15frames.yuv420p.libaom-av1.mp4, FFMPEG, NONE, true)
[  FAILED  ] videoio/videocapture_acceleration.read/98, where GetParam() = (sample_322x242_15frames.yuv420p.libaom-av1.mp4, FFMPEG, ANY, false)
[  FAILED  ] videoio/videocapture_acceleration.read/99, where GetParam() = (sample_322x242_15frames.yuv420p.libaom-av1.mp4, FFMPEG, ANY, true)

Tests on other modules passed.

gititgo commented 1 year ago

> Tested again, those known issues are still there

The version of ffmpeg has just been updated on the remote environment; the videoio module passes now.

asmorkalov commented 1 year ago

Thanks a lot for the update! Please take a look at the "docs" builder. It reports a lot of formatting issues like "modules/core/include/opencv2/core/hal/intrin_lasx.hpp:1865: trailing whitespace."

gititgo commented 1 year ago

OK, formatting issues are fixed.

fengyuentau commented 1 year ago

issue 1: core

The following test fails from time to time:

# core
[  FAILED  ] Core/HAL.mat_decomp/15, where GetParam() = 15

It most likely fails on the first run after a fresh compilation, and passes from the second run onward.

issue 2: highgui & gtk

With the new ffmpeg, the issues in the videoio module are gone. However, GTK seems to be misconfigured. At first GTK was missing, but after installing it, the error became:

[----------] 3 tests from Highgui_GUI
[ RUN      ] Highgui_GUI.regression
Exception message: OpenCV(4.6.0-dev) /home/loongson/opencv-workspace/opencv-21833/modules/highgui/src/window_gtk.cpp:635: error: (-2:Unspecified error) Can't initialize GTK backend in function 'cvInitSystem'

/home/loongson/opencv-workspace/opencv-21833/modules/highgui/test/test_gui.cpp:72: Failure
Expected: namedWindow(window_name) doesn't throw an exception.
  Actual: it throws.
[  FAILED  ] Highgui_GUI.regression (2 ms)
[ RUN      ] Highgui_GUI.trackbar_unsafe
Exception message: OpenCV(4.6.0-dev) /home/loongson/opencv-workspace/opencv-21833/modules/highgui/src/window_gtk.cpp:652: error: (-2:Unspecified error) GTK backend is not available in function 'cvInitSystem'

/home/loongson/opencv-workspace/opencv-21833/modules/highgui/test/test_gui.cpp:153: Failure
Expected: namedWindow(window_name) doesn't throw an exception.
  Actual: it throws.
[  FAILED  ] Highgui_GUI.trackbar_unsafe (0 ms)
[ RUN      ] Highgui_GUI.trackbar
Exception message: OpenCV(4.6.0-dev) /home/loongson/opencv-workspace/opencv-21833/modules/highgui/src/window_gtk.cpp:652: error: (-2:Unspecified error) GTK backend is not available in function 'cvInitSystem'

/home/loongson/opencv-workspace/opencv-21833/modules/highgui/test/test_gui.cpp:192: Failure
Expected: namedWindow(window_name) doesn't throw an exception.
  Actual: it throws.
[  FAILED  ] Highgui_GUI.trackbar (0 ms)
[----------] 3 tests from Highgui_GUI (2 ms total)

I guess this is another software compatibility issue.

asmorkalov commented 1 year ago

[ FAILED ] Core/HAL.mat_decomp/15, where GetParam() = 15 - it's a compute test. Most probably it's a sign of UB somewhere in the code, or a compiler issue.

asmorkalov commented 1 year ago

[ RUN ] Highgui_GUI.regression - please check whether an X session is available and properly configured. Otherwise you need to build OpenCV without UI support.

gititgo commented 1 year ago

> [----------] 3 tests from Highgui_GUI

When testing on the remote environment, use the following command so that the graphics are shown on the machine's local display: $ export DISPLAY=:0.0; ./bin/opencv_test_highgui

fengyuentau commented 1 year ago

> [ FAILED ] Core/HAL.mat_decomp/15, where GetParam() = 15 - it's a compute test. Most probably it's a sign of UB somewhere in the code, or a compiler issue.

@asmorkalov Do you mean undefined behaviour by UB?

> [ RUN ] Highgui_GUI.regression - please check whether an X session is available and properly configured. Otherwise you need to build OpenCV without UI support.

With the environment set by export DISPLAY=:0.0, the highgui tests now pass.

> By using the following command to test on remote env, you can make the graphics display locally: $ export DISPLAY=:0.0; ./bin/opencv_test_highgui

Thanks!

asmorkalov commented 1 year ago

UB is undefined behavior: a bug that triggers randomly, depending on an uninitialized variable, or on stack contents in the case of an out-of-bounds access.

gititgo commented 1 year ago

> core
>
> [ FAILED ] Core/HAL.mat_decomp/15, where GetParam() = 15

This is an accuracy issue:

/home/loongson/src/wenxue/opencv/modules/core/test/test_hal_core.cpp:205: Failure
Expected: (cvtest::norm(x, x0, NORM_INF | NORM_RELATIVE)) <= (eps), actual: 1.08289e-10 vs 1e-10

The reason is that the compiler uses fmadd-class instructions to optimize the "multiply + add" operation, but this may introduce precision loss in parallel computation. We disabled this feature on the LoongArch platform with the compiler option "-ffp-contract=off", and then the test case passes.

For now we have added the compiler option "-ffp-contract=off" in opencv/modules/core/CMakeLists.txt. Is there a better place to add this option?
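To illustrate why contraction changes results at all, here is a minimal C++ sketch that compares a separately rounded multiply-and-add with `std::fma`. It does not reproduce the mat_decomp failure; the function names and the input values are chosen for illustration only. Note that the fused form rounds once instead of twice, so it is not less accurate in itself, but its result can differ from a reference computed the other way.

```cpp
#include <cassert>
#include <cmath>

// Multiply, round to double, then add: two roundings.
// The volatile store keeps the compiler from contracting this into an fma.
double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// Fused multiply-add: a * b + c with a single rounding at the end.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}

// Difference between the two evaluation strategies for finite inputs.
double contraction_gap(double a, double b, double c) {
    assert(std::isfinite(a) && std::isfinite(b) && std::isfinite(c));
    return fused_mul_add(a, b, c) - mul_then_add(a, b, c);
}
```

With a = b = 1 + 2^-27 and c = -(1 + 2^-26), the exact product is 1 + 2^-26 + 2^-54. Rounding the product first drops the 2^-54 term, so `mul_then_add` returns 0, while `fused_mul_add` returns 2^-54.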

asmorkalov commented 1 year ago

fmadd is an important optimization. IMHO we can tune the test threshold a bit, but we should not disable it. @alalek @vpisarev what do you think?

fengyuentau commented 1 year ago

@gititgo Any updates regarding the issue on the core module? Can we simply adjust the threshold specifically for LASX?

gititgo commented 1 year ago

Yes, tuning the test threshold is a good idea. If you have no objection, I will do this specifically for LASX.

vpisarev commented 1 year ago

@fengyuentau, @gititgo, I'm fine with increasing tolerance threshold from 1e-10 to 5e-10, for example
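The threshold change discussed here could be sketched as follows. This is an illustration only: `relative_inf_error`, `decomp_tolerance` and `within_tolerance` are hypothetical helpers, the error measure is a simplified stand-in for the `NORM_INF | NORM_RELATIVE` check rather than OpenCV's implementation, and 5e-10 is just the example value suggested above, not necessarily the value that was merged.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Relative infinity-norm error: max|x - x0| / max|x0|.
// Simplified stand-in for the test's norm check; assumes x0 is not all zeros.
double relative_inf_error(const std::vector<double>& x, const std::vector<double>& x0) {
    assert(x.size() == x0.size() && !x0.empty());
    double max_diff = 0.0, max_ref = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        max_diff = std::max(max_diff, std::fabs(x[i] - x0[i]));
        max_ref = std::max(max_ref, std::fabs(x0[i]));
    }
    assert(max_ref > 0.0);
    return max_diff / max_ref;
}

// Hypothetical per-platform tolerance: looser only where fused
// multiply-add changes the rounding of intermediate results.
double decomp_tolerance(bool fma_platform) {
    return fma_platform ? 5e-10 : 1e-10;
}

bool within_tolerance(double error, double eps) {
    return error <= eps;
}
```

The reported error of 1.08289e-10 fails against 1e-10 but passes against 5e-10, so a platform-specific tolerance leaves the check unchanged everywhere else.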

alalek commented 1 year ago

This merge was done without the necessary reference to this PR in the commit message.

Add Loongson Advanced SIMD Extension support: -DCPU_BASELINE=LASX

    * Add Loongson Advanced SIMD Extension support: -DCPU_BASELINE=LASX
    * Add resize.lasx.cpp for Loongson SIMD acceleration
    * Add imgwarp.lasx.cpp for Loongson SIMD acceleration
    * Add LASX acceleration support for dnn/conv
    * Add CV_PAUSE(v) for Loongarch
    * Set LASX by default on Loongarch64
    * LoongArch: tune test threshold for Core/HAL.mat_decomp/15

    Co-authored-by: shengwenxue <shengwenxue@loongson.cn>

No PR ID at all.

Existing maintenance scripts will not pick up this merge.
