Junyan721113 commented 2 months ago

Summary

Previous context

From PR #24556:

As you wrote, the P-extension differs from RVV and thus cannot be easily implemented via the Universal Intrinsics mechanism, but there is another HAL mechanism for lower-level CPU optimizations, which is used by the Carotene library on ARM platforms. I suggest moving all non-dnn code to a similar third-party component. For example, the FAST algorithm should allow such an optimization shortcut (see https://github.com/opencv/opencv/blob/4.x/modules/features2d/src/hal_replacement.hpp). Reference documentation is here:

https://docs.opencv.org/4.x/d1/d1b/group__core__hal__interface.html

https://docs.opencv.org/4.x/dd/d8b/group__imgproc__hal__interface.html

https://docs.opencv.org/4.x/db/d47/group__features2d__hal__interface.html

Carotene library is turned on here: https://github.com/opencv/opencv/blob/8bbf08f0de9c387c12afefdb05af7780d989e4c3/CMakeLists.txt#L906-L911

As a test outside of this PR, a 3rdparty component called ndsrvp was created, containing one piece of the non-dnn code (integral_SIMD), and it works very well. All the non-dnn code in this PR has been removed, so this PR can now focus on dnn optimizations. The HAL mechanism is well suited to RVP optimizations, and all the non-dnn code is expected to be moved into ndsrvp soon.
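As a rough illustration of the HAL shortcut described above, the sketch below mimics the pattern without including any OpenCV headers: an optimized routine either handles the call or reports that it does not implement the case, and the caller then falls back to the generic code. The two status constants stand in for OpenCV's CV_HAL_ERROR_OK and CV_HAL_ERROR_NOT_IMPLEMENTED; every function name (hal_add8u_sketch, add8u_generic, add8u_with_fallback) is hypothetical and is not part of OpenCV or ndsrvp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-ins for CV_HAL_ERROR_OK and CV_HAL_ERROR_NOT_IMPLEMENTED.
constexpr int kHalOk = 0;
constexpr int kHalNotImplemented = 1;

// Hypothetical optimized routine. For the sake of the example it only
// accepts lengths that are a multiple of 8 and declines everything else.
inline int hal_add8u_sketch(const std::uint8_t* a, const std::uint8_t* b,
                            std::uint8_t* dst, std::size_t n)
{
    assert(a && b && dst);
    if (n % 8 != 0)
        return kHalNotImplemented;
    for (std::size_t i = 0; i < n; ++i) {  // placeholder for packed-SIMD code
        const unsigned sum = static_cast<unsigned>(a[i]) + b[i];
        dst[i] = static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
    }
    return kHalOk;
}

// Generic saturating add used when the optimized routine declines.
inline void add8u_generic(const std::uint8_t* a, const std::uint8_t* b,
                          std::uint8_t* dst, std::size_t n)
{
    assert(a && b && dst);
    for (std::size_t i = 0; i < n; ++i) {
        const unsigned sum = static_cast<unsigned>(a[i]) + b[i];
        dst[i] = static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
    }
}

// Caller side: try the optimized path first, fall back otherwise.
// Returns true when the optimized path handled the call.
inline bool add8u_with_fallback(const std::uint8_t* a, const std::uint8_t* b,
                                std::uint8_t* dst, std::size_t n)
{
    if (hal_add8u_sketch(a, b, dst, n) == kHalOk)
        return true;
    add8u_generic(a, b, dst, n);
    return false;
}
```

Because the caller only checks the status code, an optimized library can cover a subset of cases and leave the rest to the generic code.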

Progress

Part 1 (This PR)

Core
[x] Element-wise add and subtract
[x] Element-wise minimum or maximum
[x] Element-wise absolute difference
[x] Bitwise logical operations
[x] Element-wise compare
ImgProc
[x] Integral
[x] Threshold
[x] WarpAffine
[x] WarpPerspective
Features2D

Part 2 (Next PR)

Rough Estimate. Todo List May Change.

Core
ImgProc
AdaptiveThreshold
BoxFilter
Canny
Convert
Filter
GaussianBlur
MedianBlur
Morph
Pyrdown
Resize
Scharr
SepFilter
Sobel
Features2D
FAST

Performance Tests

The optimizations do not cover floating-point operations.

Absolute Difference

Geometric mean (ms)

Name of Test	opencv perf core Absdiff	opencv perf core Absdiff	opencv perf core Absdiff vs opencv perf core Absdiff (x-factor)
Absdiff::OCL_AbsDiffFixture::(640x480, 8UC1)	23.104	5.972	3.87
Absdiff::OCL_AbsDiffFixture::(640x480, 32FC1)	39.500	40.830	0.97
Absdiff::OCL_AbsDiffFixture::(640x480, 8UC3)	69.155	15.051	4.59
Absdiff::OCL_AbsDiffFixture::(640x480, 32FC3)	118.715	120.509	0.99
Absdiff::OCL_AbsDiffFixture::(640x480, 8UC4)	93.001	19.770	4.70
Absdiff::OCL_AbsDiffFixture::(640x480, 32FC4)	161.136	160.791	1.00
Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC1)	69.211	15.140	4.57
Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC1)	118.762	119.263	1.00
Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC3)	212.414	44.692	4.75
Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC3)	367.512	366.569	1.00
Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC4)	285.337	59.708	4.78
Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC4)	490.395	491.118	1.00
Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC1)	158.827	33.462	4.75
Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC1)	273.503	273.668	1.00
Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC3)	484.175	100.520	4.82
Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC3)	828.758	829.689	1.00
Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC4)	648.592	137.195	4.73
Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC4)	1116.755	1109.587	1.01
Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC1)	648.715	134.875	4.81
Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC1)	1115.939	1113.818	1.00
Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC3)	1944.791	413.420	4.70
Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC3)	3354.193	3324.672	1.01
Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC4)	2594.585	553.486	4.69
Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC4)	4473.543	4438.453	1.01
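In the table above the 8U rows speed up by roughly 4x to 5x while the 32F rows stay near 1.00, which matches the note that floating-point operations are not covered. The sketch below is a portable, scalar emulation of the general idea, assuming that one 64-bit register holds eight 8-bit lanes. It does not use the Andes intrinsics, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Emulates an unsigned absolute difference on eight 8-bit lanes packed
// into one 64-bit word. A packed-SIMD instruction would do this in one step.
inline std::uint64_t absdiff_u8x8(std::uint64_t a, std::uint64_t b)
{
    std::uint64_t out = 0;
    for (unsigned lane = 0; lane < 8; ++lane) {
        const unsigned shift = 8u * lane;
        const std::uint8_t x = static_cast<std::uint8_t>(a >> shift);
        const std::uint8_t y = static_cast<std::uint8_t>(b >> shift);
        const std::uint8_t d = static_cast<std::uint8_t>(x > y ? x - y : y - x);
        out |= static_cast<std::uint64_t>(d) << shift;
    }
    return out;
}

// Processes a row eight bytes at a time, then handles the tail per element.
inline void absdiff_u8_row(const std::uint8_t* a, const std::uint8_t* b,
                           std::uint8_t* dst, std::size_t n)
{
    assert(a && b && dst);
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        std::uint64_t wa, wb;
        std::memcpy(&wa, a + i, 8);  // memcpy avoids alignment and aliasing issues
        std::memcpy(&wb, b + i, 8);
        const std::uint64_t wd = absdiff_u8x8(wa, wb);
        std::memcpy(dst + i, &wd, 8);
    }
    for (; i < n; ++i)
        dst[i] = static_cast<std::uint8_t>(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
}
```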

Bitwise Operation

Geometric mean (ms)

Name of Test	opencv perf core Bit	opencv perf core Bit	opencv perf core Bit vs opencv perf core Bit (x-factor)
Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC1)	22.542	4.971	4.53
Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC1)	90.210	19.917	4.53
Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC3)	68.429	15.037	4.55
Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC3)	280.168	59.239	4.73
Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC4)	90.565	19.735	4.59
Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC4)	374.695	79.257	4.73
Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC1)	67.824	14.873	4.56
Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC1)	279.514	59.232	4.72
Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC3)	208.337	44.234	4.71
Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC3)	851.211	182.522	4.66
Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC4)	279.529	59.095	4.73
Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC4)	1132.065	244.877	4.62
Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC1)	155.685	33.078	4.71
Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC1)	635.253	137.482	4.62
Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC3)	474.494	100.166	4.74
Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC3)	1907.340	412.841	4.62
Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC4)	635.538	134.544	4.72
Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC4)	2552.666	556.397	4.59
Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC1)	634.736	136.355	4.66
Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC1)	2548.283	561.827	4.54
Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC3)	1911.454	421.571	4.53
Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC3)	7663.803	1677.289	4.57
Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC4)	2543.983	562.780	4.52
Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC4)	10211.693	2237.393	4.56
Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC1)	22.341	4.811	4.64
Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC1)	89.975	19.288	4.66
Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC3)	67.237	14.643	4.59
Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC3)	276.324	58.609	4.71
Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC4)	89.587	19.554	4.58
Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC4)	370.986	77.136	4.81
Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC1)	67.227	14.541	4.62
Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC1)	276.357	58.076	4.76
Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC3)	206.752	43.376	4.77
Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC3)	841.638	177.787	4.73
Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC4)	276.773	57.784	4.79
Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC4)	1127.740	237.472	4.75
Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC1)	153.808	32.531	4.73
Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC1)	627.765	129.990	4.83
Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC3)	469.799	98.249	4.78
Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC3)	1893.591	403.694	4.69
Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC4)	627.724	129.962	4.83
Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC4)	2529.967	540.744	4.68
Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC1)	628.089	130.277	4.82
Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC1)	2521.817	540.146	4.67
Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC3)	1905.004	404.704	4.71
Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC3)	7567.971	1627.898	4.65
Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC4)	2531.476	540.181	4.69
Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC4)	10075.594	2181.654	4.62
Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC1)	22.566	5.076	4.45
Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC1)	90.391	19.928	4.54
Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC3)	67.758	14.740	4.60
Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC3)	279.253	59.844	4.67
Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC4)	90.296	19.802	4.56
Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC4)	373.972	79.815	4.69
Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC1)	67.815	14.865	4.56
Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC1)	279.398	60.054	4.65
Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC3)	208.643	45.043	4.63
Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC3)	850.042	180.985	4.70
Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC4)	279.363	60.385	4.63
Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC4)	1134.858	243.062	4.67
Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC1)	155.212	33.155	4.68
Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC1)	634.985	134.911	4.71
Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC3)	474.648	100.407	4.73
Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC3)	1912.049	414.184	4.62
Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC4)	635.252	132.587	4.79
Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC4)	2544.471	560.737	4.54
Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC1)	634.574	134.966	4.70
Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC1)	2545.129	561.498	4.53
Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC3)	1910.900	419.365	4.56
Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC3)	7662.603	1685.812	4.55
Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC4)	2548.971	560.787	4.55
Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC4)	10201.407	2237.552	4.56
Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC1)	22.718	4.961	4.58
Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC1)	91.496	19.831	4.61
Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC3)	67.910	15.151	4.48
Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC3)	279.612	59.792	4.68
Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC4)	91.073	19.853	4.59
Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC4)	374.641	79.155	4.73
Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC1)	67.704	15.008	4.51
Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC1)	279.229	60.088	4.65
Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC3)	208.156	44.426	4.69
Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC3)	849.501	180.848	4.70
Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC4)	279.642	59.728	4.68
Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC4)	1129.826	242.880	4.65
Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC1)	155.585	33.354	4.66
Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC1)	634.090	134.995	4.70
Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC3)	474.931	99.598	4.77
Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC3)	1910.519	413.138	4.62
Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC4)	635.026	135.155	4.70
Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC4)	2560.167	560.838	4.56
Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC1)	634.893	134.883	4.71
Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC1)	2548.166	560.831	4.54
Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC3)	1911.392	419.816	4.55
Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC3)	7646.634	1677.988	4.56
Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC4)	2560.637	560.805	4.57
Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC4)	10227.044	2249.458	4.55
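Unlike Absdiff, the bitwise tests show similar gains for 32F and 8U inputs. That is consistent with bitwise operations being independent of the element type: the data can be treated as raw bytes. A minimal sketch of that idea follows, with a hypothetical function name and plain 64-bit words in place of any P-extension intrinsics.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Bitwise AND over raw bytes, eight bytes per step. The element type of the
// underlying image (8U, 32F, ...) does not matter for the result.
inline void bitwise_and_bytes(const void* a, const void* b, void* dst,
                              std::size_t nbytes)
{
    assert(a && b && dst);
    const unsigned char* pa = static_cast<const unsigned char*>(a);
    const unsigned char* pb = static_cast<const unsigned char*>(b);
    unsigned char* pd = static_cast<unsigned char*>(dst);

    std::size_t i = 0;
    for (; i + 8 <= nbytes; i += 8) {
        std::uint64_t wa, wb;
        std::memcpy(&wa, pa + i, 8);
        std::memcpy(&wb, pb + i, 8);
        const std::uint64_t wd = wa & wb;
        std::memcpy(pd + i, &wd, 8);
    }
    for (; i < nbytes; ++i)  // tail shorter than one word
        pd[i] = static_cast<unsigned char>(pa[i] & pb[i]);
}
```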

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

[x] I agree to contribute to the project under Apache 2 License.
[x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
[x] The PR is proposed to the proper branch
[x] There is a reference to the original bug report and related work
[ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable. Patch to opencv_extra has the same branch name.
[ ] The feature is well documented and sample code can be built with the project CMake

asmorkalov commented 2 months ago

@Junyan721113 Could you add more details on hardware configuration and how to reproduce the result?

Junyan721113 commented 2 months ago

No problem, here are the details for accuracy and performance tests.

RISC-V P Extension v0.5.2

Env

export RISCV=/opt/andes
export PATH=$PATH:/opt/andes/bin

Toolchain

Prebuilt Releases: Andes-Development-Kit

Suggested Version: v5_1_1

nds-gnu-toolchain

./build_linux_toolchain.sh

TARGET=riscv64-linux
PREFIX=/opt/andes
ARCH=rv64imafdcxandes
ABI=lp64d
CPU=andes-25-series
XLEN=64
BUILD=`pwd`/build-nds64le-linux-glibc-v5d

Qemu

qemu

shell ./build

../configure --prefix=/opt/andes --target-list=riscv32-linux-user,riscv64-linux-user --disable-werror --static

Board

The development board used for performance tests is TinkerV with Andes AX45.

Upload the installed toolchain's sysroot (at /opt/andes/sysroot), or the one from the prebuilt releases above, to the board.

/etc/ld.so.conf

include /etc/ld.so.conf.d/*.conf
/path/to/the/sysroot/library

shell

ldconfig -v

After that the sysroot library should appear in the result.

OpenCV Test

shell ./build

cmake -D CMAKE_BUILD_TYPE=Debug -D CMAKE_INSTALL_PREFIX=/opt/andes -D BUILD_SHARED_LIBS=OFF -D CMAKE_TOOLCHAIN_FILE=../platforms/linux/riscv64-andes-gcc.toolchain.cmake ..

Qemu

shell ./build/bin

qemu-riscv64 -cpu andes-ax25 -L /opt/andes/sysroot opencv_test_core

Board

Upload the test binary to the board and run it directly; it should work as-is.

Junyan721113 commented 1 month ago

Considering that the to-do list of this PR might be too long, would it be better to divide this PR into smaller ones?

mshabunin commented 1 month ago

@Junyan721113, you can more or less finalize the current state (HAL integration, implementation of several core functions) and extend the list of supported functions in future PRs.

Junyan721113 commented 3 weeks ago

Considering the relations between HAL functions, this PR might be ready for review now. The optimizations mainly cover the following functions:

Core
[x] Element-wise add and subtract
[x] Element-wise minimum or maximum
[x] Element-wise absolute difference
[x] Bitwise logical operations
[x] Element-wise compare
ImgProc
[x] Integral
[x] Threshold
[x] WarpAffine
[x] WarpPerspective
Features2D

The rest of the HAL functions are related to convolution and are therefore left for another PR.

Junyan721113 commented 3 weeks ago

Besides, I've noticed that some optimizations could be better if several functions they require were also exposed through the HAL interface, such as:

AutoBuffer (required by resize and Pyrdown): Exposing it through the HAL is not strictly necessary for the optimizations, since it could be implemented separately. However, due to the limitations of the RISC-V P extension, it probably cannot be optimized with RVP, so it would be better to reuse the existing implementation.
remap() (required by warpAffine and warpPerspective): Although the existing remap() can be reused in warpAffine and warpPerspective, the remap functions that involve no floating-point operations can be optimized with RVP. However, if remap() is to be optimized, its implementation (such as static RemapNNFunc nn_tab[2][8]) is so tightly coupled that every function would have to be reimplemented with RVP. Maybe it is possible to expose different types of remap functions as different HAL interfaces, called cv_hal_remapNN8u and cv_hal_remapNN16s for example? Currently there is only one related interface, called cv_hal_remap32f.

Meanwhile, I wonder how the HAL interface will change in the upcoming OpenCV 5.0. The changes may affect the next PR related to this 3rdparty library.

asmorkalov commented 3 weeks ago

@Junyan721113 Thanks a lot for the contribution!

AutoBuffer can be achieved by a simple combination of new and malloca. Not sure if we need to expose it.
Remap was added to HAL interface a week ago: https://github.com/opencv/opencv/pull/25399. You are welcome to contribute RISC-V implementation.
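To make the AutoBuffer suggestion concrete, here is a minimal sketch of a buffer that stays on the stack for small sizes and falls back to new for large ones. It is not cv::AutoBuffer and does not reproduce its API. The class name SmallBufferSketch is hypothetical, and a fixed inline array is used in place of a stack allocation call because such calls are not standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Scratch buffer: inline (stack) storage for small requests, heap otherwise.
template <typename T, std::size_t InlineCount = 256>
class SmallBufferSketch
{
public:
    explicit SmallBufferSketch(std::size_t count) : size_(count)
    {
        if (count > InlineCount) {
            heap_.reset(new T[count]());  // large request: allocate with new
            ptr_ = heap_.get();
        } else {
            ptr_ = inline_;               // small request: use inline storage
        }
    }

    // Copying would leave ptr_ pointing into another object, so forbid it.
    SmallBufferSketch(const SmallBufferSketch&) = delete;
    SmallBufferSketch& operator=(const SmallBufferSketch&) = delete;

    T* data() { return ptr_; }
    std::size_t size() const { return size_; }
    bool on_heap() const { return heap_ != nullptr; }

    T& operator[](std::size_t i)
    {
        assert(i < size_);
        return ptr_[i];
    }

private:
    std::size_t size_;
    T inline_[InlineCount] = {};
    std::unique_ptr<T[]> heap_;
    T* ptr_ = nullptr;
};
```

A function that needs a temporary row buffer could declare SmallBufferSketch<int> buf(width) as a local variable and let the destructor release any heap memory.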

Junyan721113 commented 3 weeks ago

Thank you! This helps me a lot.

Junyan721113 commented 3 weeks ago

The mentioned PR (#25399) contains cv_hal_remap32f. How about also adding cv_hal_remap8u, cv_hal_remap8s, cv_hal_remap16u and cv_hal_remap16s? A float32 interface might not be helpful to RVP.

Meanwhile, as @mshabunin suggested earlier (finalize the current state and extend the list of supported functions in future PRs), the to-do list of "Part 1" is finished, and other new features will go into "Part 2". This PR is ready for review now.

asmorkalov commented 3 weeks ago

32f means that mapx and mapy are floats, not fixed point. The source and destination may be of any OpenCV-supported type. Sorry for the confusion.
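The distinction can be sketched as follows: the coordinate maps are float, while the pixel type is independent of them (a template parameter here). This is only an illustration with nearest-neighbor sampling and a constant border. The function name and signature are hypothetical and do not reproduce cv_hal_remap32f, and std::lround may round ties differently from OpenCV.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Nearest-neighbor remap: dst[i] = src(round(mapy[i]), round(mapx[i])).
// The maps are float; Pixel can be any element type.
template <typename Pixel>
std::vector<Pixel> remap_nearest_sketch(const std::vector<Pixel>& src,
                                        int src_w, int src_h,
                                        const std::vector<float>& mapx,
                                        const std::vector<float>& mapy,
                                        Pixel border = Pixel())
{
    assert(src_w > 0 && src_h > 0);
    assert(src.size() == static_cast<std::size_t>(src_w) * src_h);
    assert(mapx.size() == mapy.size());

    // The destination has the same layout as the maps.
    std::vector<Pixel> dst(mapx.size(), border);
    for (std::size_t i = 0; i < dst.size(); ++i) {
        const long sx = std::lround(mapx[i]);
        const long sy = std::lround(mapy[i]);
        if (sx >= 0 && sx < src_w && sy >= 0 && sy < src_h)
            dst[i] = src[static_cast<std::size_t>(sy) * src_w + sx];
        // Otherwise the pixel keeps the constant border value.
    }
    return dst;
}
```

The same float maps work for 8-bit and 16-bit sources alike, which is why a single map-typed entry point can serve several pixel types.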

mshabunin commented 2 weeks ago

Currently there are several warnings regarding strict aliasing in the new HAL library (warpAffine and warpPerspective). Are they serious issues or not? Can we somehow avoid these constructions (maybe with some reinterpret intrinsics)?

/work/opencv/3rdparty/ndsrvp/src/warpAffine.cpp: In member function 'virtual void cv::NdsrvpWarpAffineInvoker::operator()(const cv::Range&) const':
/opencv/3rdparty/ndsrvp/src/warpAffine.cpp:58:76: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   58 |                             *(uint16x4_t*)(xy + x1 * 2) = __nds__v_pkbb16(*(uint16x4_t*)&vY, *(uint16x4_t*)&vX);
      |                                                                            ^~~~~~~~~~~~~~~~
/opencv/3rdparty/ndsrvp/src/warpAffine.cpp:58:95: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   58 |                             *(uint16x4_t*)(xy + x1 * 2) = __nds__v_pkbb16(*(uint16x4_t*)&vY, *(uint16x4_t*)&vX);
      |                                                                                               ^~~~~~~~~~~~~~~~
/opencv/3rdparty/ndsrvp/src/warpAffine.cpp:82:76: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   82 |                             *(uint16x4_t*)(xy + x1 * 2) = __nds__v_pkbb16(*(uint16x4_t*)&vy, *(uint16x4_t*)&vx);
      |                                                                            ^~~~~~~~~~~~~~~~
/opencv/3rdparty/ndsrvp/src/warpAffine.cpp:82:95: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   82 |                             *(uint16x4_t*)(xy + x1 * 2) = __nds__v_pkbb16(*(uint16x4_t*)&vy, *(uint16x4_t*)&vx);
      |                                                                                               ^~~~~~~~~~~~~~~~

Junyan721113 commented 1 week ago

The type-punned pointer casts were a mistake. They've been replaced with safer explicit type conversions.
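The exact fix applied in the PR is not shown in this thread. One common way to avoid dereferencing a type-punned pointer is to copy the bytes into an object of the target type with std::memcpy, which compilers typically optimize into a plain register move. The sketch below shows that technique with plain structs as stand-ins for packed vector types. The names bit_cast_sketch, U16x4 and I32x2 are hypothetical, and the Andes types and intrinsics are not used.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Reinterprets the bytes of `from` as a value of type To without violating
// strict aliasing: the bytes are copied rather than accessed via a cast pointer.
template <typename To, typename From>
inline To bit_cast_sketch(const From& from)
{
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value &&
                  std::is_trivially_copyable<From>::value,
                  "both types must be trivially copyable");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

// Stand-ins for 64-bit packed types (4 x 16-bit and 2 x 32-bit lanes).
struct U16x4 { std::uint16_t lane[4]; };
struct I32x2 { std::int32_t lane[2]; };
```

With these definitions, an expression like `*(U16x4*)&v` would become `bit_cast_sketch<U16x4>(v)`.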

Junyan721113 commented 1 day ago

Strict-aliasing warnings have been fixed. Are there any other suggested changes?

3rdparty: NDSRVP - A New 3rdparty Library with Optimizations Based on RISC-V P Extension v0.5.2 - Part 1: Basic Functions #25167