Open Junyan721113 opened 2 months ago
@Junyan721113 Could you add more details on hardware configuration and how to reproduce the result?
No problem, here are the details for accuracy and performance tests.
Toolchain environment:

```shell
export RISCV=/opt/andes
export PATH=$PATH:/opt/andes/bin
```

Prebuilt releases: Andes-Development-Kit (suggested version: v5_1_1).

Toolchain build configuration, used by `./build_linux_toolchain.sh`:

```shell
TARGET=riscv64-linux
PREFIX=/opt/andes
ARCH=rv64imafdcxandes
ABI=lp64d
CPU=andes-25-series
XLEN=64
BUILD=`pwd`/build-nds64le-linux-glibc-v5d
```

QEMU configuration, run from its `./build` directory:

```shell
../configure --prefix=/opt/andes --target-list=riscv32-linux-user,riscv64-linux-user --disable-werror --static
```
The development board used for performance tests is TinkerV with Andes AX45.
Upload the installed toolchain's sysroot (at `/opt/andes/sysroot`, or from the prebuilt releases above) to the device. Then add the sysroot library path to `/etc/ld.so.conf`:

```
include /etc/ld.so.conf.d/*.conf
/path/to/the/sysroot/library
```

and refresh the linker cache:

```shell
ldconfig -v
```

After that, the sysroot library should appear in the output.
Build OpenCV from its `./build` directory:

```shell
cmake -D CMAKE_BUILD_TYPE=Debug -D CMAKE_INSTALL_PREFIX=/opt/andes -D BUILD_SHARED_LIBS=OFF -D CMAKE_TOOLCHAIN_FILE=../platforms/linux/riscv64-andes-gcc.toolchain.cmake ..
```

Run the tests from `./build/bin`:

```shell
qemu-riscv64 -cpu andes-ax25 -L /opt/andes/sysroot opencv_test_core
```
Alternatively, upload the test binary to the board and run it there directly; it performs properly.
Considering the Todo List of this PR might be too long, would it be better to divide this PR into smaller ones?
@Junyan721113 , you can finalize current state more or less (HAL integration, several core functions implementation). And extend supported functions list in future PRs.
Considering the relations between the HAL functions, this PR might be ready for review now. The optimizations mainly contain the following functions:
The rest of HAL functions are related to convolution, thus left for another PR.
Besides, I've noticed that some optimizations could be improved if several required functions were also exposed as HAL interfaces, such as:
- `AutoBuffer`, required by `resize` and `Pyrdown`. These functions do not have to be exposed through the HAL for the optimizations, since they could be implemented separately. However, due to the weakness of the RISC-V P extension, they may not be optimized by RVP themselves and could simply be reused.
- `remap()`, required by `warpAffine` and `warpPerspective`. Although it can be reused in `warpAffine` and `warpPerspective`, the remap functions without floating-point operations can be optimized by RVP. However, if we decided to optimize `remap()`, its implementation (such as `static RemapNNFunc nn_tab[2][8]`) has so much coupling that every function would have to be reimplemented with RVP. Maybe it is possible to expose different types of remap functions as different HAL interfaces, called `cv_hal_remapNN8u` and `cv_hal_remapNN16s` for example? Currently there is only one related interface, called `cv_hal_remap32f`.
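To illustrate the idea, here is a minimal sketch of what a type-specific nearest-neighbor entry such as the proposed `cv_hal_remapNN8u` could look like. The name, signature, and border behavior here are hypothetical (modeled loosely on the shape of existing HAL functions), not an actual OpenCV interface; the point is that with 16-bit integer coordinate maps, no floating-point operations are needed, which is what makes it a good fit for RVP:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical per-type HAL entry: nearest-neighbor remap for
// single-channel 8-bit images with interleaved 16-bit integer
// coordinate maps (x, y per destination pixel). Steps are in bytes.
static int remapNN8u(const uint8_t* src, size_t src_step, int src_w, int src_h,
                     uint8_t* dst, size_t dst_step, int dst_w, int dst_h,
                     const int16_t* mapxy, size_t map_step)
{
    for (int y = 0; y < dst_h; y++) {
        const int16_t* m = (const int16_t*)((const uint8_t*)mapxy + y * map_step);
        uint8_t* d = dst + y * dst_step;
        for (int x = 0; x < dst_w; x++) {
            int sx = m[2 * x], sy = m[2 * x + 1];
            // Out-of-range coordinates are filled with 0 in this sketch;
            // a real HAL entry would honor the requested border mode.
            d[x] = ((unsigned)sx < (unsigned)src_w && (unsigned)sy < (unsigned)src_h)
                       ? src[sy * src_step + sx] : 0;
        }
    }
    return 0; // a real HAL implementation would return CV_HAL_ERROR_OK
}
```

The inner loop touches only integer data, so on an RVP target the load/pack/store sequence could be expressed with P-extension intrinsics without any float conversion.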
Meanwhile, I wonder how the HAL interface will change in the coming OpenCV 5.0. The changes may affect the next PR related to this 3rdparty library.
@Junyan721113 Thanks a lot for the contribution!
- AutoBuffer may be achieved by a simple combination of new and malloca. Not sure if we need to expose it.
- Remap was added to HAL interface a week ago: New HAL API for remap #25399. You are welcome to contribute RISC-V implementation.
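For reference, the "new plus a small inline buffer" idea could be sketched as below. This is an illustrative miniature, not OpenCV's actual `cv::AutoBuffer` implementation; the class name, fixed size, and interface are all made up for the example:

```cpp
#include <cassert>
#include <cstddef>

// Minimal AutoBuffer-style helper: keep a small fixed-size buffer
// inline and fall back to heap allocation (operator new[]) only when
// the requested element count exceeds it. Avoids a heap allocation
// for the common small-workspace case.
template <typename T, size_t FixedSize = 1024 / sizeof(T)>
class MiniAutoBuffer {
public:
    explicit MiniAutoBuffer(size_t n) : ptr_(local_), size_(n) {
        if (n > FixedSize)
            ptr_ = new T[n];            // heap fallback for large requests
    }
    ~MiniAutoBuffer() {
        if (ptr_ != local_)
            delete[] ptr_;              // free only if we left the inline buffer
    }
    T* data() { return ptr_; }
    size_t size() const { return size_; }
    bool on_stack() const { return ptr_ == local_; }
private:
    T local_[FixedSize];                // inline storage, no allocation needed
    T* ptr_;
    size_t size_;
    MiniAutoBuffer(const MiniAutoBuffer&);            // non-copyable
    MiniAutoBuffer& operator=(const MiniAutoBuffer&);
};
```

Since the whole behavior fits in a small self-contained template like this, a HAL library can indeed carry its own copy instead of OpenCV exposing one.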
Thank you! This helps me a lot.
> @Junyan721113 Thanks a lot for the contribution!
> - AutoBuffer may be achieved by simple combination of new and malloca. Not sure, if we need expose it.
> - Remap was added to HAL interface a week ago: New HAL API for remap #25399. You are welcome to contribute RISC-V implementation.
The mentioned PR contains `cv_hal_remap32f`; how about adding `cv_hal_remap8u`, `cv_hal_remap8s`, `cv_hal_remap16u`, and `cv_hal_remap16s`? A `Float32` interface might not be helpful to RVP.
> @Junyan721113 , you can finalize current state more or less (HAL integration, several core functions implementation). And extend supported functions list in future PRs.
Meanwhile, the to-do list of "Part 1" is finished, other new features will be in "Part 2". This PR is ready for review now.
32f means that mapx and mapy are floats, not fixed point; the source and destination may be any OpenCV-supported type. Sorry for the confusion.
Currently there are several warnings regarding strict aliasing in the new HAL library (warpAffine and warpPerspective). Are they serious issues or not? Can we somehow avoid these constructions (maybe with some reinterpret intrinsics)?
```
/work/opencv/3rdparty/ndsrvp/src/warpAffine.cpp: In member function 'virtual void cv::NdsrvpWarpAffineInvoker::operator()(const cv::Range&) const':
/opencv/3rdparty/ndsrvp/src/warpAffine.cpp:58:76: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   58 |     *(uint16x4_t*)(xy + x1 * 2) = __nds__v_pkbb16(*(uint16x4_t*)&vY, *(uint16x4_t*)&vX);
      |                                                                      ^~~~~~~~~~~~~~~~
/opencv/3rdparty/ndsrvp/src/warpAffine.cpp:58:95: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   58 |     *(uint16x4_t*)(xy + x1 * 2) = __nds__v_pkbb16(*(uint16x4_t*)&vY, *(uint16x4_t*)&vX);
      |                                                                                         ^~~~~~~~~~~~~~~~
/opencv/3rdparty/ndsrvp/src/warpAffine.cpp:82:76: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   82 |     *(uint16x4_t*)(xy + x1 * 2) = __nds__v_pkbb16(*(uint16x4_t*)&vy, *(uint16x4_t*)&vx);
      |                                                                      ^~~~~~~~~~~~~~~~
/opencv/3rdparty/ndsrvp/src/warpAffine.cpp:82:95: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   82 |     *(uint16x4_t*)(xy + x1 * 2) = __nds__v_pkbb16(*(uint16x4_t*)&vy, *(uint16x4_t*)&vx);
      |                                                                                         ^~~~~~~~~~~~~~~~
```
It was a mistake. They've been replaced with safer explicit type conversions.
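For readers hitting the same warning: one well-defined way to replace casts like `*(uint16x4_t*)&vX` is an explicit byte copy via `memcpy`. The sketch below uses plain standard types, since the Andes `uint16x4_t` vector type and `__nds__v_pkbb16` intrinsic are target-specific; it shows the pattern only, not necessarily the exact conversion used in this PR:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reading a value through a pointer of an unrelated type, e.g.
// *(uint64_t*)&f, violates strict aliasing and is undefined behavior.
// memcpy expresses the same bit-level reinterpretation legally, and
// compilers typically optimize it down to a single register move.
static uint64_t bits_of(double f) {
    uint64_t u;
    std::memcpy(&u, &f, sizeof(u));  // well-defined byte copy
    return u;
}

static double double_of(uint64_t u) {
    double f;
    std::memcpy(&f, &u, sizeof(f));  // round-trips the exact bit pattern
    return f;
}
```

The same `memcpy` pattern works for moving scalar values into and out of vector types of equal size.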
Strict-aliasing warnings have been fixed. Are there any other suggested changes?
Summary
Previous context
From PR #24556:
Progress
Part 1 (This PR)
Part 2 (Next PR)
Rough Estimate. Todo List May Change.
Performance Tests
The optimization does not contain floating-point operations.
- Absolute Difference: geometric mean (ms)
- Bitwise Operation: geometric mean (ms)
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request